Image generation method
By processing image information through a global transformation network and a portrait transformation network of an image processing model, and then using a fusion network for fusion, the problem of distortion in portrait regions during style transfer is solved, achieving high-quality style transfer effects.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- CHINA MOBILE INTERNET CO LTD
- Filing Date
- 2024-12-24
- Publication Date
- 2026-06-19
AI Technical Summary
When the original image includes a human figure area, the image style conversion is prone to distortion of the human figure area, resulting in poor image quality after style conversion.
The global transformation network and the portrait transformation network of an image processing model process the global image information and the portrait image information, respectively. A fusion network then fuses the resulting global style image and portrait style image to keep the overall style consistent.
It effectively improves the image quality after style conversion, ensuring that the portrait area is not distorted during the style conversion process and that the overall style is consistent.
Figure CN119850408B_ABST
Description
Technical Field
[0001] This application belongs to the field of image processing technology, and in particular relates to an image generation method.
Background Technology
[0002] Image style transfer refers to transferring the content or structure of one image to the style or form of another image, that is, combining the content of one image with the style of another image to generate an image with a new style.
[0003] In related technologies, after a user selects a conversion style from multiple preset styles, the original image is converted using the image style conversion model corresponding to that style to obtain a style-converted image. However, when the original image includes a human figure area, the converted image is prone to distortion in that area, resulting in poor image quality and a degraded visual effect.
Summary of the Invention
[0004] This application provides an image generation method, apparatus, device, medium, and product, which can solve the problem in related technologies that when the original image includes a human portrait area, the image quality after style conversion based on the original image is poor.
[0005] In a first aspect, embodiments of this application provide an image generation method, which includes:
[0006] Obtain the first image and image conversion requirement information; the first image includes a human portrait region image.
[0007] The global transformation network of the image processing model processes the style reference information and the global image information of the first image to obtain a global style image. The portrait transformation network of the image processing model processes the style reference information and the portrait image information of the portrait region image to obtain a portrait style image. The style reference information is determined based on the image transformation requirement information.
[0008] The global style image and the portrait style image are fused together using a fusion network of an image processing model to obtain a second image.
[0009] In a second aspect, embodiments of this application provide an image generation apparatus, which includes:
[0010] The first acquisition module is used to acquire a first image and image conversion requirement information. The first image includes a human portrait region image.
[0011] The first processing module is used to process the style reference information and the global image information of the first image through the global transformation network of the image processing model to obtain a global style image, and to process the style reference information and the portrait image information of the portrait region image through the portrait transformation network of the image processing model to obtain a portrait style image.
[0012] The fusion module is used to fuse the global style image and the portrait style image through the fusion network of the image processing model to obtain a second image.
[0013] In a third aspect, embodiments of this application provide an electronic device, the device comprising a processor and a memory storing computer program instructions; when executing the computer program instructions, the processor implements the image generation method according to any embodiment of the first aspect.
[0014] In a fourth aspect, embodiments of this application provide a computer-readable storage medium storing computer program instructions which, when executed by a processor, implement the image generation method according to any embodiment of the first aspect.
[0015] In a fifth aspect, embodiments of this application provide a computer program product, which includes a computer program or instructions that, when executed by a processor, implement the image generation method according to any embodiment of the first aspect.
[0016] The image generation method, apparatus, device, medium, and product of the embodiments of this application use the global transformation network of an image processing model to process the first image and style reference information determined from the image transformation requirement information, and use the portrait transformation network of the image processing model to process the same style reference information together with the portrait image information. This ensures that the global transformation network (for global image information) and the portrait transformation network (for portrait image information) both work toward the same style reference information, i.e., the same style target. The second image, obtained by fusing the global style image and the portrait style image with a fusion network, then combines the global style and the portrait style and achieves overall stylistic consistency. Thus, while the overall style transformation of the image is controlled, the characteristics of the portrait region are also taken into account, effectively improving the quality of the style-transformed image.
Attached Figure Description
[0017] To more clearly illustrate the technical solutions of the embodiments of this application, the accompanying drawings used in the embodiments of this application will be briefly introduced below. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.
[0018] Figure 1 is a flowchart of some embodiments of the image generation method provided in this application;
[0019] Figure 2 is a flowchart of a specific implementation of step 140 provided in some embodiments of this application;
[0020] Figure 3 is a flowchart of a specific implementation of step 1401 provided in some embodiments of this application;
[0021] Figure 4 is a flowchart of a specific implementation of step 1402 provided in some embodiments of this application;
[0022] Figure 5 is a flowchart of a specific implementation of step 1403 provided in some embodiments of this application;
[0023] Figure 6 is a flowchart of a method for updating reference noise parameters based on user actions on rating options in the image generation method provided in some embodiments of this application;
[0024] Figure 7 is a flowchart of a method for updating reference noise parameters based on user actions on download options in the image generation method provided in some embodiments of this application;
[0025] Figure 8 is a flowchart of a method for updating reference noise parameters based on user actions on download options in the image generation method provided in some embodiments of this application;
[0026] Figure 9 is a schematic diagram of a style conversion interface provided in some embodiments of this application;
[0027] Figure 10 is a schematic diagram of a method for adjusting the first model parameters of the image processing model in the image generation method provided in some embodiments of this application;
[0028] Figure 11 is a schematic diagram of a method for adjusting the second model parameters of the image processing model in the image generation method provided in some embodiments of this application;
[0029] Figure 12 is a schematic diagram of the network architecture of the global transformation network and the portrait transformation network in the image generation method provided in some embodiments of this application;
[0030] Figure 13 is a diagram illustrating a specific implementation of step 130 provided in some embodiments of this application;
[0031] Figure 14 is a diagram illustrating a method for determining the third and fourth weights in the image generation method provided in some embodiments of this application;
[0032] Figure 15 is a schematic structural diagram of an image generation apparatus provided in some embodiments of this application;
[0033] Figure 16 is a schematic structural diagram of an electronic device provided in some embodiments of this application.
Detailed Implementation
[0034] The features and exemplary embodiments of various aspects of this application will be described in detail below. To make the objectives, technical solutions, and advantages of this application clearer, the application will be further described in detail below with reference to the accompanying drawings and specific embodiments. It should be understood that the specific embodiments described herein are only intended to explain this application and not to limit it. For those skilled in the art, this application can be implemented without some of these specific details. The following description of the embodiments is merely to provide a better understanding of this application by illustrating examples.
[0035] It should be noted that, in this document, relational terms such as "first" and "second" are used merely to distinguish one entity or operation from another, and do not necessarily require or imply any such actual relationship or order between these entities or operations. Furthermore, the terms "comprising," "including," or any other variations thereof are intended to cover non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or apparatus. Without further limitations, an element defined by the phrase "comprising..." does not exclude the presence of additional identical elements in the process, method, article, or apparatus that includes the element.
[0036] It should be noted that the acquisition, storage, use, and processing of data in the embodiments of this application all comply with the relevant provisions of national laws and regulations.
[0037] It should be noted that in the embodiments of this application, certain software, components, models and other existing solutions in the industry may be mentioned. These should be regarded as exemplary and are only intended to illustrate the feasibility of implementing the technical solution of this application. However, it does not mean that the applicant has used or necessarily used the solution.
[0038] To address the problems in the aforementioned related technologies, embodiments of this application provide an image generation method, apparatus, device, storage medium, and program product. The image generation method provided in this application is described in detail below through specific embodiments and application scenarios with reference to Figures 1 to 14.
[0039] Figure 1 is a flowchart of some embodiments of the image generation method provided in this application. As shown in Figure 1, the image generation method may include steps 110 to 130.
[0040] Step 110: Obtain the first image and image conversion requirement information. The first image includes a human portrait region image.
[0041] Step 120: The style reference information and the global image information of the first image are processed through the global transformation network of the image processing model to obtain a global style image. The style reference information and the portrait image information of the portrait region image are processed through the portrait transformation network of the image processing model to obtain a portrait style image. The style reference information is determined based on the image transformation requirement information.
[0042] Step 130: The global style image and the portrait style image are fused through the fusion network of the image processing model to obtain the second image.
[0043] Therefore, by utilizing a global transformation network of the image processing model to process the first image and the style reference information determined based on the image transformation requirements, and by utilizing a portrait transformation network of the image processing model to process the style reference information and the portrait image information, it is effectively ensured that the global transformation network processes global image information and the portrait transformation network processes portrait image information in the same direction, i.e., the same style target. Then, by using a fusion network to fuse the global style image and the portrait style image, the second image obtained combines the global style and the portrait style, and maintains consistency in the overall style. In this way, while controlling the overall style transformation of the image, the characteristics of the portrait region in the image are considered, effectively improving the image quality of the style-transformed image.
[0044] The steps described above are explained in detail below.
[0045] First, regarding step 110, the first image involved in this embodiment refers to the original image to be styled during the image generation process. The portrait region image is a specific part of the first image that includes a person's image, such as the face, full body, or part of the body of one or more people. The image conversion requirement information provides guidance and constraints for style conversion operations on the first image, and can be set by the user according to their aesthetic preferences, application scenario requirements, etc.
[0046] There are several ways to acquire the first image. For example, the image to be style-transformed can be selected from a user file or uploaded. Specifically, the user can select the image to be style-transformed from a pre-stored list of images or uploaded images. Then, the selected image undergoes preprocessing, such as cropping a specified area and adjusting its colors. The preprocessed image is then designated as the first image. Alternatively, the first image can be downloaded from online resources. Another example is acquiring the first image directly in real-time from an image acquisition device such as a digital camera or webcam.
[0047] In some embodiments of this application, in order to improve the accuracy of style reference information, the image generation method may further include step 140 before step 120.
[0048] Step 140: The first image and image transformation requirement information are processed through the image processing network of the image processing model to obtain style reference information, global image information of the first image and portrait image information of the portrait region image. The style reference information is information related to the expected transformation style for the first image.
[0049] The image processing model is a comprehensive architecture that integrates image processing networks, global transformation networks, portrait transformation networks, and fusion networks, aiming to achieve style transfer on the input image, i.e., the first image. Through the collaborative work of these networks, the image processing model processes different layers of information from the first image, such as global image information and local portrait image information, separately.
[0050] The image processing network transforms the original first image and abstract image transformation requirements into specific intermediate information that can be processed by subsequent networks, namely style reference information, separated global image information and portrait image information. Among them, style reference information provides a unified and accurate style guide for the global transformation network and the portrait transformation network, ensuring that the global transformation network can process global image information and the portrait transformation network can process portrait image information in the same style.
[0051] In some embodiments of this application, the image conversion requirement information may include style requirement information, which specifies the visual style that the first image should present after style conversion. Global image information includes a global sampled image and a global noise image corresponding to the first image, and portrait image information includes a portrait sampled image and a portrait noise image corresponding to the portrait region image. Based on this, as shown in Figure 2, step 140 may specifically include steps 1401 to 1404.
[0052] Step 1401: Input the first image and image transformation requirement information into the image processing network. Process the style requirement information through the image processing network to obtain style reference information; and perform sampling processing on the first image to obtain a global sampled image.
[0053] The style requirement information may include a style conversion type or a style reference image. The style conversion type indicates the desired style of the first image after style conversion. The style reference image provides a visual reference example for the style conversion of the first image, guiding it toward a style similar or identical to that of the style reference image, so that the final second image has a style similar to the style reference image. It can be understood that the style of the first image differs from the style corresponding to the style requirement information. The style reference information is a vectorized representation of the style corresponding to the style requirement information.
[0054] In one example, style transfer types include, but are not limited to, ancient style, anime, aesthetic, fresh and simple, cartoon, sketch, ink painting, oil painting, quick sketch, printmaking, watercolor, etc. To obtain the style transfer type, a list of style transfer types can be displayed on the electronic device. This list includes at least two style transfer types, and the user can select one according to their needs. The electronic device will then determine the selected style transfer type as the style transfer type corresponding to the first image.
[0055] In another example, the style reference image can be a real image obtained by an image acquisition device, an image captured from a video, or an image searched by the user. The method for obtaining the style reference image is the same as that for the first image described above, and will not be repeated here.
[0056] The first image $I_I$ is sampled to obtain the global sampled image $O_I$. The sampling process can be expressed as formula (1):

[0058] where $w_c$ is the scaling factor for the width, $h_c$ is the scaling factor for the height, and $(i, j)$ are the coordinates of a pixel in the global sampled image $O_I$.
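Formula (1) itself is not reproduced in the text above, so the following is only a sketch under a stated assumption: that the sampling is nearest-neighbor downsampling in which pixel $(i, j)$ of $O_I$, with $i$ along the width and $j$ along the height, takes the value of the input pixel at $(\lfloor i \cdot w_c \rfloor, \lfloor j \cdot h_c \rfloor)$. The function name `sample_image` is hypothetical.

```python
import numpy as np

def sample_image(image: np.ndarray, w_c: float, h_c: float) -> np.ndarray:
    """Nearest-neighbor downsampling (an assumed form of formula (1)).

    image: H x W (or H x W x C) array holding the first image I_I.
    w_c, h_c: width and height scaling factors (values > 1 shrink the image).
    Returns the global sampled image O_I.
    """
    in_h, in_w = image.shape[:2]
    out_w = int(in_w // w_c)
    out_h = int(in_h // h_c)
    # Column i of O_I reads source column floor(i * w_c);
    # row j of O_I reads source row floor(j * h_c). Clamp to stay in bounds.
    cols = np.minimum((np.arange(out_w) * w_c).astype(int), in_w - 1)
    rows = np.minimum((np.arange(out_h) * h_c).astype(int), in_h - 1)
    # Fancy indexing with a row column-vector and a column row-vector
    # yields an out_h x out_w grid (channels, if any, are carried along).
    return image[rows[:, None], cols[None, :]]

# Usage: halve a 480 x 640 RGB frame in both directions.
frame = np.zeros((480, 640, 3), dtype=np.uint8)
sampled = sample_image(frame, 2.0, 2.0)  # shape (240, 320, 3)
```

Nearest-neighbor sampling keeps original pixel values unchanged, which is one simple way to reduce the data volume while preserving key features; an interpolating resampler would be an equally plausible reading of formula (1).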
[0059] Step 1402: Extract the portrait sampled image from the global sampled image.
[0060] For example, a human face detection algorithm can be used to detect human faces in the globally sampled image. This algorithm can employ deep learning-based object detection algorithms such as You Only Look Once (YOLO) or Faster Region-based Convolutional Neural Networks (Faster R-CNN). These algorithms can be trained on large amounts of labeled human face data and are capable of identifying human face regions in the globally sampled image. Once a human face region is detected, the corresponding human face sample image is cropped from the globally sampled image based on the bounding box coordinates of that region.
[0061] Step 1403: Using style reference information, noise filling is applied to the global sampled image to obtain a global noise image.
[0062] Noise filling is the process of adding random or specific patterned noise data to the globally sampled image.
[0063] For example, the noise can be Gaussian noise where the pixel values follow a Gaussian distribution, or salt-and-pepper noise from randomly appearing black and white pixels. Here, the global noise image with added noise increases the randomness and richness of the first image, which helps to simulate the texture and feel of different styles during style transfer.
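The two noise types named above can be sketched as follows. These are the standard textbook forms; the function names and the parameters `sigma` and `amount` are illustrative assumptions, and the sketch does not reproduce how the application ties noise filling to the style reference information.

```python
import numpy as np

def add_gaussian_noise(image: np.ndarray, sigma: float, rng: np.random.Generator) -> np.ndarray:
    """Add zero-mean Gaussian noise with standard deviation sigma (0-255 pixel scale)."""
    noisy = image.astype(np.float64) + rng.normal(0.0, sigma, size=image.shape)
    # Clip back into the valid 8-bit range before converting.
    return np.clip(noisy, 0, 255).astype(np.uint8)

def add_salt_and_pepper(image: np.ndarray, amount: float, rng: np.random.Generator) -> np.ndarray:
    """Set roughly a fraction `amount` of pixel positions to white (255) or black (0)."""
    noisy = image.copy()
    h, w = image.shape[:2]
    mask = rng.random((h, w)) < amount  # positions that receive noise
    salt = rng.random((h, w)) < 0.5     # about half of them white, half black
    noisy[mask & salt] = 255            # a 2-D mask sets every channel at once
    noisy[mask & ~salt] = 0
    return noisy
```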
[0064] Step 1404: Extract the human portrait noise image from the global noise image.
[0065] For example, the corresponding region can be cropped from the global noise image based on the bounding box coordinates of the portrait region obtained in step 1402, and used as the portrait noise image.
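Because the global noise image is generated from, and has the same height and width as, the global sampled image (steps 1403 and 14031), a single bounding box selects pixel-aligned crops from both. A minimal sketch, assuming each box is given as (x, y, w, h) in pixel coordinates with x as the column and y as the row; `crop_box` and `extract_portrait_pair` are hypothetical helpers.

```python
import numpy as np

def crop_box(image: np.ndarray, box: tuple) -> np.ndarray:
    """Crop an (x, y, w, h) box from an H x W (x C) array, clipped to the image bounds."""
    x, y, w, h = box
    height, width = image.shape[:2]
    x0, y0 = max(0, x), max(0, y)
    x1, y1 = min(width, x + w), min(height, y + h)
    return image[y0:y1, x0:x1]

def extract_portrait_pair(sampled: np.ndarray, noise: np.ndarray, boxes: list) -> list:
    """Apply each portrait box to both images so the two crops stay aligned."""
    if sampled.shape[:2] != noise.shape[:2]:
        raise ValueError("sampled and noise images must share height and width")
    # One (portrait sampled image, portrait noise image) pair per detected portrait.
    return [(crop_box(sampled, b), crop_box(noise, b)) for b in boxes]
```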
[0066] Therefore, the style reference information output by the image processing network provides clear guidance for the style transfer of the first image. The global sampled image obtained through sampling processing effectively reduces the amount of data processing while preserving the key features of the first image, thus improving the processing efficiency of the image processing model. Next, a portrait sampled image is extracted from the global sampled image, achieving separation between the portrait region and the global region. This avoids deformation or distortion of the portrait region during unified processing, greatly ensuring the visual quality of the portrait region after style transfer. Then, a global noise image is obtained by noise filling the global sampled image based on the style reference information, which enhances the visual effect of the image. Finally, a portrait noise image is extracted from the global noise image, ensuring that the portrait region and the global region maintain consistency and coordination at the noise level, further improving the overall integrity and harmony of the image after style transfer.
[0067] In some embodiments of this application, when the style requirement information includes a style conversion type, the process of processing the style requirement information in step 1401 to obtain style reference information may include determining the first style feature vector associated with the style conversion type as style reference information based on the association information between the preset style conversion type and the preset style feature vector.
[0068] For example, a pre-trained style set including multiple different style transfer types can be constructed. The acquisition of the pre-trained style set can include, for each style transfer type, obtaining multiple image samples with typical features corresponding to that style transfer type, and inputting these image samples into a convolutional neural network. The convolutional neural network performs multi-layer convolution and pooling operations on these image samples, progressively extracting features from low to high levels and converting them into corresponding style feature vectors. Then, based on the user's selection, a search and matching operation is performed within the constructed pre-trained style set; that is, after the user selects a style transfer type, the style feature vector associated with that style transfer type is retrieved from the pre-trained style set.
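The association between style conversion types and style feature vectors can be sketched as a dictionary built once and queried when the user makes a selection. This is an illustration under assumptions: `extract_features` stands in for the convolutional neural network, the per-type vector is taken as the mean of the sample vectors (the text does not say how multiple samples are combined), and the toy `color_stats` extractor is hypothetical.

```python
import numpy as np

def build_style_set(samples_by_type: dict, extract_features) -> dict:
    """Map each style conversion type to one style feature vector.

    samples_by_type: style type -> list of images with typical features of that style.
    extract_features: callable image -> 1-D vector (a stand-in for the CNN).
    """
    return {
        style: np.mean([extract_features(img) for img in images], axis=0)
        for style, images in samples_by_type.items()
    }

def lookup_style_vector(style_set: dict, selected_type: str) -> np.ndarray:
    """Return the first style feature vector associated with the selected type."""
    try:
        return style_set[selected_type]
    except KeyError:
        raise ValueError(f"unknown style conversion type: {selected_type!r}") from None

def color_stats(image: np.ndarray) -> np.ndarray:
    """Toy feature extractor: per-channel mean followed by per-channel std."""
    return np.concatenate([image.mean(axis=(0, 1)), image.std(axis=(0, 1))])

# Usage: build the set once, then look up the user's choice.
style_set = build_style_set({"sketch": [np.zeros((8, 8, 3))]}, color_stats)
vector = lookup_style_vector(style_set, "sketch")
```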
[0069] Therefore, by using the correlation information between style transfer type and style feature vector, the style feature vector associated with the style transfer type can be obtained accurately and quickly, providing a style reference for the image style transfer of the first image.
[0070] In other embodiments of this application, when the style requirement information includes a style reference image, as shown in Figure 3, the process of processing the style requirement information in step 1401 to obtain style reference information may include steps 14011 to 14013.
[0071] Step 14011: Compress the style reference image according to the preset size to obtain a style compressed image.
[0072] The preset size is a specific image size standard determined by the structure and processing capabilities of the global transformation network, the portrait transformation network, and the fusion network. Compressing the style reference image to this size avoids processing errors or inefficiency caused by a size mismatch.
[0073] For example, the preset size is 244×244. If the width and height of the style image $S$ are $W_s$ and $H_s$, the pixel at coordinates $(x, y)$ of the style compressed image can be extracted from the style image $S$ using formula (2):

[0075] where $x, y \in \mathbb{N}^*$ and $x, y \leq 244$.
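Formula (2) is not reproduced above, so the sketch below substitutes bilinear interpolation, a common way to compress an image to a fixed size; it is not claimed to be the application's formula. It resizes an $H_s \times W_s$ image to the 244×244 preset size from the example; `compress_to_preset` and `PRESET_SIZE` are hypothetical names.

```python
import numpy as np

PRESET_SIZE = 244  # preset size from the example above

def compress_to_preset(style_image: np.ndarray, size: int = PRESET_SIZE) -> np.ndarray:
    """Resize an H x W (x C) image to size x size using bilinear interpolation."""
    img = style_image.astype(np.float64)
    in_h, in_w = img.shape[:2]
    # Map output pixel centers back to input coordinates (pixel-center alignment).
    ys = np.clip((np.arange(size) + 0.5) * in_h / size - 0.5, 0, in_h - 1)
    xs = np.clip((np.arange(size) + 0.5) * in_w / size - 0.5, 0, in_w - 1)
    y0, x0 = np.floor(ys).astype(int), np.floor(xs).astype(int)
    y1, x1 = np.minimum(y0 + 1, in_h - 1), np.minimum(x0 + 1, in_w - 1)
    wy = (ys - y0)[:, None]  # vertical interpolation weights, one per output row
    wx = (xs - x0)[None, :]  # horizontal interpolation weights, one per output column
    if img.ndim == 3:        # broadcast the weights over the channel axis
        wy, wx = wy[..., None], wx[..., None]
    top = img[y0[:, None], x0[None, :]] * (1 - wx) + img[y0[:, None], x1[None, :]] * wx
    bottom = img[y1[:, None], x0[None, :]] * (1 - wx) + img[y1[:, None], x1[None, :]] * wx
    return top * (1 - wy) + bottom * wy
```

The output is a floating-point array; cast it back to `np.uint8` if a later stage expects 8-bit pixels.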
[0076] Step 14012: Vectorize the style-compressed image to obtain the second style feature vector.
[0077] For example, convolutional neural networks can be used to perform multi-layer convolution and pooling operations on style-compressed images to gradually extract features from low to high levels of the style-compressed images and convert them into corresponding second style feature vectors.
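The convolution-and-pooling feature extraction described in step 14012 can be illustrated with a deliberately tiny, single-channel network in NumPy. Everything here is an assumption for illustration: the input is a single channel, the kernel bank is supplied by hand rather than learned, each kernel is reused at every layer, and a final global average turns each feature map into one vector component. `style_vector` and its helpers are hypothetical names.

```python
import numpy as np

def conv2d_valid(x: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Single-channel 'valid' 2-D convolution (cross-correlation, as used in CNNs)."""
    windows = np.lib.stride_tricks.sliding_window_view(x, kernel.shape)
    return np.einsum("ijkl,kl->ij", windows, kernel)

def max_pool2x2(x: np.ndarray) -> np.ndarray:
    """Non-overlapping 2 x 2 max pooling; an odd trailing row or column is dropped."""
    h, w = x.shape[0] // 2 * 2, x.shape[1] // 2 * 2
    return x[:h, :w].reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

def style_vector(channel: np.ndarray, kernel_bank: list, layers: int = 2) -> np.ndarray:
    """Apply conv -> ReLU -> pool `layers` times per kernel, then global-average-pool."""
    features = []
    for k in kernel_bank:
        fmap = channel.astype(np.float64)
        for _ in range(layers):
            fmap = max_pool2x2(np.maximum(conv2d_valid(fmap, k), 0.0))
        features.append(fmap.mean())  # one component of the second style feature vector
    return np.array(features)
```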
[0078] Step 14013: Determine the second style feature vector as style reference information.
[0079] Therefore, when processing style reference images, compression ensures that the image size matches the processing capabilities of the image processing model, improving the model's efficiency. Vectorization, on the other hand, represents the style features of the style reference image in a unified form that is easy for the image processing model to process, enabling subsequent global transformation networks and portrait transformation networks to accurately perform style transformation on the first image based on these feature information.
[0080] In some embodiments of this application, as shown in Figure 4, step 1402 may specifically include steps 14021 to 14023.
[0081] Step 14021: Perform grayscale conversion on the global sampled image to obtain a global grayscale image.
[0082] For example, let the original RGB values of each pixel P(x, y) in the global sampled image be R(x, y), G(x, y), and B(x, y).
[0083] The gray value Gray(x, y) of P(x, y) can then be expressed by the following formula (3):
[0084] Gray(x,y)=0.299R(x,y)+0.587G(x,y)+0.114B(x,y) (3)
[0085] In this way, by using the above formula (3), all pixels of the entire global sampled image are converted into grayscale values, and a global grayscale image is obtained.
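Formula (3) maps directly onto a weighted sum over the color channel axis. A minimal NumPy sketch, assuming the global sampled image is an H × W × 3 array in RGB order; `to_grayscale` is a hypothetical name.

```python
import numpy as np

# Weights from formula (3); these are the ITU-R BT.601 luma coefficients.
GRAY_WEIGHTS = np.array([0.299, 0.587, 0.114])

def to_grayscale(rgb: np.ndarray) -> np.ndarray:
    """Convert an H x W x 3 RGB array to an H x W grayscale array via formula (3)."""
    if rgb.ndim != 3 or rgb.shape[2] != 3:
        raise ValueError("expected an H x W x 3 RGB array")
    # Matrix product over the last axis computes 0.299R + 0.587G + 0.114B per pixel.
    return rgb.astype(np.float64) @ GRAY_WEIGHTS
```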
[0086] Step 14022: Based on the preset human image feature information, perform human image detection on the global grayscale image to determine the human image region.
[0087] Here, the portrait feature information is a predefined set of features related to human images. Specifically, it may include the shape and proportions of facial organs, their relative positional relationships, and features of the human body contour. Based on this portrait feature information, a portrait detection model or algorithm can be built using the object detection algorithms described for step 1402 above, so as to identify portrait regions in an image.
[0088] For example, inputting the global grayscale image into a human face detection model yields the detection result $P' = \{(x'_1, y'_1, w'_1, h'_1), (x'_2, y'_2, w'_2, h'_2), \ldots, (x'_n, y'_n, w'_n, h'_n)\}$, where $(x'_i, y'_i)$ are the starting coordinates of the $i$-th portrait, $w'_i$ and $h'_i$ are its width and height, and $n$ is the number of portraits detected.
[0089] Step 14023: Extract the portrait sampled image from the global sampled image according to the portrait region.
[0090] For example, the human face region can be determined from the starting coordinates, width, and height of each face in the detection result above, and that region is then cropped from the global sampled image to obtain the portrait sampled image.
[0091] Therefore, by converting the global sampled image to grayscale to obtain a global grayscale image, the amount of data processed by the image can be reduced, and the efficiency and accuracy of human image region detection can be improved. Then, the human image sampled image is separated from the global sampled image, so that the subsequent image style conversion process can perform fine processing on the human image region to better preserve the natural features of the human image.
[0092] In some embodiments of this application, as shown in Figure 5, step 1403 may specifically include steps 14031 and 14032.
[0093] Step 14031: Generate a blank image corresponding to the global sampled image according to the size information and channel information of the global sampled image.
[0094] Here, size information refers to the number of pixels contained in the global sampled image in the horizontal and vertical directions. Channel information refers to the color mode and channel structure of the global sampled image.
[0095] Step 14032: Based on the reference noise parameters, adjust the initial pixel values of the blank image to obtain a global noise image. The reference noise parameters are determined based on user feedback information.
[0096] Among them, the reference noise parameters are a set of parameters used to control the noise generation and addition process, which can affect the characteristics of the noise, such as the noise intensity and distribution pattern.
[0097] For example, after obtaining the blank image, the initial pixel value of all pixels in the blank image is set to 0. Then, the pixel values of all pixels in the blank image P are traversed, and the pixel values of each pixel are updated using the following formula (4):
[0098]
[0099] where rand(k) denotes a random number generation function following a normal distribution with expectation 1 and variance k, generating values $x \in \mathbb{R}$ ($0 \leq x \leq 1$); $(i, j)$ denotes the position of a point in the blank image $P$; $P_{amb}$ is the user noise generalization factor, with $P_{amb} \in \mathbb{N}$ and $0 \leq P_{amb} \leq 100$; and $S_l$ denotes the output obtained when the content image is propagated forward and the style image is propagated to layer $l$ with $s_i(l) = 1$, and the result is then propagated backward to the content image.
[0100] Here, the user noise generalization factor adjusts the propagation range and intensity variation of the noise in the image; different values of $P_{amb}$ make the noise appear with different densities and sizes in the image. The k in the rand(k) function determines the distribution of the generated random numbers, which in turn affects the randomness and fluctuation of the noise. By adjusting these reference noise parameters, a global noise image suited to the user's needs and the characteristics of the image can be generated to achieve the desired style transfer effect.
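Formula (4) is not reproduced in the text, so the sketch below only illustrates the mechanics around it, under loudly stated assumptions: step 14031 creates a zero-valued blank image with the same size and channel layout as the global sampled image, and step 14032 fills it using rand(k) as described (normal, expectation 1, variance k, values kept in [0, 1], here by clipping) scaled by $P_{amb}/100$. That scaling rule is a hypothetical stand-in for formula (4), and the $S_l$ term is omitted entirely. `ReferenceNoiseParams` and `global_noise_image` are hypothetical names.

```python
import numpy as np
from dataclasses import dataclass

@dataclass
class ReferenceNoiseParams:
    k: float = 0.1   # variance of rand(k)
    p_amb: int = 50  # user noise generalization factor, 0..100

def rand_k(k: float, size: tuple, rng: np.random.Generator) -> np.ndarray:
    """rand(k): normal draws with expectation 1 and variance k, clipped to [0, 1]."""
    return np.clip(rng.normal(1.0, np.sqrt(k), size=size), 0.0, 1.0)

def global_noise_image(sampled: np.ndarray, params: ReferenceNoiseParams,
                       rng: np.random.Generator) -> np.ndarray:
    if not 0 <= params.p_amb <= 100:
        raise ValueError("p_amb must lie in [0, 100]")
    # Step 14031: blank image with the same size and channel layout, all zeros.
    blank = np.zeros(sampled.shape, dtype=np.float64)
    # Step 14032 (hypothetical rule, NOT formula (4)): scale rand(k) by P_amb / 100.
    return blank + (params.p_amb / 100.0) * rand_k(params.k, blank.shape, rng)
```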
[0101] Therefore, the global noise image generated based on the structure of the global sampled image and the reference noise parameters can be well integrated with the image content, avoiding the abruptness of noise, making the style-transformed image more visually natural, improving the quality and effect of image style conversion. At the same time, the introduction of reference noise parameters meets the diverse and refined image style needs of different users.
[0102] In some embodiments of this application, the aforementioned user feedback information includes a first user feedback value. Based on this, after step 130, as shown in Figure 6, the image generation method may further include steps 210 to 230.
[0103] Step 210: Display the second image in the display area of the style conversion interface. The style conversion interface includes a rating option, which is used to prompt the user to quantify and score the conversion effect of the second image.
[0104] The style conversion interface is an interactive interface that provides users with the ability to perform image style conversion and view the conversion result (the second image). The rating option is an interactive component on the style conversion interface used to guide users in quantitatively evaluating the image conversion effect.
[0105] For example, the style conversion interface integrates various elements related to image style conversion. For instance, it can display the original image to be converted (the first image), the converted image (the second image), and functional options for user interaction and feedback, such as rating options. These rating options can be a set of radio buttons, each corresponding to a specific score, or a slider where the user selects a score by dragging the slider.
[0106] Step 220: In response to the first input to the rating option, obtain the first user feedback value corresponding to the rating option.
[0107] For example, suppose the user performs a first input, such as a click, on the rating option; that is, the user clicks one or more stars to rate the second image. In response, the system obtains the first user feedback value corresponding to the rating option. If the rating range is 1 to 5 stars and the user clicks the third star, the first user feedback value may be 3.
[0108] Step 230: Update the reference noise parameters based on the first user feedback value.
[0109] For example, a low first user feedback value indicates that the user is not satisfied with the noise-related aspects of the current style conversion, perhaps because the noise is too strong or too weak, or does not match the expected style texture. In this case, the user noise generalization factor and the random-number-generation parameters among the reference noise parameters can be adjusted.
[0110] Therefore, by dynamically updating the reference noise parameters based on the first user feedback value, adaptive optimization of the image style transfer effect is achieved. That is, the noise-related style transfer features can be continuously adjusted according to the user's evaluation, gradually generating style transfer images that better meet the user's expectations, thus improving the adaptability and effectiveness of the entire image generation method under different user demand scenarios.
[0111] In other embodiments of this application, the user feedback information includes a second user feedback value. Based on this, after step 130 above, as shown in Figure 7, the image generation method may further include steps 310 to 330.
[0112] Step 310: Display the second image in the display area of the style conversion interface. The style conversion interface includes a download option, which prompts the user to download the second image.
[0113] The download option is an interactive component on the style conversion interface that allows users to save the generated second image to local devices such as computers and mobile phones. It can be presented visually through buttons, links, or other means, along with corresponding text prompts such as "Download Image".
[0114] Step 320: In response to the second input on the download option, obtain the second user feedback value corresponding to the download option.
[0115] Once a user's action on the download option is detected, the second user feedback value associated with the download option will be acquired and recorded. The second user feedback value can be a pre-set numerical value that represents the user's level of approval for the second image.
[0116] For example, the second user feedback value can be set to 1 when the user downloads the image (indicating satisfaction) and to 0 when the user does not (indicating dissatisfaction).
[0117] Step 330: Update the reference noise parameters based on the second user feedback value.
[0118] For example, if the second user feedback value indicates that the user is satisfied with the image, it means that the current style transfer effect meets the user's expectations to a certain extent. In this case, the reference noise parameters can be fine-tuned to maintain or further optimize the current effect. For example, the user noise generalization factor and parameters related to random number generation can be slightly increased to make the noise more natural. Conversely, if the second user feedback value indicates that the user is dissatisfied, the reference noise parameters need to be adjusted more significantly. For example, the user noise generalization factor and parameters related to random number generation may need to be significantly changed to alter the noise effect in the subsequently generated second image, so as to generate a style transfer image that better meets the user's preferences.
[0119] Therefore, by dynamically updating the reference noise parameters based on the second user feedback value, adaptive optimization of the image style transfer effect is achieved. That is, based on the satisfaction reflected by the user's actual download behavior, the noise-related style transfer features in the image are continuously adjusted to generate a converted image that meets the user's needs.
[0120] In some other embodiments of this application, after step 130 above, as shown in Figure 8, the image generation method may further include steps 410 to 430.
[0121] Step 410: If the style conversion interface includes a rating option and a download option, obtain the first user feedback value corresponding to the rating option and the second user feedback value corresponding to the download option.
[0122] For example, the style conversion interface shown in Figure 9 displays a first image and three corresponding second images, obtained by performing three style conversions on the first image. Below each second image is a "Download Original" button for that image, and below each button is a row of rating options that lets the user rate the corresponding second image. The user can rate a conversion result (a second image) by clicking a star, and the image generation system uses the rating as the first user feedback value. The user can also click the "Download Original" button to download the second image, and the system records the download as the second user feedback value.
[0123] Step 420: Based on the first weight of the first user feedback value and the second weight of the second user feedback value, perform a weighted summation on the first user feedback value and the second user feedback value to obtain the target feedback value.
[0124] The first weight corresponds to the first user feedback value and represents the share of the first user feedback value in determining the direction and extent of the reference noise parameter update; the second weight corresponds to the second user feedback value and represents the share of the second user feedback value in that determination. The values of the first and second weights can be chosen based on factors such as the application scenario, an analysis of the importance of the different feedback channels, and past experience.
[0125] For example, for a single conversion request on the first image, several conversions can be performed and the resulting second images returned to the user for selection and rating. If three conversions are performed, three converted images $I'_g = \{I_{g1}, I_{g2}, I_{g3}\}$ are generated and returned to the user. The user can select several of them and download them to a local device, and the image generation system records the downloads as
[0126] $\{(I_{g1}, F_{g1}), (I_{g2}, F_{g2}), (I_{g3}, F_{g3})\}$, where $F_{gi}$ is the second user feedback value indicating whether $I_{gi}$ has been downloaded: $F_{gi} = 1$ if the image was downloaded and $F_{gi} = 0$ otherwise. The user can also rate each second image with an integer score M, where $0 \le M \le 10$; the scores are:
[0127] $M_g = \{(I_{g1}, M_{g1}), (I_{g2}, M_{g2}), (I_{g3}, M_{g3})\}$, where $M_{gi}$ is the user's first user feedback value for $I_{gi}$. Finally, the target feedback value of each image is calculated from the first and second user feedback values as $MC_g = \{(I_{g1}, MC_{g1}), (I_{g2}, MC_{g2}), (I_{g3}, MC_{g3})\}$, where $MC_{gi}$, the target feedback value of $I_{gi}$, is given by formula (5):
[0128]
$$MC_{gi} = 0.3\,F_{gi} + 0.7\,M_{gi} \tag{5}$$
[0129] Here, 0.7 is the first weight and 0.3 is the second weight.
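The weighted summation in formula (5) can be sketched in Python as follows. This is a minimal illustration, assuming the rating is an integer from 0 to 10 and the download flag is 1 or 0 as described above; the function names `target_feedback` and `target_feedback_for_batch` and the image identifiers are hypothetical.

```python
from typing import Dict

# Weights from formula (5).
FIRST_WEIGHT = 0.7   # weight of the first user feedback value M_gi (rating, 0..10)
SECOND_WEIGHT = 0.3  # weight of the second user feedback value F_gi (1 if downloaded, else 0)


def target_feedback(rating: int, downloaded: bool) -> float:
    """Return MC_gi = 0.3 * F_gi + 0.7 * M_gi for one second image."""
    if not 0 <= rating <= 10:
        raise ValueError("rating must be an integer in [0, 10]")
    download_flag = 1 if downloaded else 0  # F_gi
    return SECOND_WEIGHT * download_flag + FIRST_WEIGHT * rating


def target_feedback_for_batch(ratings: Dict[str, int],
                              downloads: Dict[str, bool]) -> Dict[str, float]:
    """Compute MC_g for every rated image; images absent from `downloads` count as not downloaded."""
    return {img: target_feedback(score, downloads.get(img, False))
            for img, score in ratings.items()}
```

For instance, a second image rated 8 and downloaded gets 0.3 × 1 + 0.7 × 8 = 5.9, while one rated 3 and not downloaded gets 2.1.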
[0130] Step 430: Update the reference noise parameters based on the target feedback value.
[0131] For example, the user noise generalization factor can be updated daily. First, the user's image conversion records and the corresponding user feedback records for the day are obtained: $R = \{(I_1, MC_1, k_1), (I_2, MC_2, k_2), \ldots, (I_n, MC_n, k_n)\}$.
[0132] Here, $I_n$ is the n-th second image, $MC_n$ is the target feedback value of $I_n$, and $k_n$ is the sequence number of $I_n$ in the entire conversion sequence. The user noise generalization factor $P_{amb}$ can then be updated using the following formula (6):
[0133]
[0134] Here, $MC_i - 5$ measures how good or bad the i-th second image is relative to the intermediate feedback level of 5.
[0135] Therefore, by integrating the first and second user feedback values through a weighted summation method, the relative importance of different feedback channels in reflecting user satisfaction is fully considered. This allows the final target feedback value to more objectively represent the user's comprehensive evaluation of the image style conversion effect of the second image. Compared to relying solely on feedback values from a single channel, this comprehensive approach avoids the problem of inaccurate judgment of the user's true feelings due to the limitations of a single feedback source, providing a reliable basis for more reasonably updating the reference noise parameters.
[0136] In some embodiments of this application, the aforementioned image conversion requirement information includes the degree of style conversion, that is, the extent to which the first image departs from its original style and approaches the target style corresponding to the style reference information during style conversion. For example, when a realistic first image is converted to an oil painting style, a low degree of style conversion yields a second image that shows only some of the color characteristics of oil painting, whereas a high degree yields a second image that strongly displays typical oil painting characteristics, such as rich colors and thick, distinct brushstrokes. Based on this, after step 110 above, as shown in Figure 10, the image generation method may include steps 510 to 520.
[0137] Step 510: Obtain the first model parameter associated with the style conversion degree based on the correlation information between the preset style conversion degree and the preset model parameters.
[0138] The preset model parameters are adjustable parameter values involved in constructing and training the image processing model. They determine how, and how effectively, the model performs image processing operations such as feature extraction, style transfer, and noise addition. The first model parameters include weight parameters and adjustment factors related to the degree of style transfer. The correlation between the preset style transfer degrees and the preset model parameters can be obtained by having the image processing model learn from a large amount of sample data with different degrees of style transfer, combined with accumulated human experience.
[0139] It is understandable that different combinations of model parameter values will result in the model outputting images with different style conversion effects.
[0140] For example, the style transfer level can be divided into ten levels from 1 to 10, or 100 levels from 1 to 100. For each level, the corresponding preset model parameter value is recorded. The image generation system pre-stores a table or mapping rules relating style transfer levels to preset model parameters. When the image generation system receives the style transfer level set by the user, such as when the user selects "Level 1" via the slider on the image conversion interface, the image generation system will perform a search and matching operation based on the existing association information. Specifically, if the association information is stored in the form of an association table, when obtaining the first model parameter, the preset model parameter value associated with the corresponding style transfer level "Level 1" is directly searched in the association table.
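A minimal sketch of the association-table lookup described above, assuming the table is held as an in-memory dictionary keyed by level. The parameter names `style_weight` and `content_weight` and the linear mapping used to fill the table are hypothetical placeholders; the document says only that the first model parameters include weight parameters and adjustment factors.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class StyleParams:
    # Hypothetical parameter names, for illustration only.
    style_weight: float
    content_weight: float


def build_level_table(num_levels: int = 10) -> Dict[int, StyleParams]:
    """Build an illustrative association table: a higher level gives more style weight."""
    table = {}
    for level in range(1, num_levels + 1):
        ratio = level / num_levels
        table[level] = StyleParams(style_weight=ratio, content_weight=1.0 - ratio)
    return table


def lookup_first_model_params(level: int, table: Dict[int, StyleParams]) -> StyleParams:
    """Directly search the association table for the user-selected level."""
    try:
        return table[level]
    except KeyError:
        raise ValueError(f"unsupported style transfer level: {level}") from None
```

In practice the table values would come from the learned and experience-based correlation described in [0138] rather than from a linear formula.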
[0141] Step 520: Adjust the model parameters related to the degree of style transfer in the image processing model according to the first model parameters.
[0142] Therefore, by adjusting the model parameters of the image processing model according to the user-specified degree of style transfer, the image processing model can process the first image according to the desired degree of style transfer, thereby generating a second image whose style transfer effect accurately matches the user's needs and improving the image quality of the second image.
[0143] In some other embodiments of this application, the aforementioned image conversion requirement information includes the degree of style abstraction, that is, the degree to which the style of the second image departs from representational realism toward abstract expression. For example, a representational image clearly shows the specific shapes and details of objects and has a low degree of abstraction, whereas an abstract image emphasizes the combination and meaning of elements such as color, shape, and line, weakens the depiction of specific objects, and has a high degree of abstraction. Based on this, after step 110 above, as shown in Figure 11, the image generation method may include steps 610 to 620.
[0144] Step 610: Obtain the second model parameter associated with the style abstraction level based on the correlation information between the preset style abstraction level and the preset model parameters.
[0145] The second model parameters include weight parameters and adjustment factors related to the level of style abstraction. The correlation between the preset level of style abstraction and the preset model parameters can be obtained by learning from a large amount of sample data of styles with different levels of abstraction through image processing models and by summarizing human experience.
[0146] It is understandable that different combinations of model parameter values will result in the model outputting images with different styles and levels of abstraction.
[0147] For example, the level of style abstraction can be divided into ten levels from 1 to 10, or 100 levels from 1 to 100, and each level is associated with a preset combination of model parameters that matches it. When the image generation system receives the style abstraction level set by the user, it searches the existing association information for the matching parameters.
[0148] Step 620: Adjust the model parameters related to the level of style abstraction in the image processing model according to the second model parameters.
[0149] Therefore, by adjusting the model parameters of the image processing model according to the user-defined level of style abstraction, the image processing model can accurately generate images of the corresponding style based on the user-specified level of style abstraction. This improves the flexibility and effectiveness of the image generation method in dealing with different levels of style abstraction, thereby improving the image quality of the second image.
[0150] Next, in step 120, both the global transformation network and the portrait transformation network employ the deep convolutional network structure shown in Figure 12, which is built on a Visual Geometry Group (VGG) network with its three fully connected layers removed.
[0151] The style reference information and the global image information are fed into the VGG model with its three fully connected layers removed for forward propagation, as are the style reference information and the portrait image information, and the loss is calculated from the loss functions. During this process, the connection parameters of the VGG model, the style reference information, the global image information, and the portrait image information all remain unchanged; gradient descent is used to backpropagate the loss to the output image, and the output image is adjusted until training is complete. In this VGG model, only the convolutional layers participate in the computation. Each convolutional layer contains all of its convolution kernels and their corresponding feature maps and includes pooling operations. The fully connected layers include regularization and residual connection processing, and the output layer is normalized; however, during training the fully connected layers and the softmax layer after the convolutional layers are not actually computed. In addition, after the loss function is calculated according to the style transfer abstraction level, if the current convolutional layer and all subsequent ones have a loss of 0, propagation does not actually proceed through them.
[0152] The loss functions mentioned above will be explained in detail below.
[0153] The image content conversion loss function $L_c(C, S, l)$ can be expressed by the following formula (7):
[0154]
[0155] where l is the layer index; n is the number of portraits detected by portrait detection; $C^l$ denotes the features extracted from the global sampled image when it is propagated to layer l; $P^l$ denotes the features extracted from the global noise image at layer l; $C_{pi}^l$ denotes the features extracted from the i-th portrait sampled image at layer l; and $P_{pi}^l$ denotes the features extracted from the i-th portrait noise image at layer l.
[0156] The image style transfer loss function $L_s(C, S, l)$ can be expressed by the following formula (8):
[0157]
[0158] where l is the layer index; n is the number of portraits detected by portrait detection; $C_p$ is a system-configured factor that adjusts the influence of the portrait model; $P^l$ denotes the features extracted from the global noise image at layer l; $S^l$ denotes the features extracted from the style image at layer l; $N_l$ denotes the number of features of the style image at layer l; $M_l$ denotes the number of channels of the style image at layer l; $S_{pi}^l$ denotes the features extracted at layer l from the style image corresponding to the i-th portrait; and $P_{pi}^l$ denotes the features extracted from the i-th portrait noise image at layer l.
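Because formulas (7) and (8) are not reproduced here, the following Python sketch assumes that each per-layer style term takes the widely used Gram-matrix form of Gatys et al., normalized by $4N_l^2M_l^2$, and that the n portrait terms are added to the global term scaled by $C_p$. Both choices are assumptions for illustration, not the exact formula (8). Feature maps are plain nested lists indexed [channel][flattened spatial position].

```python
from typing import List, Sequence

# Feature map at one layer: [channel][flattened spatial position].
FeatureMap = List[List[float]]


def gram(features: FeatureMap) -> List[List[float]]:
    """Gram matrix G[a][b] = sum over positions of F[a][k] * F[b][k]."""
    return [[sum(x * y for x, y in zip(fa, fb)) for fb in features] for fa in features]


def layer_style_term(generated: FeatureMap, style: FeatureMap) -> float:
    """Assumed Gatys-style term: squared Gram difference / (4 * N_l^2 * M_l^2)."""
    n_channels = len(style)
    n_positions = len(style[0])
    g_gen, g_sty = gram(generated), gram(style)
    squared_diff = sum(
        (g_gen[a][b] - g_sty[a][b]) ** 2
        for a in range(n_channels)
        for b in range(n_channels)
    )
    return squared_diff / (4.0 * n_channels ** 2 * n_positions ** 2)


def style_loss(global_noise: FeatureMap, global_style: FeatureMap,
               portrait_noise: Sequence[FeatureMap], portrait_style: Sequence[FeatureMap],
               c_p: float) -> float:
    """Assumed combination: global term (P^l vs S^l) plus C_p times the
    sum of the n portrait terms (P_pi^l vs S_pi^l)."""
    total = layer_style_term(global_noise, global_style)
    for p_feat, s_feat in zip(portrait_noise, portrait_style):
        total += c_p * layer_style_term(p_feat, s_feat)
    return total
```

Since the normalization depends only on the product $N_l M_l$, it is unaffected by which of the two counts is called the number of features and which the number of channels.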
[0159] The activation function si(l) of layer l can be expressed by the following formula (9):
[0160]
[0161] Here, $P_a$ denotes the style conversion abstraction level, with $P_a \in \mathbb{N}$ and $0 < P_a \le 100$. When si(l) = 0, $L_s(C, S, l)$ is set to 0 and the loss function of layer l is not actually computed.
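The layer gating described here and in [0151] can be sketched as follows. Since formula (9) is not reproduced, the sketch takes the si(l) values as given input rather than computing them from $P_a$; `layer_loss` is a hypothetical callback standing in for the per-layer loss computation.

```python
from typing import Callable, Sequence


def last_active_layer(si: Sequence[float]) -> int:
    """Index of the deepest layer with si(l) != 0, or -1 if no layer is active."""
    for l in range(len(si) - 1, -1, -1):
        if si[l] != 0:
            return l
    return -1


def gated_style_loss(si: Sequence[float], layer_loss: Callable[[int], float]) -> float:
    """Sum L_s(C, S, l) over layers, skipping every layer where si(l) == 0.

    Layers after last_active_layer(si) all have zero loss, so the loop
    never reaches them, mirroring "not actually propagated".
    """
    total = 0.0
    for l in range(last_active_layer(si) + 1):
        if si[l] == 0:
            continue  # L_s(C, S, l) is defined as 0; skip the computation
        total += layer_loss(l)
    return total
```

The callback is only invoked for active layers, so the cost of computing a layer's loss is paid only where si(l) is nonzero.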
[0162] The loss function of image generation can be expressed by the following formula (10):
[0163]
[0164] Then, in step 130, the fusion network fuses the global style image and the portrait style image to generate an output image, namely the second image, that combines the style features with a coordinated portrait region and has a better visual effect.
[0165] In some embodiments of the present application, as shown in Figure 13, step 130 may specifically include steps 1301 to 1303.
[0166] Step 1301: Obtain the portrait region in the global style image that corresponds to the portrait region image.
[0167] For example, when the first image is processed, a portrait detection algorithm determines the bounding box coordinates of the portrait region in the first image. The corresponding region, that is, the portrait region, is then cropped or located in the global style image according to these coordinates.
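The cropping step can be sketched as below, assuming the bounding box is given as hypothetical `(x, y, w, h)` pixel coordinates and that the global style image has the same resolution as the image on which detection ran; if the resolutions differ, the box would first need rescaling.

```python
from typing import List, Tuple

# Row-major RGB image: image[row][column] is an (R, G, B) tuple.
Image = List[List[Tuple[int, int, int]]]


def crop_region(image: Image, box: Tuple[int, int, int, int]) -> Image:
    """Return the sub-image inside box = (x, y, w, h), clipped to the image bounds."""
    x, y, w, h = box
    height, width = len(image), len(image[0])
    x0, y0 = max(0, x), max(0, y)
    x1, y1 = min(width, x + w), min(height, y + h)
    return [row[x0:x1] for row in image[y0:y1]]
```

Clipping keeps a box that extends past the image edge from raising an error; it simply returns the overlapping part.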
[0168] Step 1302: For each pixel in the portrait region, determine the target pixel value based on the first pixel value in the portrait style image and the second pixel value in the global style image.
[0169] In one example, the target pixel value can be obtained by assigning different weights to the pixel values of the portrait-style image and the global-style image, and then calculating their weighted average.
[0170] In another example, linear interpolation can be used to find a new value between the first and second pixel values as the target pixel value.
[0171] Step 1303: Map the target pixel value of each pixel to the corresponding position in the portrait area to generate the second image.
[0172] For example, after calculating the target pixel values for all pixels in the portrait region, the target pixel values are assigned back to the corresponding pixel positions in the global style image one by one, based on the original position information of each pixel in the portrait region. For instance, if a pixel has coordinates (i, j) in the portrait region, the calculated target pixel value is assigned to the pixel with coordinates (i, j) in the global style image.
[0173] Therefore, by calculating the target pixel value based on the first pixel value and the second pixel value, and then generating the final fused image, the process ensures that the fused portrait area can be accurately embedded into the global style image. This allows the generated second image to retain the overall style characteristics of the global style image while exhibiting a unique fusion style in the portrait area, thus improving the image quality of the second image.
[0174] In some embodiments of this application, step 1302 may specifically include:
[0175] For each pixel in the portrait region, the first pixel value and the second pixel value are weighted and summed according to the third weight of the first pixel value and the fourth weight of the second pixel value to obtain the target pixel value of the pixel.
[0176] For example, the target pixel value $O'_c(x+j, y+k)$ of the second image at position (x+j, y+k) can be expressed by the following formula (11):
[0177]
$$O'_c(x+j, y+k) = O'_c(x+j, y+k)\,(1-p) + O_{pi}(j, k)\,p \tag{11}$$
[0178] Here, x and y are the horizontal and vertical starting coordinates of the portrait style image in the style conversion network; j and k are the horizontal and vertical coordinates within the portrait style image relative to that starting position; $O_{pi}(j, k)$ is the first pixel value, at position (j, k) of the i-th portrait style image; and p is the third weight of that first pixel value.
[0179] Here, p represents the fusion ratio coefficient, which is used to control the fusion degree between the output result of the portrait conversion network and the output result of the style conversion network. When p = 1, it means that the result of the portrait conversion network is completely used; when p = 0, it means that the result of the style conversion network is completely used; when 0 < p < 1, it means that the two are fused in a certain proportion.
[0180] Thus, the effective fusion of the global style image and the portrait style image is achieved, and the second image with high quality and coordinated style is generated.
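Formula (11) can be sketched for one color channel c as follows. This assumes images are stored row-major as nested lists, so the pixel at horizontal coordinate x + j and vertical coordinate y + k is `channel[y + k][x + j]`, and that `p` is a per-pixel grid of third weights indexed [k][j]. The function name `fuse_portrait` is hypothetical, and it returns a copy so the original global style image is left unchanged.

```python
from typing import List

Channel = List[List[float]]  # one color channel c, indexed [row][column]


def fuse_portrait(global_channel: Channel, portrait_channel: Channel,
                  x: int, y: int, p: List[List[float]]) -> Channel:
    """Apply formula (11) to one channel and return the fused copy.

    O'_c(x+j, y+k) <- O'_c(x+j, y+k) * (1 - p) + O_pi(j, k) * p,
    where (x, y) is the portrait's starting corner in the global image.
    """
    fused = [row[:] for row in global_channel]  # keep the input untouched
    for k, portrait_row in enumerate(portrait_channel):
        for j, portrait_value in enumerate(portrait_row):
            weight = p[k][j]  # third weight at (j, k); the fourth weight is 1 - weight
            fused[y + k][x + j] = fused[y + k][x + j] * (1 - weight) + portrait_value * weight
    return fused
```

Setting every weight to 1 pastes the portrait style image in unchanged, and setting every weight to 0 leaves the global style image as it was, matching the limiting cases in [0179].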
[0181] In some embodiments of the present application, before step 1302 above, as shown in Figure 14, the image generation method may further include steps 710 to 730.
[0182] Step 710: For each pixel in the portrait region, determine the relative distance between the pixel and the center point of the portrait region from the position information of the pixel and of the center point.
[0183] Step 720: Based on preset association information between relative distances and weights, determine the weight corresponding to the relative distance as the third weight.
[0184] For example, the preset association between relative distance and weight can be implemented by a weight mapping function, which can be expressed by the following formula (12):
[0185]
[0186] Among them, j represents the position coordinate in the horizontal direction of the portrait style image relative to the starting position coordinate, k represents the position coordinate in the vertical direction of the portrait style image relative to the starting position coordinate, w represents the width of the portrait style image in the horizontal direction, and h represents the height of the portrait style image in the vertical direction.
[0187] Here, the weight p depends on the position of point (j, k) relative to the center point: p is larger when (j, k) is close to the center point and smaller when it is far from the center point.
[0188] Step 730: Determine the weight complementary to the third weight as the fourth weight.
[0189] For example, the sum of the third weight and the fourth weight is 1. After determining the third weight p, the fourth weight is determined to be 1-p. For example, if the third weight is 0.6, then the fourth weight is 0.4.
[0190] Therefore, the relative distance between a pixel and the center point provides a basis for the weight allocation of pixels at different locations within the portrait area during fusion. Specifically, pixels closer to the center point may be more influenced by the portrait style image, while pixels farther from the center point are more likely to be fused with the global style image. By determining the third weight based on relative distance, the influence of the portrait style image in fusion is dynamically adjusted according to the position of the pixel within the portrait area. This results in a natural transition in style from the center to the edge of the portrait area in the fused second image, avoiding abrupt transitions between the portrait style image and the global style image at the boundary, and improving the visual quality and naturalness of the fusion of the portrait style image and the global image.
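Formula (12) is not reproduced in this text, so the following sketch substitutes a hypothetical weight mapping function with the properties stated in [0187]: p is 1 at the center of the w × h portrait style image and decreases linearly to 0 as the normalized distance from the center reaches 1. The fourth weight is then 1 - p, as in [0189]. The names `fusion_weights` and `weight_map` are illustrative only.

```python
import math
from typing import List, Tuple


def fusion_weights(j: int, k: int, w: int, h: int) -> Tuple[float, float]:
    """Return (third weight p, fourth weight 1 - p) for position (j, k).

    Hypothetical mapping: p = max(0, 1 - r), where r is the distance from
    the image centre with each axis normalised by half the width/height.
    """
    cx, cy = (w - 1) / 2.0, (h - 1) / 2.0
    r = math.hypot((j - cx) / (w / 2.0), (k - cy) / (h / 2.0))
    p = max(0.0, 1.0 - r)
    return p, 1.0 - p


def weight_map(w: int, h: int) -> List[List[float]]:
    """Third weight p for every (j, k), indexed [k][j] to match a row-major image."""
    return [[fusion_weights(j, k, w, h)[0] for j in range(w)] for k in range(h)]
```

The resulting grid gives a per-pixel p for formula (11), so the portrait style dominates at the center and fades toward the global style at the edges.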
[0191] Based on the image generation method provided in the above embodiments, this application also provides specific implementations of the image generation apparatus. Please refer to the following embodiments.
[0192] Referring to Figure 15, the image generation apparatus 800 provided in this embodiment of the application includes:
[0193] The first acquisition module 810 is used to acquire a first image and image conversion requirement information, wherein the first image includes a human portrait region image;
[0194] The first processing module 820 is used to process the style reference information and the global image information of the first image through the global transformation network of the image processing model to obtain a global style image, and to process the style reference information and the portrait image information of the portrait region image through the portrait transformation network of the image processing model to obtain a portrait style image.
[0195] The fusion module 830 is used to fuse a global style image and a portrait style image through a fusion network of an image processing model to obtain a second image.
[0196] Therefore, by utilizing a global transformation network based on an image processing model to process the first image acquired by the first acquisition module 810 and the style reference information determined based on image transformation requirements, and by utilizing a portrait transformation network based on the image processing model to process the style reference information and portrait image information, it is effectively ensured that the global transformation network of the first processing module 820 processes global image information, and the portrait transformation network processes portrait image information, both towards the same style reference information, i.e., the same style target. Then, the second image obtained by fusing the global style image and the portrait style image using the fusion network of the fusion module 830 combines the global style and the portrait style, achieving overall stylistic consistency. In this way, while controlling the overall style transformation of the image, the characteristics of the portrait region in the image are considered, effectively improving the image quality of the style-transformed image.
[0197] In some embodiments of this application, the image generation apparatus 800 described above may further include:
[0198] The second processing module is used to process the first image and image transformation requirement information through the image processing network of the image processing model to obtain style reference information, global image information of the first image and portrait image information of the portrait region image. The style reference information is information related to the expected transformation style of the first image.
[0199] In some embodiments of this application, the second processing module described above can specifically be used for:
[0200] When the image conversion requirement information includes style requirement information, the global image information includes a globally sampled image and a globally noisy image, and the portrait image information includes a portrait sampled image and a portrait noisy image, the first image and the image conversion requirement information are input into the image processing network. The image processing network processes the style requirement information to obtain style reference information and performs sampling processing on the first image to obtain a globally sampled image.
[0201] Extract the portrait sampled image from the global sampled image;
[0202] Apply noise filling to the global sampled image using the style reference information to obtain a global noise image;
[0203] Extract the portrait noise image from the global noise image.
[0204] In some embodiments of this application, the second processing module described above can specifically be used for:
[0205] When the style requirement information includes the style conversion type, the first style feature vector associated with the style conversion type is determined as the style reference information based on the association information between the preset style conversion type and the preset style feature vector.
[0206] In some embodiments of this application, the second processing module described above can specifically be used for:
[0207] If the style requirement information includes a style reference image, the style reference image is compressed according to a preset size to obtain a style compressed image.
[0208] The style-compressed image is vectorized to obtain the second style feature vector;
[0209] The second style feature vector is determined as style reference information.
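Steps [0207] to [0209] can be illustrated on a grayscale image stored as nested lists. This is a sketch only: nearest-neighbour resampling stands in for "compression to a preset size", and row-major flattening with L2 normalisation stands in for "vectorization". Both choices are assumptions; the patent may use a learned image encoder instead.

```python
import math
from typing import List

Gray = List[List[float]]


def compress(image: Gray, size: int) -> Gray:
    # Nearest-neighbour resampling to a preset size x size grid (assumed method).
    h, w = len(image), len(image[0])
    return [[image[r * h // size][c * w // size] for c in range(size)] for r in range(size)]


def vectorize(image: Gray) -> List[float]:
    # Flatten row by row and L2-normalise (assumed; a learned encoder could replace this).
    flat = [v for row in image for v in row]
    norm = math.sqrt(sum(v * v for v in flat))
    return [v / norm for v in flat] if norm > 0 else flat


def style_reference_from_image(ref: Gray, size: int = 4) -> List[float]:
    # The resulting second style feature vector serves as the style reference information.
    return vectorize(compress(ref, size))
```

Because every reference image is reduced to the same preset size, the resulting vectors all have the same length (here `size * size`), which is what lets them be compared or fed to the model uniformly.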
[0210] In some embodiments of this application, the second processing module described above can specifically be used for:
[0211] Perform grayscale conversion on the global sampled image to obtain a global grayscale image;
[0212] Based on preset human portrait feature information, human portrait detection is performed on the global grayscale image to determine the human portrait region image;
[0213] Based on the human portrait region image, extract the portrait sampled image from the global sampled image.
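Steps [0211] to [0213] can be sketched as: convert to grayscale, detect on the grayscale copy, then crop the original colour image. The luma weights below are the ITU-R BT.601 coefficients, a common choice that the patent does not specify. The threshold detector is a hypothetical toy stand-in for detection based on "preset human portrait feature information"; a real system would use a face or person detector.

```python
from typing import Callable, List, Optional, Tuple

RGB = Tuple[float, float, float]
Box = Tuple[int, int, int, int]  # (top, left, height, width)


def to_grayscale(image: List[List[RGB]]) -> List[List[float]]:
    # ITU-R BT.601 luma weights; the patent does not state which conversion it uses.
    return [[0.299 * r + 0.587 * g + 0.114 * b for (r, g, b) in row] for row in image]


def threshold_detector(threshold: float) -> Callable[[List[List[float]]], Optional[Box]]:
    # Hypothetical toy detector: bounding box of all pixels brighter than `threshold`.
    def detect(gray: List[List[float]]) -> Optional[Box]:
        hits = [(r, c) for r, row in enumerate(gray) for c, v in enumerate(row) if v > threshold]
        if not hits:
            return None
        rows = [r for r, _ in hits]
        cols = [c for _, c in hits]
        return (min(rows), min(cols), max(rows) - min(rows) + 1, max(cols) - min(cols) + 1)
    return detect


def extract_portrait(image: List[List[RGB]], detect) -> Tuple[Optional[Box], Optional[List[List[RGB]]]]:
    box = detect(to_grayscale(image))  # detection runs on the global grayscale image
    if box is None:
        return None, None
    top, left, h, w = box
    # The crop is taken from the colour global sampled image, not from the grayscale copy.
    return box, [row[left:left + w] for row in image[top:top + h]]
```

Returning the bounding box alongside the crop keeps the portrait region's position available for later steps that need to locate the same region in other images.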
[0214] In some embodiments of this application, the second processing module described above can specifically be used for:
[0215] Generate a blank image corresponding to the global sampled image based on the size and channel information of the global sampled image;
[0216] Based on reference noise parameters, the initial pixel values of the blank image are adjusted to obtain a global noise image. The reference noise parameters are determined based on user feedback.
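A minimal sketch of [0215] and [0216], assuming the "reference noise parameters" are the mean and standard deviation of Gaussian noise added to a zero-initialised blank image. The patent does not state the noise distribution; the Gaussian choice and the function names here are assumptions.

```python
import random
from typing import List, Optional

Image = List[List[List[float]]]  # indexed as [row][column][channel]


def blank_like(height: int, width: int, channels: int) -> Image:
    # Blank image matching the size and channel count of the global sampled image.
    return [[[0.0] * channels for _ in range(width)] for _ in range(height)]


def global_noise_image(height: int, width: int, channels: int,
                       noise_mean: float, noise_std: float,
                       seed: Optional[int] = None) -> Image:
    # Assumption: the reference noise parameters are a Gaussian mean and standard deviation.
    rng = random.Random(seed)
    img = blank_like(height, width, channels)
    for row in img:
        for px in row:
            for k in range(channels):
                px[k] += rng.gauss(noise_mean, noise_std)  # adjust the initial pixel value
    return img
```

Passing a fixed `seed` makes the noise reproducible, which is convenient when comparing outputs before and after the noise parameters are updated from user feedback.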
[0217] In some embodiments of this application, the fusion module 830 described above can be specifically used for:
[0218] When the user feedback information includes a first user feedback value, after the global style image and the portrait style image are fused through the fusion network of the image processing model to obtain the second image, display the second image in the display area of the style conversion interface. The style conversion interface includes a rating option, which prompts the user to rate the conversion effect of the second image quantitatively;
[0219] In response to the first input to the rating option, obtain the first user feedback value corresponding to the rating option;
[0220] Update the reference noise parameters based on the first user feedback value.
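The patent does not give the rule for updating the reference noise parameters from a rating. The sketch below is one hypothetical rule: a 1 to 5 rating is mapped to the range [-1, 1], positive feedback shrinks a scalar noise parameter slightly, negative feedback enlarges it, and the result is clamped to assumed bounds. The rating scale, learning rate, bounds, and direction of the update are all assumptions.

```python
def normalize_rating(rating: float, low: float = 1.0, high: float = 5.0) -> float:
    # Map a rating on [low, high] to [-1, 1]; the midpoint of the scale maps to 0.
    return 2.0 * (rating - low) / (high - low) - 1.0


def update_noise_param(param: float, feedback: float, lr: float = 0.1,
                       lo: float = 0.0, hi: float = 1.0) -> float:
    # Hypothetical update rule: positive feedback shrinks the noise scale a little
    # (stay close to what the user liked); negative feedback enlarges it (explore more).
    new = param * (1.0 - lr * feedback)
    return min(max(new, lo), hi)  # keep the parameter inside assumed bounds
```

For example, a neutral rating of 3 leaves the parameter unchanged, while a rating of 5 moves a parameter of 0.5 to 0.45.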
[0221] In some embodiments of this application, the image generation apparatus 800 further includes:
[0222] The first display module is configured to, when the user feedback information includes a second user feedback value, display the second image in the display area of the style conversion interface after the global style image and the portrait style image are fused through the fusion network of the image processing model to obtain the second image. The style conversion interface includes a download option that prompts the user to download the second image.
[0223] The second acquisition module is used to acquire the second user feedback value corresponding to the download option in response to the second input of the download option;
[0224] The first update module is used to update the reference noise parameters based on the second user feedback value.
[0225] In some embodiments of this application, the image generation apparatus 800 further includes:
[0226] The third acquisition module is configured to, after the global style image and the portrait style image are fused through the fusion network of the image processing model to obtain the second image, and when the style conversion interface includes both a rating option and a download option, obtain the first user feedback value corresponding to the rating option and the second user feedback value corresponding to the download option.
[0227] The first determining module is used to perform a weighted summation of the first user feedback value and the second user feedback value according to the first weight of the first user feedback value and the second weight of the second user feedback value to obtain the target feedback value.
[0228] The second update module is used to update the reference noise parameters based on the target feedback value.
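The weighted summation in [0227] is straightforward; the sketch below adds one assumption the patent does not state, namely that the download feedback is encoded as 1.0 when the user downloads the image and 0.0 otherwise. The default weights 0.6 and 0.4 are illustrative placeholders.

```python
def download_feedback(downloaded: bool) -> float:
    # Assumed encoding of the second user feedback value: 1.0 if downloaded, else 0.0.
    return 1.0 if downloaded else 0.0


def target_feedback(first_value: float, second_value: float,
                    first_weight: float = 0.6, second_weight: float = 0.4) -> float:
    # Weighted summation of the first (rating) and second (download) feedback values.
    # The default weights are illustrative, not values from the patent.
    return first_weight * first_value + second_weight * second_value
```

The resulting target feedback value can then drive whatever update rule is used for the reference noise parameters. If the two weights sum to 1 and both inputs lie in the same range, the target value stays in that range as well.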
[0229] In some embodiments of this application, the image generation apparatus 800 further includes:
[0230] The fourth acquisition module is configured to, when the image conversion requirement information includes a degree of style conversion, obtain, after the first image and the image conversion requirement information are acquired, the first model parameters associated with the degree of style conversion based on the association information between preset degrees of style conversion and preset model parameters;
[0231] The first adjustment module is configured to adjust the model parameters in the image processing model that are related to the degree of style conversion according to the first model parameters.
[0232] In some embodiments of this application, the image generation apparatus 800 further includes:
[0233] The fifth acquisition module is configured to, when the image conversion requirement information further includes a style abstraction level, obtain, after the first image and the image conversion requirement information are acquired, the second model parameters associated with the style abstraction level based on the association information between preset style abstraction levels and preset model parameters;
[0234] The second adjustment module is used to adjust the model parameters related to the level of style abstraction in the image processing model according to the second model parameters.
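Paragraphs [0230] to [0234] describe two independent table lookups whose results overwrite the corresponding model parameters. The sketch below assumes both tables are dictionaries keyed by discrete levels; the level names, parameter names (`style_strength`, `detail_keep`), and values are hypothetical.

```python
from typing import Dict

# Hypothetical preset association tables; parameter names and values are illustrative.
DEGREE_TO_PARAMS: Dict[str, Dict[str, float]] = {
    "light": {"style_strength": 0.3},
    "medium": {"style_strength": 0.6},
    "strong": {"style_strength": 0.9},
}
ABSTRACTION_TO_PARAMS: Dict[str, Dict[str, float]] = {
    "realistic": {"detail_keep": 0.9},
    "semi_abstract": {"detail_keep": 0.5},
    "abstract": {"detail_keep": 0.2},
}


def configure_model(model_params: Dict[str, float], degree: str, abstraction: str) -> Dict[str, float]:
    updated = dict(model_params)                        # leave the caller's dict untouched
    updated.update(DEGREE_TO_PARAMS[degree])            # first model parameters
    updated.update(ABSTRACTION_TO_PARAMS[abstraction])  # second model parameters
    return updated
```

Keeping the two tables separate mirrors the patent's split between the fourth and fifth acquisition modules: each requirement adjusts only the parameters it is associated with.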
[0235] In some embodiments of this application, the fusion module 830 described above can be specifically used for:
[0236] Obtain the portrait region in the global style image that corresponds to the portrait region image;
[0237] For each pixel in the portrait region, the target pixel value is determined based on the first pixel value in the portrait style image and the second pixel value in the global style image.
[0238] The target pixel value of each pixel is mapped to the corresponding position in the portrait area to generate a second image.
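Steps [0236] to [0238], together with the weighted summation in [0240], can be sketched on single-channel images stored as nested lists. This sketch assumes a fixed third weight for every pixel and takes the fourth weight as its complement, so the two weights sum to 1; the patent does not say how the fourth weight is derived from the third. For colour images the same blend would be applied per channel.

```python
from typing import List, Tuple

Gray = List[List[float]]
Box = Tuple[int, int, int, int]  # (top, left, height, width) of the portrait region


def fuse(global_style: Gray, portrait_style: Gray, box: Box, w_portrait: float = 0.7) -> Gray:
    """Blend the portrait style image into the matching region of the global style image."""
    top, left, h, w = box
    out = [row[:] for row in global_style]  # copy so the global style image is untouched
    for r in range(h):
        for c in range(w):
            first = portrait_style[r][c]              # first pixel value (portrait style image)
            second = global_style[top + r][left + c]  # second pixel value (global style image)
            # Assumed: third weight = w_portrait, fourth weight = 1 - w_portrait.
            out[top + r][left + c] = w_portrait * first + (1.0 - w_portrait) * second
    return out
```

Pixels outside the portrait region are copied unchanged from the global style image, so only the portrait area is affected by the blend.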
[0239] In some embodiments of this application, the fusion module 830 may specifically be used for:
[0240] For each pixel in the portrait region, the first pixel value and the second pixel value are weighted and summed according to the third weight of the first pixel value and the fourth weight of the second pixel value to obtain the target pixel value of the pixel.
[0241] In some embodiments of this application, the fusion module 830 may specifically be used for:
[0242] For each pixel in the portrait area, the relative distance between the pixel and the center point is determined based on the pixel's location information and the location information of the center point of the portrait area.
[0243] Based on the correlation information between the preset relative distance and the preset weight, the weight corresponding to the relative distance is determined as the third weight;
[0244] The weight corresponding to the third weight is determined as the fourth weight.
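One way to realise the distance-based weights in [0242] to [0244] is sketched below. Two assumptions are made that the patent leaves open: the preset relation between relative distance and weight is a linear falloff from 0.9 at the centre to 0.3 at the corners, and the fourth weight is taken as `1 - third weight`. With these choices the portrait style dominates near the centre of the region and blends toward the global style at the edges, which avoids a visible seam.

```python
import math
from typing import List, Tuple


def relative_distance(px: Tuple[float, float], center: Tuple[float, float], half_diag: float) -> float:
    # Euclidean distance to the region centre, normalised so the corners are at 1.0.
    return math.hypot(px[0] - center[0], px[1] - center[1]) / half_diag


def third_weight(d: float, w_center: float = 0.9, w_edge: float = 0.3) -> float:
    # Hypothetical preset mapping: linear falloff from the centre to the edge.
    d = min(max(d, 0.0), 1.0)
    return w_center + (w_edge - w_center) * d


def region_weights(height: int, width: int) -> List[List[Tuple[float, float]]]:
    cy, cx = (height - 1) / 2.0, (width - 1) / 2.0
    half_diag = math.hypot(cy, cx) or 1.0  # avoid division by zero for a 1x1 region
    weights = []
    for r in range(height):
        row = []
        for c in range(width):
            w3 = third_weight(relative_distance((r, c), (cy, cx), half_diag))
            row.append((w3, 1.0 - w3))  # (third weight, fourth weight), assumed to sum to 1
        weights.append(row)
    return weights
```

For a 3 x 3 region, the centre pixel gets weights (0.9, 0.1) and each corner gets (0.3, 0.7).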
[0245] Each module of the image generation apparatus 800 provided in the embodiments of this application can implement the functions of the steps of the image generation method shown in Figures 1 to 14 and achieve the corresponding technical effects; for brevity, these details are not repeated here.
[0246] Figure 16 is a schematic diagram of the hardware structure of an electronic device provided in some embodiments of this application.
[0247] The electronic device may include a processor 901 and a memory 902 storing computer program instructions.
[0248] Specifically, the processor 901 may include a central processing unit (CPU), an application-specific integrated circuit (ASIC), or one or more integrated circuits that can be configured to implement the embodiments of this application.
[0249] Memory 902 may include mass storage for data or instructions. By way of example and not limitation, memory 902 may include a hard disk drive (HDD), a floppy disk drive, flash memory, an optical disc, a magneto-optical disc, magnetic tape, a Universal Serial Bus (USB) drive, or a combination of two or more of these. Where appropriate, memory 902 may include removable or non-removable (fixed) media, and may be internal or external to the electronic device. In a particular embodiment, memory 902 is non-volatile solid-state memory.
[0250] In a particular embodiment, memory 902 may include read-only memory (ROM), random access memory (RAM), magnetic disk storage media devices, optical storage media devices, flash memory devices, or other electrical, optical, or physical/tangible memory storage devices. Thus, memory 902 typically includes one or more tangible (non-transitory) computer-readable storage media (e.g., memory devices) encoded with software comprising computer-executable instructions; when the software is executed (e.g., by one or more processors), it is operable to perform the operations described with reference to the image generation method according to the first aspect of this application.
[0251] The processor 901 implements any of the image generation methods described in the above embodiments by reading and executing computer program instructions stored in the memory 902.
[0252] In one example, the electronic device may further include a communication interface 909 and a bus 910. As shown in Figure 9, the processor 901, the memory 902, and the communication interface 909 are connected through the bus 910 and communicate with one another.
[0253] The communication interface 909 is mainly used to realize communication between various modules, devices, units and / or equipment in the embodiments of this application.
[0254] Bus 910 includes hardware, software, or both that couple the components of the electronic device to one another. By way of example and not limitation, the bus may include an Accelerated Graphics Port (AGP) or other graphics bus, an Enhanced Industry Standard Architecture (EISA) bus, a Front Side Bus (FSB), a HyperTransport (HT) interconnect, an Industry Standard Architecture (ISA) bus, an InfiniBand interconnect, a Low Pin Count (LPC) bus, a memory bus, a Micro Channel Architecture (MCA) bus, a Peripheral Component Interconnect (PCI) bus, a PCI Express (PCIe) bus, a Serial Advanced Technology Attachment (SATA) bus, a Video Electronics Standards Association Local Bus (VLB), another suitable bus, or a combination of two or more of these. Where appropriate, bus 910 may include one or more buses. Although specific buses are described and illustrated in the embodiments of this application, this application contemplates any suitable bus or interconnect.
[0255] The electronic device can execute the image generation method in the embodiments of this application, thereby implementing the image generation method and apparatus described with reference to Figures 1 to 15.
[0256] In addition, in combination with the image generation methods in the above embodiments, an embodiment of this application may provide a computer-readable storage medium for their implementation. The computer-readable storage medium stores computer program instructions which, when executed by a processor, implement any of the image generation methods in the above embodiments. Examples of computer-readable storage media include non-transitory computer-readable storage media, such as portable disks, hard disks, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), portable compact disc read-only memory (CD-ROM), optical storage devices, and magnetic storage devices.
[0257] In addition, in combination with the image generation methods in the above embodiments, an embodiment of this application may provide a computer program product for their implementation. The program product is stored in a storage medium and may include a computer program or instructions which, when executed by at least one processor, implement the processes of any of the image generation methods in the above embodiments and achieve the same technical effects. To avoid repetition, further details are omitted here.
[0258] It should be clarified that this application is not limited to the specific configurations and processes described above and shown in the figures. For the sake of brevity, detailed descriptions of known methods are omitted here. In the above embodiments, several specific steps are described and shown as examples. However, the method process of this application is not limited to the specific steps described and shown. Those skilled in the art can make various changes, modifications, and additions, or change the order of steps, after understanding the spirit of this application.
[0259] The functional blocks shown in the above block diagram can be implemented as hardware, software, firmware, or a combination thereof. When implemented in hardware, they can be, for example, electronic circuits, application-specific integrated circuits (ASICs), appropriate firmware, plug-ins, function cards, etc. When implemented in software, the elements of this application are programs or code segments used to perform the required tasks. Programs or code segments can be stored on a machine-readable medium or transmitted over a transmission medium or communication link via data signals carried on a carrier wave. "Machine-readable medium" can include any medium capable of storing or transmitting information. Examples of machine-readable media include electronic circuits, semiconductor memory devices, ROM, flash memory, erasable ROM (EROM), floppy disks, CD-ROMs, optical disks, hard disks, fiber optic media, radio frequency (RF) links, etc. Code segments can be downloaded via computer networks such as the Internet, intranets, etc.
[0260] It should also be noted that the exemplary embodiments mentioned in this application describe methods or systems based on a series of steps or apparatus. However, this application is not limited to the order of the above steps; that is, the steps can be performed in the order mentioned in the embodiments, or in a different order, or several steps can be performed simultaneously.
[0261] The aspects of this disclosure have been described above with reference to flowchart illustrations and / or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of this disclosure. It should be understood that each block in the flowchart illustrations and / or block diagrams, and combinations of blocks in the flowchart illustrations and / or block diagrams, can be implemented by computer program instructions. These computer program instructions can be provided to a processor of a general-purpose computer, a special-purpose computer, or other programmable data processing apparatus to produce a machine such that these instructions, executable via the processor of the computer or other programmable data processing apparatus, enable the implementation of the functions / actions specified in one or more blocks of the flowchart illustrations and / or block diagrams. Such a processor can be, but is not limited to, a general-purpose processor, a special-purpose processor, a special application processor, or a field-programmable logic circuit. It is also understood that each block in the block diagrams and / or flowcharts, and combinations of blocks in the block diagrams and / or flowcharts, can also be implemented by special-purpose hardware performing the specified functions or actions, or can be implemented by a combination of special-purpose hardware and computer instructions.
[0262] The above are merely specific embodiments of this application. Those skilled in the art will clearly understand that, for the sake of convenience and brevity, the specific working processes of the systems, modules, and units described above can be referred to the corresponding processes in the foregoing method embodiments, and will not be repeated here. It should be understood that the protection scope of this application is not limited thereto. Any person skilled in the art can easily conceive of various equivalent modifications or substitutions within the technical scope disclosed in this application, and these modifications or substitutions should all be covered within the protection scope of this application.
Claims
1. An image generation method, characterized in that it comprises: obtaining a first image and image conversion requirement information, wherein the first image includes a portrait region image; processing style reference information and global image information of the first image through a global transformation network of an image processing model to obtain a global style image, and processing the style reference information and portrait image information of the portrait region image through a portrait transformation network of the image processing model to obtain a portrait style image, wherein the style reference information is determined based on the image conversion requirement information; and fusing the global style image and the portrait style image through a fusion network of the image processing model to obtain a second image; wherein the image conversion requirement information includes a degree of style conversion and a degree of style abstraction, the degree of style conversion being the extent to which, during style conversion, the first image departs from its original style and approaches the target style corresponding to the style reference information, and the degree of style abstraction being the extent to which the style of the second image departs from representational realism toward abstract expression;
after the first image and the image conversion requirement information are obtained, the method further comprises: obtaining, based on association information between preset degrees of style conversion and preset model parameters, first model parameters associated with the degree of style conversion, and adjusting the model parameters in the image processing model related to the degree of style conversion according to the first model parameters; and obtaining, based on association information between preset style abstraction levels and preset model parameters, second model parameters associated with the degree of style abstraction, and adjusting the model parameters in the image processing model related to the degree of style abstraction according to the second model parameters.
2. The method according to claim 1, wherein the method further comprises: processing the first image and the image conversion requirement information through an image processing network of the image processing model to obtain the style reference information, the global image information of the first image, and the portrait image information of the portrait region image.
3. The method according to claim 2, wherein the image conversion requirement information includes style requirement information, the global image information includes a global sampled image and a global noise image, and the portrait image information includes a portrait sampled image and a portrait noise image; and processing the first image and the image conversion requirement information through the image processing network of the image processing model to obtain the style reference information, the global image information of the first image, and the portrait image information of the portrait region image comprises: inputting the first image and the image conversion requirement information into the image processing network, processing the style requirement information through the image processing network to obtain the style reference information, and sampling the first image to obtain the global sampled image; extracting the portrait sampled image from the global sampled image; performing noise filling on the global sampled image using the style reference information to obtain the global noise image; and extracting the portrait noise image from the global noise image.
4. The method according to claim 3, wherein the style requirement information includes a style conversion type, and processing the style requirement information to obtain the style reference information comprises: determining, based on association information between preset style conversion types and preset style feature vectors, a first style feature vector associated with the style conversion type as the style reference information.
5. The method according to claim 3, wherein the style requirement information includes a style reference image, and processing the style requirement information to obtain the style reference information comprises: compressing the style reference image according to a preset size to obtain a style-compressed image; vectorizing the style-compressed image to obtain a second style feature vector; and determining the second style feature vector as the style reference information.
6. The method according to claim 3, wherein extracting the portrait sampled image from the global sampled image comprises: performing grayscale conversion on the global sampled image to obtain a global grayscale image; performing portrait detection on the global grayscale image based on preset portrait feature information to determine the portrait region image; and extracting the portrait sampled image from the global sampled image based on the portrait region image.
7. The method according to claim 3, wherein performing noise filling on the global sampled image using the style reference information to obtain the global noise image comprises: generating a blank image corresponding to the global sampled image based on the size and channel information of the global sampled image; and adjusting the initial pixel values of the blank image based on reference noise parameters to obtain the global noise image, wherein the reference noise parameters are determined based on user feedback information.
8. The method according to claim 7, wherein the user feedback information includes a first user feedback value, and after the global style image and the portrait style image are fused through the fusion network of the image processing model to obtain the second image, the method further comprises: displaying the second image in a display area of a style conversion interface, the style conversion interface including a rating option that prompts the user to rate the conversion effect of the second image quantitatively; in response to a first input to the rating option, obtaining the first user feedback value corresponding to the rating option; and updating the reference noise parameters based on the first user feedback value.
9. The method according to claim 7 or 8, wherein the user feedback information includes a second user feedback value, and after the global style image and the portrait style image are fused through the fusion network of the image processing model to obtain the second image, the method further comprises: displaying the second image in the display area of the style conversion interface, the style conversion interface including a download option that prompts the user to download the second image; in response to a second input to the download option, obtaining the second user feedback value corresponding to the download option; and updating the reference noise parameters based on the second user feedback value.
10. The method according to claim 9, wherein after the global style image and the portrait style image are fused through the fusion network of the image processing model to obtain the second image, the method further comprises: when the style conversion interface includes a rating option and a download option, obtaining the first user feedback value corresponding to the rating option and the second user feedback value corresponding to the download option; performing a weighted summation of the first user feedback value and the second user feedback value according to a first weight of the first user feedback value and a second weight of the second user feedback value to obtain a target feedback value; and updating the reference noise parameters based on the target feedback value.
11. The method according to any one of claims 1 to 8, wherein fusing the global style image and the portrait style image to obtain the second image comprises: obtaining the portrait region in the global style image that corresponds to the portrait region image; for each pixel in the portrait region, determining a target pixel value based on a first pixel value of the pixel in the portrait style image and a second pixel value of the pixel in the global style image; and mapping the target pixel value of each pixel to the corresponding position in the portrait region to generate the second image.
12. The method according to claim 11, wherein, for each pixel in the portrait region, determining the target pixel value based on the first pixel value of the pixel in the portrait style image and the second pixel value of the pixel in the global style image comprises: for each pixel in the portrait region, determining the relative distance between the pixel and the center point of the portrait region based on the position information of the pixel and the position information of the center point; determining, based on association information between preset relative distances and preset weights, the weight corresponding to the relative distance as a third weight of the first pixel value; determining the weight corresponding to the third weight as a fourth weight of the second pixel value; and performing a weighted summation of the first pixel value and the second pixel value based on the third weight and the fourth weight to obtain the target pixel value of the pixel.