Image generation method and device, electronic equipment and storage medium

By integrating the material and color information of objects in an indoor scene with the lighting information, and using an image generation model to directly generate scene rendering images, the problem of inefficient indoor lighting design in existing technologies is solved, and fast and low-cost lighting control is achieved.

CN117710523B · Active · Publication Date: 2026-06-19 · HANGZHOU QUNHE INFORMATION TECHNOLOGIES CO LTD

Cites: 3 · Cited by: 0

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Patents(China)
Current Assignee / Owner: HANGZHOU QUNHE INFORMATION TECHNOLOGIES CO LTD
Filing Date: 2023-12-19
Publication Date: 2026-06-19

Application Information

Patent Timeline

19 Dec 2023: Application

19 Jun 2026: Publication

CN117710523B

IPC: G06T11/40; G06T5/50; G06N3/0464

AI Tagging

Application Domain

Image enhancement; Biological models


AI Technical Summary

Technical Problem

Existing indoor lighting designs rely on manual adjustments and 3D scene rendering, resulting in low efficiency and dependence on designer experience, making it impossible to quickly achieve lighting control in different scenarios.

Method used

A first layer containing the material and color information of objects in the indoor scene and a second layer containing the lighting information are acquired. A deep-learning image generation model then fuses their features and directly generates a scene rendering image whose lighting is consistent with the second layer.

Benefits of technology

It significantly improves image generation efficiency, reduces resource consumption, quickly processes lighting control in any scene, and lowers the barrier for users to adjust lighting.

✦ Generated by Eureka AI based on patent content.

Abstract figure: CN117710523B_ABST

Abstract

This disclosure provides an image generation method and apparatus, an electronic device, and a storage medium. It relates to the field of artificial intelligence, in particular to computer vision and image processing. The specific solution is as follows. A first layer and a second layer of an original image are acquired, where the first layer contains material and color information of objects in an indoor scene and the second layer contains lighting information of the indoor scene. Feature fusion is performed on the first and second layers to obtain fused features. A scene rendering image of the original image is then generated based on the fused features, and the lighting information of the scene rendering image is consistent with that of the second layer. With this solution, 3D scene processing can be skipped and an image generation model can render the 2D scene image directly. This greatly reduces resource consumption, speeds up lighting control for arbitrary scenes, and thereby improves image generation efficiency.


Description

Technical Field

[0001] This disclosure relates to the field of artificial intelligence technology, and in particular to the fields of computer vision and image processing.

Background Technology

[0002] Modern interior design places great emphasis on lighting design and relies heavily on it, with higher demands on both functionality and aesthetics. The lighting arrangements used at different times and in different situations in daily life are called scene modes. Current lighting design is done manually, which is inefficient. Its results also depend heavily on the designer's experience and skill, which makes the work time-consuming and labor-intensive. Its implementation relies on the various layers of the three-dimensional (3D) scene and depends entirely on the graphics processing unit (GPU) for rendering, so images are generated slowly.

Summary of the Invention

[0003] This disclosure provides an image generation method, apparatus, electronic device, and storage medium.

[0004] According to a first aspect of this disclosure, an image generation method is provided, comprising:

[0005] obtaining a first layer and a second layer of an original image, wherein the first layer contains the material and color information of objects in an indoor scene, and the second layer contains the lighting information of the indoor scene;

[0006] performing feature fusion on the first layer and the second layer to obtain fused features; and

[0007] generating a scene rendering image of the original image based on the fused features, wherein the lighting information of the scene rendering image is consistent with the lighting information of the second layer.

[0008] According to a second aspect of this disclosure, an image generation apparatus is provided, comprising:

[0009] an acquisition module, configured to acquire a first layer and a second layer of an original image, wherein the first layer contains the material and color information of objects in an indoor scene, and the second layer contains the lighting information of the indoor scene;

[0010] a fusion module, configured to perform feature fusion on the first layer and the second layer to obtain fused features; and

[0011] a first generation module, configured to generate a scene rendering image of the original image based on the fused features, wherein the lighting information of the scene rendering image is consistent with the lighting information of the second layer.

[0012] According to a third aspect of this disclosure, an electronic device is provided, comprising: at least one processor; and a memory communicatively connected to the at least one processor; wherein the memory stores instructions executable by the at least one processor, the instructions being executed by the at least one processor to enable the at least one processor to perform any of the methods described in the embodiments of this disclosure.

[0013] According to a fourth aspect of this disclosure, a non-transitory computer-readable storage medium storing computer instructions is provided, wherein the computer instructions are used to cause a computer to perform any of the methods according to the embodiments of this disclosure.

[0014] According to a fifth aspect of this disclosure, a computer program product is provided, including a computer program stored on a storage medium, which, when executed by a processor, implements any of the methods according to embodiments of this disclosure.

[0015] The traditional image processing approach goes from a 3D scene to a two-dimensional (2D) scene. The technical solution of this disclosure instead skips 3D scene processing and uses the image generation model directly to render the 2D scene image. This greatly reduces resource consumption, speeds up lighting control for any scene, and thus improves image generation efficiency.

[0016] The above overview is for illustrative purposes only and is not intended to be limiting in any way. In addition to the illustrative aspects, embodiments, and features described above, further aspects, embodiments, and features of this application will become readily apparent from the accompanying drawings and the following detailed description.

Attached Figure Description

[0017] In the accompanying drawings, unless otherwise specified, the same reference numerals throughout the various drawings denote the same or similar parts or elements. These drawings are not necessarily drawn to scale. It should be understood that these drawings depict only some embodiments disclosed in this application and should not be construed as limiting the scope of this application.

[0018] Figure 1 is a schematic flowchart of an image generation method according to an embodiment of the present disclosure;

[0019] Figure 2 is a schematic diagram of an image generation framework according to an embodiment of the present disclosure;

[0020] Figure 3 is a schematic diagram of the process of obtaining scene rendering images according to an embodiment of the present disclosure;

[0021] Figure 4 is a schematic diagram of the architecture of an image generation model according to an embodiment of the present disclosure;

[0022] Figure 5 is a schematic diagram illustrating the processing of the first and second layers of the original image according to an embodiment of the present disclosure;

[0023] Figure 6 is a schematic structural diagram of an image generation apparatus 600 according to an embodiment of the present disclosure;

[0024] Figure 7 is a schematic diagram of an application scenario of the image generation method according to an embodiment of the present disclosure;

[0025] Figure 8 is a structural block diagram of an electronic device according to an embodiment of the present disclosure.

Detailed Implementation

[0026] The present disclosure will now be described in further detail with reference to the accompanying drawings. The same reference numerals in the drawings denote elements that have the same or similar functions. Although various aspects of embodiments are shown in the drawings, they are not necessarily drawn to scale unless specifically indicated otherwise.

[0027] Furthermore, to better illustrate this disclosure, numerous specific details are set forth in the following detailed description. Those skilled in the art will understand that this disclosure can be practiced without certain specific details. In some instances, methods, means, components, and circuits well known to those skilled in the art have not been described in detail in order to highlight the main points of this disclosure.

[0028] In this document, the term "and/or" describes three possible relationships. For example, "A and/or B" covers three cases: A alone, both A and B, and B alone. The term "at least one" means any one of a plurality of elements, or any combination of at least two of them. For example, "at least one of A, B, and C" can mean any one or more elements selected from the set consisting of A, B, and C. The terms "first" and "second" distinguish between multiple similar technical terms. They do not imply a specific order or a limitation to only two. For example, "first feature" and "second feature" refer to two categories of features, and there may be one or more first features and one or more second features.

[0029] In related technologies, modern interior design places increasing emphasis on lighting design and relies increasingly on it, with higher demands on both functionality and aesthetics. Good lighting design can create different atmospheres within a home and enhance the user's living experience. The lighting arrangements used at different times and in different situations in daily life are called scene modes. Across scene modes, the location and type of each light fixture stay the same; only the brightness and color temperature of each fixture change. A scene mode is therefore essentially a set of parameter configurations for the indoor light fixtures, and switching between scene modes simply means switching from one set of parameter configurations to another. The adjustable parameters of a light fixture typically include brightness and color temperature.
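The paragraph above describes a scene mode as a set of per-fixture parameter configurations, so it can be modelled as a plain mapping from fixture to settings. The following is a minimal Python sketch of that reading. The field names, fixture names, and value ranges are hypothetical and are not taken from the patent.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class FixtureSetting:
    """Adjustable parameters of one light fixture (hypothetical field names)."""
    on: bool
    brightness: float          # assumed 0.0 to 1.0 scale
    color_temperature_k: int   # color temperature in kelvin


SceneMode = Dict[str, FixtureSetting]  # fixture name -> settings; fixtures never move

# Illustrative modes; fixture names and values are assumptions.
EVENING: SceneMode = {
    "bedroom_wall_lamp": FixtureSetting(True, 0.4, 2700),
    "living_room_main": FixtureSetting(False, 0.0, 4000),
}
READING: SceneMode = {
    "bedroom_wall_lamp": FixtureSetting(True, 0.9, 4000),
    "living_room_main": FixtureSetting(True, 0.6, 4000),
}


def switch_mode(current: SceneMode, target: SceneMode) -> SceneMode:
    """Return only the fixtures whose settings differ between two scene modes."""
    if current.keys() != target.keys():
        raise ValueError("both scene modes must cover the same fixtures")
    return {name: s for name, s in target.items() if s != current[name]}


print(sorted(switch_mode(EVENING, READING)))  # -> ['bedroom_wall_lamp', 'living_room_main']
```

Switching modes then reduces to applying the returned per-fixture settings. This matches the statement that fixture positions and types stay fixed while only their parameters change.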

[0030] In related technologies, at the logical design level, lighting design is performed manually by designers, who mainly adjust the on/off state of light points, the color temperature of the lights, and the light intensity. At the physical implementation level, a specific scene mode is obtained by having the GPU render multiple layers of the 3D scene. These layers comprise scene-invariant layers and a scene-variable layer. The scene-invariant layers are the diffuse layer, emissive layer, ambient light layer, and refract layer. The scene-variable layer is the direct light layer.
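The split into scene-invariant and scene-variable layers shows that only part of the 3D pipeline depends on the scene mode. The following is a minimal sketch of which layers would still have to be rendered for a requested mode, assuming a hypothetical cache of layers that have already been rendered. The layer identifiers mirror the list above. The caching itself is an illustrative assumption, not something the text describes.

```python
from typing import Dict, List, Tuple, Union

# Layer grouping taken from the description; the identifiers are illustrative.
SCENE_INVARIANT = ("diffuse", "emissive", "ambient_light", "refract")
SCENE_VARIABLE = ("direct_light",)

CacheKey = Union[str, Tuple[str, str]]  # invariant: layer name; variable: (layer, mode)


def layers_to_render(cache: Dict[CacheKey, object], mode: str) -> List[str]:
    """List the layers the GPU would still need to render for a given scene mode.

    Invariant layers are shared by every mode; the direct light layer is
    rendered once per mode because it changes with the fixture parameters.
    """
    todo = [name for name in SCENE_INVARIANT if name not in cache]
    todo += [name for name in SCENE_VARIABLE if (name, mode) not in cache]
    return todo


cache: Dict[CacheKey, object] = {name: "rendered" for name in SCENE_INVARIANT}
cache[("direct_light", "evening")] = "rendered"
print(layers_to_render(cache, "reading"))  # -> ['direct_light']
```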

[0031] In existing technologies, lighting design needs to be done manually, resulting in low efficiency. Furthermore, the design effect is highly dependent on the designer's experience and skill, making the lighting design process very time-consuming and labor-intensive. The specific implementation of existing lighting designs relies on various layers in the 3D scene and is entirely dependent on GPU rendering, resulting in low image generation efficiency.

[0032] To at least partially address one or more of the above problems and other potential issues, this disclosure proposes an image generation method for lighting control in arbitrary scenes. The method uses an image generation model from deep learning to generate scene rendering images directly. As a result, it can quickly process lighting control for arbitrary scenes and improve image generation efficiency. Traditional image processing methods go from a 3D scene to a 2D scene. This method instead skips the 3D scene and goes directly from 2D to 2D, which significantly reduces resource consumption. Generating scene rendering images for lighting control in 2D scenes directly with an image generation model greatly increases their output speed.

[0033] This disclosure provides an image generation method. Figure 1 is a schematic flowchart of an image generation method according to an embodiment of the present disclosure. The image generation method can be applied to an image generation apparatus located in an electronic device. The electronic device includes, but is not limited to, mobile phones, tablet computers, laptops, and desktop computers. In some possible implementations, the image generation method can also be implemented by a processor calling computer-readable instructions stored in memory. As shown in Figure 1, the image generation method includes:

[0034] S101: Obtain the first and second layers of the original image; wherein, the first layer contains material and color information of objects in the indoor scene; the second layer includes lighting information of the indoor scene;

[0035] S102: Perform feature fusion on the first layer and the second layer to obtain the fused features;

[0036] S103: Generate a scene rendering image of the original image based on the fused features, wherein the lighting information of the scene rendering image is consistent with the lighting information of the second layer.
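Steps S101 to S103 can be read as a three-stage pipeline. The sketch below is a hedged illustration only. The layer splitter, encoder, and renderer are passed in as hypothetical callables; in the embodiments they correspond to neural rendering, the CN model's encoder, and the SD model. Images are toy nested lists. Fusion by element-wise addition follows Enc = Enc1 + Enc2 in paragraph [0071].

```python
from typing import Callable, List, Tuple

Image = List[List[float]]   # toy single-channel image, for illustration only
Features = List[float]


def generate_scene_rendering(
    original: Image,
    split_layers: Callable[[Image], Tuple[Image, Image]],
    encode: Callable[[Image], Features],
    render: Callable[[Features], Image],
) -> Image:
    # S101: obtain the first (diffuse) and second (direct-light) layers.
    first_layer, second_layer = split_layers(original)
    # S102: encode both layers with the same encoder and fuse by element-wise addition.
    fused = [a + b for a, b in zip(encode(first_layer), encode(second_layer))]
    # S103: generate the scene rendering image from the fused features.
    return render(fused)


# Toy stand-ins (hypothetical): halve the image into two layers, sum rows, echo features.
def halve(img: Image) -> Tuple[Image, Image]:
    half = [[p * 0.5 for p in row] for row in img]
    return half, half


def row_sums(layer: Image) -> Features:
    return [sum(row) for row in layer]


def echo(features: Features) -> Image:
    return [features]


print(generate_scene_rendering([[1.0, 2.0], [3.0, 4.0]], halve, row_sums, echo))
# -> [[3.0, 7.0]]
```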

[0037] In this embodiment of the disclosure, the original image can be a 2D scene image of different interior design scenarios or of different times. For example, the original image can be any of the following:

- a "bedroom scene image at 9:00 PM";
- a "scene image of the green plant next to the sofa at 7:00 PM";
- a "bedroom scene image at 6:00 PM with the main light on and auxiliary lights off";
- a "bedroom scene image at 4:00 PM with the main light off and auxiliary lights on";
- a "bedroom scene image at 8:00 PM with the main light and auxiliary lights off but the balcony light on";
- a "bedroom scene image at 9:00 PM with the main light, auxiliary lights, and balcony light all on".

These are merely illustrative, non-exhaustive examples and do not limit the original image.

[0038] In this embodiment of the disclosure, suppose the original image is a "bedroom scene image at 21:00". It can be obtained in any of three ways: by photographing the bedroom at 21:00 with a camera, by prediction based on the parameters of the components in the home decoration design scheme, or by simulation based on those parameters. These are merely illustrative, non-exhaustive examples of ways to obtain the original image.

[0039] In this embodiment of the disclosure, the layer can be understood as a transparent "sheet of paper," and each "sheet of paper" can contain different images, text, shapes, or other visual elements. These layers are stacked together in a top-to-bottom order to form a complete visual effect.
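The "stacked transparent sheets" analogy corresponds to standard alpha compositing. The sketch below flattens a top-to-bottom stack for a single grey pixel using the Porter-Duff "over" rule. It illustrates the general notion of a layer only. It is not the fusion used in this disclosure, which combines encoded layer features inside an image generation model. The (value, alpha) pixel representation and the black background are assumptions.

```python
from typing import List, Tuple

Pixel = Tuple[float, float]  # (grey value, alpha), both assumed in [0, 1]


def composite_over(layers_top_to_bottom: List[Pixel]) -> float:
    """Flatten a stack of semi-transparent layers front to back with the "over" rule."""
    color = 0.0
    transmittance = 1.0  # fraction of the lower layers that is still visible
    for value, alpha in layers_top_to_bottom:
        color += transmittance * alpha * value
        transmittance *= 1.0 - alpha
    return color  # whatever is still visible falls through to a black background


print(composite_over([(1.0, 0.5), (0.2, 1.0)]))  # about 0.6
```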

[0040] In this embodiment, the first layer refers to a scene-invariant layer, namely a Diffuse layer. The Diffuse layer is a layer type in image editing software, used to describe the effects of light scattering and diffuse reflection. In image editing software, layers are tools for organizing and managing image elements, and can be used to create complex image effects. Diffuse layers are typically placed above other layers and can control the color, brightness, and transparency of other layers. By adjusting the properties of the Diffuse layer, the appearance and texture of the image can be changed, creating more realistic and vivid image effects.

[0041] In this embodiment, the second layer can refer to a scene-changing layer, namely a Direct-light layer. This Direct-light layer is used to simulate direct lighting effects. It can simulate light directly illuminating an object's surface and creating a contrast between light and shadow. By adjusting the properties of the Direct-light layer, the position, intensity, and color of the direct light source can be controlled, thereby adding localized highlights and shadows to the object's surface. The Direct-light layer is particularly important when creating scenes with strong lighting effects, as it enhances the texture and three-dimensionality of the scene.

[0042] In this embodiment of the disclosure, the indoor scene can refer to any of multiple scenes in a home decoration design scheme. For example, it can be a bedroom, a sofa, or a living room. It can also be a bed in the bedroom, green plants in the living room, or decorative items in the living room. These are merely illustrative, non-exhaustive examples.

[0043] In this embodiment, the material information refers to the color of the object itself. It can be a single color, or a texture map can be used to represent the surface color of the object for the diffuse channel. This material information mainly refers to the object's inherent characteristics such as color, texture, and grain. These characteristics can be represented using a diffuse map, including the material properties and the marks left on the object over time. In simpler terms, it represents whether the object is made of metal, wood, or some other material, as well as the object's color and the stains, scratches, rust, etc., left by the passage of time.

[0044] In this embodiment of the disclosure, the color information refers to the inherent color of the object itself or the color effect of the object under normal lighting conditions. For example, if the object is a pothos plant, the color information of the object is green; if the object is a beige sofa, the color information of the object is beige.

[0045] In this embodiment of the disclosure, the color information of the object may include color, brightness, saturation, warmth or coolness, spatial relationship, hue, color properties, and emotional attributes. These attributes can be used to describe and distinguish objects of different colors and the emotions and effects they express.

[0046] In this embodiment of the disclosure, the lighting information may include: the on / off state of the light source, the color temperature, illuminance, position, direction, color, intensity, and shadows of the light source. This lighting information can be used to simulate and represent the lighting effects produced when light shines on an object, as well as to create atmosphere and represent the texture of a scene. In computer graphics and virtual reality technology, lighting information is typically calculated and implemented through a rendering engine.

[0047] In this embodiment of the disclosure, the fused feature includes: a first feature obtained based on a first layer and a second feature obtained based on a second layer. The first feature may be obtained by encoding the first layer using a ControlNet (CN) model. The second feature may be obtained by encoding the second layer using the CN model.

[0048] In this embodiment of the disclosure, the scene rendering image is a form of expression that conveys emotions and atmosphere through visual means. The depiction of the scene is crucial in scene rendering images. This includes detailed portrayal of elements such as environment, background, lighting, and color to create a specific atmosphere and emotion. For example, elements such as a cozy bedroom, a bright living room, and a vibrant balcony can convey a feeling of security and happiness; while elements such as a dimly lit bedroom, a softly lit living room, and a tranquil balcony can convey a feeling of peace and comfort.

[0049] Figure 2 shows a schematic diagram of the image generation framework. As shown in Figure 2, the original image is processed by neural rendering to obtain a first layer and a second layer. The first layer is a diffuse layer, and the second layer is a direct lighting layer. The first and second layers of the original image are then input into a control network to obtain the first feature of the first layer and the second feature of the second layer. Finally, the first and second features are input into a Stable Diffusion (SD) model to obtain the scene rendering image.

[0050] Figure 3 shows a flowchart of the process of acquiring scene rendering images. As shown in Figure 3, the first layer and the second layer of the original image are obtained. The first layer is the diffuse reflection layer of the original image and can include the objects in the original image together with their material and color information. The second layer is the direct lighting layer of the original image and comes in four variants:

- second layer 1: the direct lighting layer with the bedroom wall lamp turned on (mode 1);
- second layer 2: the direct lighting layer with the living room main light and the bedroom door turned on;
- second layer 3: the direct lighting layer with the balcony light turned on;
- second layer 4: the direct lighting layer with the bedroom wall lamp turned on (mode 2).

The color temperature of bedroom wall lamp mode 1 is lower than that of mode 2. Feature fusion of the first layer with each second layer yields one scene rendering image:

- the first layer and second layer 1 yield scene rendering image 1, which shows the lighting result of the bedroom wall lamp in mode 1;
- the first layer and second layer 2 yield scene rendering image 2, which shows the lighting result of the living room main light and the bedroom door;
- the first layer and second layer 3 yield scene rendering image 3, which shows the lighting result of the balcony light;
- the first layer and second layer 4 yield scene rendering image 4, which shows the lighting result of the bedroom wall lamp in mode 2.
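Figure 3 reuses one diffuse layer with four different direct lighting layers. The following is a minimal sketch of that batching pattern under three assumptions: the features have already been encoded, fusion is element-wise addition as in paragraph [0071], and `render` is a hypothetical stand-in for the SD model. The mode labels in the usage are illustrative.

```python
from typing import Callable, Dict, List

Features = List[float]


def render_all_modes(
    diffuse_feat: Features,
    direct_feats: Dict[str, Features],
    render: Callable[[Features], Features],
) -> Dict[str, Features]:
    """Fuse one diffuse-layer feature with each direct-lighting feature and render each result.

    The diffuse feature is computed once and reused; only the lighting condition varies.
    """
    results: Dict[str, Features] = {}
    for mode, direct_feat in direct_feats.items():
        fused = [a + b for a, b in zip(diffuse_feat, direct_feat)]
        results[mode] = render(fused)
    return results


renders = render_all_modes(
    [1.0, 2.0],
    {"wall_lamp_mode_1": [0.0, 1.0], "balcony_light": [0.5, 0.0]},
    render=lambda fused: fused,  # identity stand-in for the SD model
)
print(renders)  # -> {'wall_lamp_mode_1': [1.0, 3.0], 'balcony_light': [1.5, 2.0]}
```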

[0051] In the technical solution of this disclosure, a first layer and a second layer of an original image are obtained. The first layer contains material and color information of objects in the indoor scene, and the second layer contains lighting information of the indoor scene. Feature fusion is performed on the first and second layers to obtain fused features. A scene rendering image of the original image is then generated from the fused features, with lighting information consistent with that of the second layer. Users can therefore upload any 2D scene image and have the image generation model generate a scene rendering image directly, which significantly speeds up image output. Because the generated image follows the lighting information of the second layer, the image generation model can quickly handle lighting control for any scene, which improves image generation efficiency. Traditional image processing methods go from a 3D scene to a 2D scene. This disclosure dispenses with that traditional rendering and greatly reduces resource consumption. It returns the corresponding lighting-control image in a short time and significantly lowers the barrier for users to adjust lighting.

[0052] In some embodiments, the feature fusion of the first layer and the second layer to obtain fused features includes: inputting the first layer and the second layer into a CN model in an image generation model, the image generation model being a pre-trained model; and using the CN model to perform feature fusion of the first layer and the second layer to obtain fused features.

[0053] In this embodiment, the image generation model may include a CN model and an SD model. In image processing, the CN model is a neural network model used for image editing and image generation tasks. Unlike traditional Generative Adversarial Networks (GANs), the CN model allows fine-grained control over the generated image through control conditions derived from the input image. The SD model is a latent diffusion model that generates images from noise. It uses a U-Net with cross-attention layers that condition the denoising process on text embeddings. The SD model can also be used for tasks such as unconditional image generation, image inpainting, image super-resolution, class-conditional image generation, text-to-image generation, and layout-conditional image generation.

[0054] Figure 4 shows a schematic diagram of the image generation model architecture. As shown in Figure 4, the image generation model includes a CN model and an SD model. The CN model replicates the network structure of the SD model. The parameters of the CN model are trainable, while the parameters of the SD model are fixed. The learned control information is passed to the SD model as residual information, which gives the model strong control capability. The SD model is a generative model trained on a large-scale image-text dataset and has powerful generative capabilities.
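Figure 4 combines a frozen SD model, a trainable CN copy of its structure, and control passed back as residual information. This relationship can be illustrated with scalar toy blocks. The zero-initialized projection is borrowed from the published ControlNet design and is an assumption here, since the text only says that control information is transferred through residual connections. The ReLU blocks are hypothetical stand-ins for real network layers.

```python
from dataclasses import dataclass


@dataclass
class FrozenBlock:
    """Toy stand-in for one pretrained SD block; its weight is never updated."""
    weight: float

    def __call__(self, x: float) -> float:
        return max(0.0, self.weight * x)  # scaled ReLU


@dataclass
class ControlledBlock:
    """Frozen block plus a trainable copy whose output is added back as a residual."""
    base: FrozenBlock       # fixed SD parameters
    copy_weight: float      # trainable CN copy, initialised from the base weight
    zero_proj: float = 0.0  # zero-initialised output projection (assumption, from ControlNet)

    @classmethod
    def from_base(cls, base: FrozenBlock) -> "ControlledBlock":
        return cls(base=base, copy_weight=base.weight)

    def __call__(self, x: float, condition: float) -> float:
        # The trainable copy sees the input plus the control condition.
        control = max(0.0, self.copy_weight * (x + condition))
        # Residual injection: the frozen path itself is left untouched.
        return self.base(x) + self.zero_proj * control


block = ControlledBlock.from_base(FrozenBlock(weight=2.0))
print(block(1.0, condition=5.0))  # -> 2.0 (projection still zero)
block.zero_proj = 0.5              # as if training had moved the projection
print(block(1.0, condition=5.0))  # -> 8.0
```

Because the projection starts at zero, the controlled block initially reproduces the frozen block exactly. Training the copy therefore cannot degrade the pretrained behaviour at the start.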

[0055] In this embodiment of the disclosure, the image generation model can be a pre-trained model. Its training may include the following steps: acquiring multiple sample images together with their first and second layers; inputting the first and second layers of the sample images into the model to be trained; and adjusting the model according to preset model parameters and a learning rate to obtain the trained image generation model.
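The training step above specifies only sample layers, preset model parameters, and a learning rate. The sketch below is a deliberately tiny stand-in. A single trainable control gain is fitted by stochastic gradient descent on squared error, while the base weight stays frozen, mirroring the trainable CN and fixed SD parameters of Figure 4. The loss function, the linear model, and the default values are all assumptions.

```python
from typing import List, Tuple


def train_control_gain(
    samples: List[Tuple[float, float, float]],  # (input x, condition c, target y)
    base_weight: float,                          # frozen "SD" parameter, never updated
    learning_rate: float = 0.05,                 # assumed preset learning rate
    epochs: int = 200,                           # assumed preset number of passes
) -> float:
    """Fit a trainable gain g in y ~ base_weight * x + g * c by SGD on squared error."""
    gain = 0.0  # zero initialisation: training starts from the frozen model's behaviour
    for _ in range(epochs):
        for x, c, y in samples:
            prediction = base_weight * x + gain * c
            grad = 2.0 * (prediction - y) * c  # derivative of (prediction - y) ** 2 w.r.t. gain
            gain -= learning_rate * grad
    return gain


# Synthetic targets generated with a "true" gain of 0.7 (illustrative data only).
samples = [(x, c, 2.0 * x + 0.7 * c) for x, c in [(1.0, 1.0), (2.0, 0.5), (0.5, 2.0)]]
print(round(train_control_gain(samples, base_weight=2.0), 4))  # -> 0.7
```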

[0056] The technical solution of this embodiment involves inputting the first and second layers into the CN model of the image generation model; using the CN model to fuse the features of the first and second layers to obtain fused features; and using the SD model to process the fused features to obtain a scene rendering image. In this way, the powerful generation capabilities of the SD model and the excellent conditional control capabilities of the CN model can be utilized to improve the image generation effect of lighting control in different scenes.

[0057] In some embodiments, the feature fusion of the first layer and the second layer using the CN model to obtain fused features includes: encoding the first layer using a preset encoder in the CN model to obtain a first feature of the first layer; encoding the second layer using the preset encoder to obtain a second feature of the second layer; and fusing the first feature of the first layer and the second feature of the second layer to obtain fused features.

[0058] It should be noted that the preset encoder may process the first layer either before or after the second layer.

[0059] In this embodiment of the disclosure, the preset encoder in the CN model refers to a neural network model used to encode the input signal into an intermediate representation. The preset encoder in the CN model can be any pre-trained encoder model, such as a Convolutional Neural Network (CNN) or a Recurrent Neural Network (RNN).

[0060] In this embodiment of the disclosure, the first feature may include color, brightness, texture, feel, ambient light, shadow, transparency, refraction, and detail enhancement. The diffuse layer can change the overall color and brightness of the image and simulate the texture and feel of the image surface. It can also simulate the effect of ambient light on the image and the generation of shadows.

[0061] In this embodiment of the disclosure, the second feature may include illumination direction, intensity, shadow, highlight, reflection, refraction, ambient light, diffuse reflection, transparency, and smoke effects. The direct-light layer can simulate transparent materials, smoke, the influence of ambient light on the image, diffuse reflection, and reflection and refraction.

[0062] In the technical solution of this embodiment, the preset encoder in the CN model encodes the first layer to obtain its first feature and encodes the second layer to obtain its second feature. The two features are then fused to obtain a fused feature. In this way, multi-condition control is achieved with a single CN model rather than one model per condition. This improves the image generation control effect while reducing storage costs.

[0063] In some embodiments, the image generation method may further include: encoding the third layer using a preset encoder to obtain a third feature of the third layer; and adjusting the fusion feature based on the third feature to obtain a new fusion feature.

[0064] In this embodiment, the third layer can be the third layer of the original image or a third layer from a template library. This third layer is used for adaptive optimization and adjustment of lighting information. The second and third layers can be layers with different styles. In some implementations, after fusing the first feature of the first layer and the second feature of the second layer to obtain a fused feature, the method further includes: adjusting the fused feature based on the third feature to obtain a new fused feature. Here, the new fused feature incorporates the third feature of the third layer, adaptively optimizing and adjusting the lighting information of the second layer, thereby making the scene rendering image output by the SD model based on the new fused feature more beautiful and realistic.

[0065] For example, the third layer can be stored in a third layer template library, which stores multiple third layers. In some implementations, a first layer and a third layer of the original image are obtained. The first layer may include material and color information of objects in the indoor scene, and the third layer includes desired lighting information in the indoor scene. Feature fusion is performed on the first and third layers to obtain fused features. A scene rendering image of the original image is generated based on the fused features, and the scene rendering image is consistent with the desired lighting information of the third layer. In this way, a scene rendering image can be generated by combining the third layer without using a second layer, enriching the diversity of scene rendering images.

[0066] The technical solution of this embodiment utilizes a preset encoder to encode the third layer, obtaining a third feature of the third layer; based on the third feature, the fused feature is adjusted to obtain a new fused feature. Thus, by incorporating the third feature of the third layer, the lighting information of the second layer is adaptively optimized and adjusted, improving the diversity of lighting control in any scene.
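The disclosure does not specify how the fused feature is "adjusted based on the third feature". A minimal sketch, assuming the adjustment is a weighted element-wise addition (in keeping with the additive fusion Enc = Enc1 + Enc2 given in [0071]), is shown below; adjust_fused_feature and its weight parameter are hypothetical names, and features are flattened to plain lists for brevity.

```python
from typing import List

Feature = List[float]  # hypothetical flattened feature


def adjust_fused_feature(fused: Feature, third: Feature, weight: float = 0.5) -> Feature:
    """Return a new fused feature that incorporates the third-layer feature.

    Assumption: the adjustment is a weighted element-wise addition; the
    disclosure only says the fused feature is adjusted based on the third feature.
    """
    if len(fused) != len(third):
        raise ValueError("fused and third features must have the same length")
    return [f + weight * t for f, t in zip(fused, third)]


fused = [1.0, 2.0, 3.0]   # e.g. Enc1 + Enc2
third = [2.0, 0.0, -2.0]  # e.g. the encoded third layer
print(adjust_fused_feature(fused, third))  # [2.0, 2.0, 2.0]
```

With weight=0.0 the original fused feature is returned unchanged, so the weight acts as a hypothetical knob for how strongly the third layer's lighting style is applied.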

[0067] In some embodiments, generating a scene rendering image of the original image based on the fused features includes: inputting the fused features into a stable diffusion (SD) model in the image generation model, the image generation model being a pre-trained model; and using the SD model to render the fused features to obtain a scene rendering image of the original image.

[0068] Figure 5 is a schematic diagram illustrating the processing of the first and second layers of the original image. As shown in Figure 5, in existing technologies, the CN model first encodes its input upon receiving it, converting it from image space into latent features before feeding it into the subsequent network for learning. This solution improves on the existing technology by inputting the first and second layers into the CN model as multiple conditions, adding the encoded features together, and then feeding the sum into the subsequent network. Specifically:

[0069] Enc1 = Cond_Encoder(Diffuse);

[0070] Enc2 = Cond_Encoder(Direct-Light);

[0071] Enc = Enc1 + Enc2;

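[0072] Here, Enc1 represents the first feature, Enc2 represents the second feature, Cond_Encoder represents the preset condition encoder, Diffuse denotes the first layer (the diffuse layer), and Direct-Light denotes the second layer (the direct lighting layer). The parameters of Cond_Encoder remain unchanged when encoding the first and second layers; that is, the same encoder with the same parameters is applied to both layers.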

[0073] The features are added together for two reasons: first, addition avoids excessive modification of the original structure of the CN model, which could degrade its performance; second, it allows further conditions to be added later without structural changes.
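The additive fusion in [0069] to [0073] can be illustrated with a minimal sketch. It assumes a toy stand-in for Cond_Encoder (a frozen per-row weighted sum, not the convolutional encoder a real CN model would use) and represents layers as nested lists; make_cond_encoder and fuse_conditions are hypothetical names. What it shows is that one encoder with fixed parameters is shared across all condition layers and the encoded features are summed element-wise.

```python
from typing import Callable, Dict, List, Sequence

Layer = List[List[float]]  # hypothetical single-channel layer, H x W
Feature = List[float]      # hypothetical flattened feature, one value per row


def make_cond_encoder(weights: Sequence[float]) -> Callable[[Layer], Feature]:
    """Toy stand-in for Cond_Encoder whose parameters stay frozen across calls."""
    frozen = tuple(weights)

    def encode(layer: Layer) -> Feature:
        # Reduce each row to a weighted sum; a real encoder would be a small conv net.
        return [sum(w * px for w, px in zip(frozen, row)) for row in layer]

    return encode


def fuse_conditions(encoder: Callable[[Layer], Feature],
                    layers: Dict[str, Layer]) -> Feature:
    """Enc = encoder(layer_1) + encoder(layer_2) + ..., element-wise."""
    encoded = [encoder(layer) for layer in layers.values()]
    if len({len(e) for e in encoded}) > 1:
        raise ValueError("all condition features must have the same shape")
    return [sum(values) for values in zip(*encoded)]


cond_encoder = make_cond_encoder([0.5, 0.5])
diffuse = [[1.0, 3.0], [2.0, 2.0]]       # first layer (Diffuse)
direct_light = [[0.0, 2.0], [4.0, 0.0]]  # second layer (Direct-Light)
enc = fuse_conditions(cond_encoder, {"Diffuse": diffuse, "Direct-Light": direct_light})
print(enc)  # [3.0, 4.0]
```

Adding another condition is just one more dictionary entry; the encoder and the fusion function stay unchanged, which mirrors the second reason given in [0073].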

[0074] In this embodiment, the original CN model supports only single-condition input control. For multi-condition control, the number of CN models must be increased, so the number of CN models grows linearly with the number of conditions. Because the CN model directly copies the structure of the SD model, each CN model incurs a high resource cost, requiring approximately 1.4 gigabytes (GB) of storage. Using multiple CN models for multi-condition control would therefore consume substantial resources. This method instead implements fused control of multiple conditions within a single CN model, where the multiple conditions include the information contained in the first layer and the information contained in the second layer.
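The storage argument in [0074] is simple arithmetic. The sketch below takes the approximately 1.4 GB per-CN-model figure from the text and counts only CN-model storage (the SD model is ignored because it is needed in both cases); controlnet_storage_gb is a hypothetical helper comparing one CN model per condition with a single shared CN model.

```python
CN_MODEL_GB = 1.4  # approximate storage per CN model, as stated in [0074]


def controlnet_storage_gb(num_conditions: int, shared: bool) -> float:
    """CN-model storage for multi-condition control.

    shared=False: one CN model per condition, so storage grows linearly.
    shared=True: all conditions are fused inside a single CN model.
    """
    if num_conditions < 1:
        raise ValueError("need at least one condition")
    models = 1 if shared else num_conditions
    return round(models * CN_MODEL_GB, 2)


print(controlnet_storage_gb(2, shared=False))  # 2.8
print(controlnet_storage_gb(2, shared=True))   # 1.4
```

For the two-layer case (first and second layers), the shared design halves the CN-model storage, and the saving grows with each additional condition.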

[0075] In the technical solution of this embodiment of the disclosure, the fused features are input into the SD model of the image generation model, and the SD model renders the fused features to obtain a scene rendering image of the original image. By improving the original CN model in this way, the resource cost of image generation can be reduced. Moving lighting control from the traditional rendering mode to an artificial intelligence generation mode greatly reduces resource consumption and speeds up image output.

[0076] In some embodiments, the image generation method further includes: generating a first layer of the original image and a plurality of candidate second layers based on the original image; outputting the first layer of the original image and the plurality of candidate second layers; and determining a second layer to be synthesized from the plurality of candidate second layers based on a selection operation for the plurality of candidate second layers.

[0077] In this embodiment, the multiple candidate second layers can be extracted using existing layer extraction techniques. Subsequently, new layers can be created according to user needs, and existing templates in the template library can be improved and optimized based on actual requirements.

[0078] In this embodiment of the disclosure, the templates the user has used are obtained to generate a user profile. When the user inputs the original image, one first layer and M candidate second layers are extracted from it, and the system automatically selects, from the M candidate second layers, the second layer that matches the user profile, so that the final scene rendering image is consistent with the lighting control effect of that second layer.
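The disclosure does not say how a candidate second layer is matched to the user profile. As one possible reading, the sketch below summarizes each layer by a color temperature and a normalized intensity, builds the profile as the average of the templates the user has used, and picks the nearest candidate. The LightingDescriptor type, the distance metric, and its kelvin scaling are all assumptions made for illustration.

```python
from dataclasses import dataclass
from statistics import mean
from typing import List


@dataclass(frozen=True)
class LightingDescriptor:
    """Hypothetical summary of a layer's lighting (not defined in the disclosure)."""
    name: str
    color_temp_k: float  # color temperature in kelvin
    intensity: float     # normalized to 0..1


def build_user_profile(templates: List[LightingDescriptor]) -> LightingDescriptor:
    """Assumed profile: the average lighting of the templates the user has used."""
    return LightingDescriptor(
        name="profile",
        color_temp_k=mean(t.color_temp_k for t in templates),
        intensity=mean(t.intensity for t in templates),
    )


def select_second_layer(profile: LightingDescriptor,
                        candidates: List[LightingDescriptor]) -> LightingDescriptor:
    """Pick the candidate closest to the profile under an assumed distance."""
    def distance(c: LightingDescriptor) -> float:
        # Divide kelvin by 1000 so both terms have comparable magnitude (arbitrary choice).
        return (abs(c.color_temp_k - profile.color_temp_k) / 1000
                + abs(c.intensity - profile.intensity))

    return min(candidates, key=distance)


templates = [LightingDescriptor("t1", 3000.0, 0.6), LightingDescriptor("t2", 3400.0, 0.8)]
candidates = [
    LightingDescriptor("warm", 3000.0, 0.7),
    LightingDescriptor("neutral", 4000.0, 0.7),
    LightingDescriptor("cool", 6500.0, 0.5),
]
chosen = select_second_layer(build_user_profile(templates), candidates)
print(chosen.name)  # warm
```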

[0079] The technical solution of this disclosure involves generating a first layer of the original image and multiple candidate second layers based on the original image; outputting the first layer of the original image and multiple candidate second layers; and determining the second layer based on the selection operation of the multiple candidate second layers. In this way, a second layer that can present a better lighting control effect can be selected from multiple candidate second layers according to user needs and actual requirements, which helps to improve the image generation effect and thus improve the user experience.

[0080] In some embodiments of the image generation method, if the indoor scene includes multiple lights, the second layer is the layer corresponding to any combination of the multiple lights being in different on/off states, different color temperature values, and different light intensity values.

[0081] In this embodiment of the disclosure, the lighting fixture may include a chandelier, ceiling light, wall light, table lamp, floor lamp, downlight, spotlight, etc.

[0082] In this embodiment of the disclosure, the on/off states of the lights are as follows: when a light is switched on, it emits light; when it is switched off, it does not emit light.

[0083] In this embodiment of the disclosure, the color temperature value describes the color of the light emitted by a light source and is expressed in kelvin (K). The higher the color temperature, the more bluish the light and the cooler the tone; the lower the color temperature, the more reddish the light and the warmer the tone.
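Because each combination of per-light states corresponds to one possible second layer, the number of candidate lighting configurations grows quickly with the number of lights. The sketch below enumerates them with itertools.product; the discrete color-temperature and intensity levels are hypothetical inputs (the disclosure fixes none), and an "off" light is collapsed to a single state since it emits no light regardless of its other settings.

```python
from itertools import product
from typing import Dict, List, Sequence, Tuple

# (is_on, color_temp_k, intensity); the values for an "off" light are placeholders.
LightState = Tuple[bool, int, float]
OFF: LightState = (False, 0, 0.0)


def enumerate_lighting_configs(lights: Sequence[str],
                               color_temps_k: Sequence[int],
                               intensities: Sequence[float]) -> List[Dict[str, LightState]]:
    """Every combination of on/off state, color temperature and intensity per light."""
    # An "off" light emits nothing, so it contributes exactly one state.
    per_light: List[LightState] = [OFF] + [
        (True, t, i) for t, i in product(color_temps_k, intensities)
    ]
    return [dict(zip(lights, combo)) for combo in product(per_light, repeat=len(lights))]


configs = enumerate_lighting_configs(
    ["chandelier", "floor_lamp"],
    color_temps_k=[2700, 6500],  # warm and cool white
    intensities=[0.5, 1.0],
)
print(len(configs))  # 25, i.e. (1 + 2 * 2) ** 2
```

In general there are (1 + T × I)^N configurations for N lights with T color temperatures and I intensity levels, which is why selecting among candidate second layers matters.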

[0084] In the technical solution of this embodiment of the disclosure, if the indoor scene includes multiple lights, the second layer is the layer corresponding to any combination of the lights' on/off states, color temperatures, and light intensity values. This moves lighting control from traditional rendering methods to artificial intelligence (AI) generation, improving the reliability and quality of image generation.

[0085] It should be understood that the schematic diagrams shown in Figures 2, 3, 4, and 5 are merely illustrative, not limiting, and extensible; those skilled in the art may make various obvious changes and/or substitutions based on the examples in Figures 2 to 5, and the resulting technical solutions still fall within the scope of this disclosure.

[0086] This disclosure provides an image generation apparatus 600. As shown in Figure 6, the image generation apparatus includes an acquisition module 610, configured to acquire a first layer and a second layer of the original image, wherein the first layer contains material information and color information of objects in the indoor scene, and the second layer contains lighting information of the indoor scene.

[0087] The fusion module 620 is used to fuse features of the first layer and the second layer to obtain fused features.

[0088] The first generation module 630 is configured to generate a scene rendering image of the original image based on the fused features, where the lighting information of the scene rendering image is consistent with the lighting information of the second layer.
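The module decomposition in [0086] to [0088] can be sketched as a small class whose three modules are injected callables, so the data flow (acquire layers, fuse, generate) is explicit. ImageGenerationApparatus and its field names are hypothetical, and the toy lambdas in the usage lines are placeholders, not the CN and SD models.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Layer = List[List[float]]     # hypothetical 2D layer (H x W)
Feature = List[float]         # hypothetical fused feature
Rendered = List[List[float]]  # hypothetical output image


@dataclass
class ImageGenerationApparatus:
    """Hypothetical sketch of apparatus 600 with its three modules injected."""
    acquire: Callable[[Layer], Tuple[Layer, Layer]]  # acquisition module 610
    fuse: Callable[[Layer, Layer], Feature]          # fusion module 620 (CN model)
    generate: Callable[[Feature], Rendered]          # first generation module 630 (SD model)

    def run(self, original: Layer) -> Rendered:
        first_layer, second_layer = self.acquire(original)
        fused = self.fuse(first_layer, second_layer)
        return self.generate(fused)


# Toy placeholders only: none of these lambdas is a real CN or SD model.
apparatus = ImageGenerationApparatus(
    acquire=lambda img: (img, [[1.0 for _ in row] for row in img]),
    fuse=lambda a, b: [sum(ra) + sum(rb) for ra, rb in zip(a, b)],
    generate=lambda feat: [feat],
)
print(apparatus.run([[1.0, 2.0], [3.0, 4.0]]))  # [[5.0, 9.0]]
```

Injecting the modules keeps the orchestration independent of how each module is implemented, in line with [0095], which allows the modules to be realized in software or circuitry.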

[0089] In some embodiments, the fusion module 620 includes: a first input submodule for inputting the first layer and the second layer into the control network (CN) model in the image generation model, wherein the image generation model is a pre-trained model; and a fusion submodule for using the CN model to perform feature fusion on the first layer and the second layer to obtain fused features.

[0090] In some embodiments, the fusion submodule is configured to: encode the first layer using a preset encoder in the CN model to obtain a first feature of the first layer; encode the second layer using a preset encoder in the CN model to obtain a second feature of the second layer; and fuse the first feature of the first layer and the second feature of the second layer to obtain a fused feature.

[0091] In some embodiments, the fusion module 620 includes a second input submodule, configured to input a third layer into the control network (CN) model in the image generation model; the fusion submodule is further configured to encode the third layer using the preset encoder to obtain a third feature of the third layer, and to adjust the fused feature based on the third feature to obtain a new fused feature.

[0092] In some embodiments, the first generation module includes: a third input submodule, configured to input the fused features into the SD model in the image generation model, wherein the image generation model is a pre-trained model; and a rendering submodule, configured to render the fused features using the SD model to obtain a scene rendering image of the original image.

[0093] In some embodiments, the image generation apparatus further includes: a second generation module (not shown in Figure 6), configured to generate a first layer of the original image and multiple candidate second layers based on the original image; an output module (not shown in Figure 6), configured to output the first layer of the original image and the multiple candidate second layers; and a determination module (not shown in Figure 6), configured to determine the second layer based on a selection operation for the multiple candidate second layers.

[0094] In some embodiments of the image generation apparatus, if the indoor scene includes multiple lights, the second layer is the layer corresponding to any combination of the multiple lights being in different on/off states, different color temperature values, and different light intensity values.

[0095] Those skilled in the art should understand that the functions of each processing module in the image generation apparatus of this disclosure can be understood with reference to the relevant description of the foregoing image generation method. Each processing module in the image generation apparatus of this disclosure can be implemented by an analog circuit that implements the functions described in the embodiments of this disclosure, or by running software that performs the functions described in the embodiments of this disclosure on an electronic device.

[0096] The image generation apparatus of this disclosure can quickly handle lighting control for any scene, which helps to improve image generation efficiency. Compared with traditional methods that render a 3D scene to produce a 2D image, this direct 2D-to-2D generation significantly reduces resource consumption and storage pressure. Users can upload an image of any scene and receive the corresponding lighting control effect image in a short time, greatly lowering the barrier to adjusting lighting.

[0097] Figure 7 shows an application scenario of the image generation method provided by this disclosure. As described above, the image generation method is applied to an electronic device. The electronic device is intended to represent various forms of digital computers, such as laptop computers, desktop computers, workstations, personal digital assistants, servers, blade servers, mainframe computers, and other suitable computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular phones, smartphones, wearable devices, and other similar computing devices.

[0098] Specifically, the electronic device may perform the following operations:

[0099] Obtain the first and second layers of the original image; the first layer contains the material and color information of objects in the indoor scene; the second layer contains the lighting information of the indoor scene.

[0100] Perform feature fusion on the first and second layers to obtain fused features;

[0101] Generate a scene rendering image of the original image based on the fused features, where the lighting information of the scene rendering image is consistent with the lighting information of the second layer.

[0102] The original image can be obtained from a data source. The data source can be various forms of data storage devices, such as laptops, desktop computers, workstations, personal digital assistants (PDAs), servers, blade servers, mainframes, and other suitable computers. The data source can also represent various forms of mobile devices, such as PDAs, cellular phones, smartphones, wearable devices, and other similar computing devices. Furthermore, the data source and the user terminal can be the same device.

[0103] It should be understood that the scene diagram shown in Figure 7 is merely illustrative and not restrictive; those skilled in the art may make various obvious changes and/or substitutions based on the example in Figure 7, and the resulting technical solutions still fall within the scope of this disclosure.

[0104] The acquisition, storage, and application of personal information of the target object involved in the technical solution disclosed herein comply with the provisions of relevant laws and regulations and do not violate public order and good morals.

[0105] Figure 8 is a structural block diagram of an electronic device according to an embodiment of the present disclosure. As shown in Figure 8, the electronic device includes a memory 810 and a processor 820. The memory 810 stores a computer program that can run on the processor 820. There can be one or more memories 810 and processors 820. The memory 810 can store one or more computer programs which, when executed by the electronic device, cause the electronic device to perform the methods provided in the above method embodiments. The electronic device may also include a communication interface 830 for communicating with external devices and exchanging and transmitting data.

[0106] If the memory 810, processor 820, and communication interface 830 are implemented independently, they can be interconnected via a bus to communicate with each other. The bus can be an Industry Standard Architecture (ISA) bus, a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, etc., and can be classified as an address bus, a data bus, a control bus, etc. For ease of illustration, the bus is represented in Figure 8 by a single thick line, but this does not mean that there is only one bus or one type of bus.

[0107] Optionally, in a specific implementation, if the memory 810, processor 820, and communication interface 830 are integrated on a single chip, then the memory 810, processor 820, and communication interface 830 can communicate with each other through an internal interface.

[0108] It should be understood that the aforementioned processor can be a central processing unit (CPU), another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, etc. A general-purpose processor can be a microprocessor or any conventional processor. It is worth noting that the processor can be a processor supporting the Advanced RISC Machines (ARM) architecture.

[0109] Further, optionally, the aforementioned memory may include read-only memory and random access memory, and may also include non-volatile random access memory. The memory may be volatile or non-volatile, or may include both. Non-volatile memory may include read-only memory (ROM), programmable read-only memory (PROM), erasable programmable read-only memory (EPROM), electrically erasable programmable read-only memory (EEPROM), or flash memory. Volatile memory may include random access memory (RAM), which serves as an external cache. By way of example but not limitation, many forms of RAM are available, such as static random access memory (SRAM), dynamic random access memory (DRAM), synchronous DRAM (SDRAM), double data rate synchronous DRAM (DDR SDRAM), enhanced synchronous DRAM (ESDRAM), Synchlink DRAM (SLDRAM), and Direct Rambus RAM (DR RAM).

[0110] In the above embodiments, implementation can be achieved, in whole or in part, through software, hardware, firmware, or any combination thereof. When implemented in software, it can be implemented, in whole or in part, as a computer program product. A computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, all or part of the processes or functions according to the embodiments of this disclosure are generated. The computer can be a general-purpose computer, a special-purpose computer, a computer network, or other programmable device. The computer instructions can be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another. For example, computer instructions can be transmitted from one website, computer, server, or data center to another via wired (e.g., coaxial cable, optical fiber, or digital subscriber line (DSL)) or wireless (e.g., infrared, Bluetooth, or microwave) means. The computer-readable storage medium can be any available medium that a computer can access, or a data storage device such as a server or data center that integrates one or more available media. The available media can be magnetic media (e.g., floppy disks, hard disks, magnetic tapes), optical media (e.g., digital versatile discs (DVDs)), or semiconductor media (e.g., solid state disks (SSDs)). It is worth noting that the computer-readable storage media mentioned in this disclosure can be non-volatile storage media; in other words, they can be non-transitory storage media.

[0111] Those skilled in the art will understand that all or part of the steps of the above embodiments can be implemented by hardware, or by a program instructing related hardware. The program can be stored in a computer-readable storage medium, such as a read-only memory, a disk, or an optical disk.

[0112] In the description of the embodiments of this disclosure, references to terms such as "one embodiment," "some embodiments," "example," "specific example," or "some examples," etc., indicate that a specific feature, structure, material, or characteristic described in connection with that embodiment or example is included in at least one embodiment or example of this disclosure. Furthermore, the specific features, structures, materials, or characteristics described may be combined in any suitable manner in one or more embodiments or examples. Moreover, without contradiction, those skilled in the art can combine and integrate the different embodiments or examples described in this specification, as well as the features of those different embodiments or examples.

[0113] In the description of the embodiments disclosed herein, unless otherwise stated, "/" means "or". For example, A/B can mean A or B. The "and/or" in this document merely describes an association between related objects, indicating that three relationships can exist. For example, A and/or B can represent: A existing alone, A and B existing simultaneously, and B existing alone.

[0114] In the description of embodiments of this disclosure, the terms "first" and "second" are used for descriptive purposes only and should not be construed as indicating or implying relative importance or implicitly specifying the number of indicated technical features. Therefore, a feature defined as "first" or "second" may explicitly or implicitly include one or more of that feature. In the description of embodiments of this disclosure, unless otherwise stated, "a plurality of" means two or more.

[0115] The above are merely exemplary embodiments of this disclosure and are not intended to limit this disclosure. Any modifications, equivalent substitutions, improvements, etc., made within the spirit and principles of this disclosure should be included within the protection scope of this disclosure.

Claims

1. An image generation method, characterized in that it comprises: obtaining a first layer and a second layer of an original image, wherein the first layer is a diffuse layer containing material and color information of objects in an indoor scene, and the second layer is a direct lighting layer containing lighting information of the indoor scene; inputting the first layer and the second layer into a control network (CN) model in an image generation model, the image generation model being a pre-trained model; encoding the first layer using a preset encoder in the CN model to obtain a first feature of the first layer, and encoding the second layer to obtain a second feature of the second layer; performing, using the CN model, feature fusion on the first layer and the second layer to obtain fused features, wherein the feature fusion is achieved by adding the first feature and the second feature; inputting the fused features into a stable diffusion (SD) model in the image generation model; and rendering the fused features using the SD model to obtain a scene rendering image of the original image, wherein the lighting information of the scene rendering image is consistent with the lighting information of the second layer, and the image generation model skips 3D scene rendering and directly generates the scene rendering image from 2D layers; the method further comprising: generating the first layer and a plurality of candidate second layers based on the original image; outputting the first layer and the plurality of candidate second layers of the original image; and generating a user profile based on the user's template information, and selecting, from the plurality of candidate second layers, a second layer matching the user profile based on a selection operation for the plurality of candidate second layers.

2. The method according to claim 1, characterized in that the method further comprises: encoding a third layer using the preset encoder to obtain a third feature of the third layer; and adjusting the fused features based on the third feature to obtain new fused features.

3. The method according to claim 1, characterized in that, if the indoor scene includes multiple lights, the second layer is a layer corresponding to any combination of the multiple lights being in different on/off states, different color temperatures, and different light intensity values.

4. An image generation apparatus, characterized in that it comprises: an acquisition module, configured to acquire a first layer and a second layer of an original image, wherein the first layer is a diffuse layer containing material information and color information of objects in an indoor scene, and the second layer is a direct lighting layer containing lighting information of the indoor scene; a fusion module, configured to input the first layer and the second layer into a control network (CN) model in an image generation model, the image generation model being a pre-trained model, to encode the first layer using a preset encoder in the CN model to obtain a first feature of the first layer and encode the second layer to obtain a second feature of the second layer, and to perform, using the CN model, feature fusion on the first layer and the second layer to obtain fused features, wherein the feature fusion is achieved by adding the first feature and the second feature; and a first generation module, configured to input the fused features into a stable diffusion (SD) model in the image generation model and to render the fused features using the SD model to obtain a scene rendering image of the original image, wherein the lighting information of the scene rendering image is consistent with the lighting information of the second layer, and the image generation model skips 3D scene rendering and directly generates the scene rendering image from 2D layers; the apparatus further comprising: a second generation module, configured to generate the first layer and a plurality of candidate second layers of the original image based on the original image; an output module, configured to output the first layer and the plurality of candidate second layers of the original image; and a determination module, configured to generate a user profile based on the user's template information and to select, from the plurality of candidate second layers, a second layer matching the user profile based on a selection operation for the plurality of candidate second layers.

5. An electronic device, characterized in that it comprises: at least one processor; and a memory communicatively connected to the at least one processor, wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method according to any one of claims 1-3.

6. A non-transitory computer-readable storage medium storing computer instructions, characterized in that the computer instructions are used to cause a computer to perform the method according to any one of claims 1-3.

7. A computer program product comprising a computer program stored on a storage medium, characterized in that the computer program, when executed by a processor, implements the method according to any one of claims 1-3.