Image generation method, apparatus, device, and storage medium
By converting the original reference image into a 3D model and interactively adjusting the viewpoint, a 2D reference image is generated as input to the AI model, solving the problem of insufficient viewpoint control in existing technologies and achieving efficient and intuitive image generation.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- GUANGZHOU BOGUAN TELECOMM TECH LTD
- Filing Date
- 2026-04-03
- Publication Date
- 2026-06-12
AI Technical Summary
Existing AI image generation technology is insufficient in terms of intuitive control over the camera's perspective, making it difficult for users to accurately achieve the desired composition effect. The operation process is cumbersome and requires high technical skills.
By converting the original reference image into a 3D model, users can interactively adjust the rendering perspective and generate a 2D reference image as input for the AI model, which is then combined with text prompts to generate the target image.
It achieves efficient and intuitive perspective control, reduces the number of repeated generations, improves the accuracy of composition control and generation efficiency, and produces images that are highly consistent with user expectations.
Figure CN122199730A_ABST
Description
Technical Field
[0001] This disclosure relates to the field of computer technology, and in particular to an image generation method, apparatus, device, and storage medium.
Background Technology
[0002] In the field of artificial intelligence image generation technology, text-to-image generation methods have become an important tool for concept design, game art creation, and product prototype demonstration. Related solutions, by combining text prompts with reference images, provide a basic implementation path for controlling the theme and style of the generated content, thus satisfying users' basic descriptive needs for image content to a certain extent.
[0003] As application scenarios become more diverse, users are demanding higher precision in image composition and control efficiency. This is particularly true in professional applications requiring precise control of camera angles, where related technical solutions face new challenges in terms of intuitiveness and operational efficiency. Currently, mainstream solutions still rely on textual description optimization or pre-generated reference images using third-party modeling software, methods that present certain barriers in terms of operational procedures and technical requirements.
[0004] In the related technologies, existing AI image generation technologies have significant shortcomings in terms of intuitive control over the camera's perspective, making it difficult for users to accurately achieve the desired composition effect.
Summary of the Invention
[0005] Therefore, it is necessary to provide an image generation method, apparatus, device, and storage medium that can accurately meet the user's expectations in order to address the above-mentioned technical problems.
[0006] In a first aspect, this disclosure provides an image generation method, the method comprising:
[0007] Obtaining the original reference image and converting it into a 3D model; in response to a viewpoint adjustment command for interacting with the 3D model, updating the rendering viewpoint of the 3D model; in response to an angle confirmation command, generating a 2D reference image based on the rendering viewpoint; and inputting the 2D reference image and the original reference image, as combined input conditions, into an AI image generation model to generate the target image.
[0008] In one embodiment, converting the original reference image into a 3D model includes: In response to the 3D generation command, a 3D model corresponding to the original reference image is generated and displayed in the preview window.
[0009] In one embodiment, updating the rendering perspective of the 3D model in response to a viewpoint adjustment command for the 3D model includes: In response to a command to adjust the viewpoint of the 3D model displayed in the preview window, update the rotation angle of the 3D model to update the rendering viewpoint.
[0010] In one embodiment, in response to a viewpoint adjustment command on the 3D model displayed in the preview window, the rotation angle of the 3D model is updated to update the rendering viewpoint, including: In response to the viewpoint adjustment command, the corresponding viewpoint transformation parameters are calculated, and the rotation angle of the virtual camera viewpoint is updated according to the viewpoint transformation parameters to update the rendering viewpoint of the 3D model in the preview window.
[0011] In one embodiment, in response to an angle confirmation command, a 2D reference image is generated based on the rendering viewpoint, including: Capture the 3D model rendering screen displayed in the preview window under the current rendering view, and convert the captured screen into a 2D reference image.
[0012] In one embodiment, generating a 3D model corresponding to the original reference image includes: Send the original reference image to the 3D model service and receive the low-poly 3D untextured model returned by the 3D model service.
[0013] In one embodiment, obtaining the original reference image of the viewpoint to be adjusted includes: In response to an image addition command executed in the design tool plugin, retrieve the image to be processed that has been added to the canvas; In response to the selected image node on the canvas, determine the original reference image.
[0014] In one embodiment, in response to a selected image node on the canvas, determining the original reference image includes: In response to selecting an image node on the canvas, the type attribute of the selected node is determined through the design tool plugin; If the type attribute is image type, then the image contained in the node is determined as the original reference image.
[0015] In one embodiment, the method further includes: During the process of converting the original reference image into a 3D model, a loading status indicator showing the conversion progress is displayed in the preview window.
[0016] In one embodiment, the interactive view adjustment command includes: Rotate the 3D model by dragging with the mouse and / or by touch gestures.
[0017] In one embodiment, using the 2D reference image and the original reference image as combined input conditions, the method further includes: Receive input text prompts; The original reference image, the 2D reference image, and the text prompt are used together as input conditions.
[0018] In one embodiment, the original reference image, the 2D reference image, and the text prompt are used together as input conditions, including: Generate text describing the current rendering viewpoint; The text describing the current rendering perspective is merged with text prompts to form optimized prompts; The original reference image and the optimized prompt words are used together as input conditions.
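As a non-limiting illustration of the prompt merging described in this embodiment, the following is a minimal sketch that turns camera angles into a viewpoint description and appends it to the user's text prompt. The function names, the angle thresholds, the wording of the description, and the convention that a positive pitch means the camera looks down are all assumptions made for illustration; the disclosure does not specify them.

```python
def describe_viewpoint(yaw_deg: float, pitch_deg: float) -> str:
    """Return a short text description of the rendering viewpoint.

    Thresholds and wording are illustrative assumptions.
    """
    # Vertical component of the description.
    if pitch_deg >= 30:
        vertical = "high-angle (looking down)"
    elif pitch_deg <= -30:
        vertical = "low-angle (looking up)"
    else:
        vertical = "eye-level"

    # Horizontal component: normalise yaw to [0, 360) and bucket it.
    yaw = yaw_deg % 360
    if yaw < 45 or yaw >= 315:
        horizontal = "front view"
    elif yaw < 135:
        horizontal = "right side view"
    elif yaw < 225:
        horizontal = "back view"
    else:
        horizontal = "left side view"

    return f"{vertical} {horizontal}"


def merge_prompts(user_prompt: str, yaw_deg: float, pitch_deg: float) -> str:
    """Merge the viewpoint description with the user's prompt into one optimized prompt."""
    viewpoint_text = describe_viewpoint(yaw_deg, pitch_deg)
    user_prompt = user_prompt.strip()
    if not user_prompt:
        return viewpoint_text
    return f"{user_prompt}, {viewpoint_text}"
```

For instance, under these assumptions `merge_prompts("a red sports car", 90, 45)` would yield a prompt that ends with a high-angle right side view description.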
[0019] In one embodiment, after generating the target image, the process further includes: In the design tool plugin, replace the original reference image with the target image.
[0020] In one embodiment, updating the rendering perspective of the 3D model includes: In response to the selection command of the preset viewpoint, the rendering viewpoint of the 3D model is updated in the preview window according to the preset viewpoint.
[0021] In a second aspect, this disclosure also provides an image generation apparatus, the apparatus comprising: an original image conversion module, used to acquire the original reference image and convert it into a 3D model; a perspective interaction module, used to update the rendering perspective of the 3D model in response to a perspective adjustment command for interacting with the 3D model; an angle confirmation module, used to generate a 2D reference image based on the rendering viewpoint in response to an angle confirmation command; and a target image generation module, used to take the 2D reference image and the original reference image as combined input conditions and input them into the AI image generation model to generate the target image.
[0022] In a third aspect, this disclosure also provides a computer device, including a memory and a processor, wherein the memory stores a computer program, and the processor executes the computer program to implement the steps of the method described above.
[0023] In a fourth aspect, this disclosure also provides a computer-readable storage medium having a computer program stored thereon, which, when executed by a processor, implements the steps of the above-described method.
[0024] In a fifth aspect, this disclosure also provides a computer program product, including a computer program that, when executed by a processor, implements the steps of the above-described method.
[0025] The image generation method, apparatus, device, and storage medium disclosed herein, by constructing a complete interactive 3D perspective adjustment workflow, converts 2D format original reference images into 3D models, providing users with an interactive image spatial structure foundation. This allows users to intuitively understand the three-dimensional composition of image content, fundamentally changing the vague operation method of imagining spatial relationships solely through text. Through real-time interactive adjustment and instant visual feedback of the rendering perspective, users can directly explore and fine-tune the perspective, dynamically approaching the user's ideal composition, reducing the operation process of blind generation and repeated trial and error. By solidifying the confirmed 3D rendering perspective, a 2D reference image can be effectively generated, thus achieving a lossless conversion from interactive operation to precise visual conditions, providing a perspective basis for AI model generation. Finally, by fusing multimodal inputs of the original reference image and the 2D reference image, the AI model can simultaneously maintain content consistency and perspective accuracy, ultimately generating a target image that highly matches the user's expectations. Compared with related technologies, the image generation method provided in this disclosure can reduce the user's operating threshold by precisely controlling the process from user intent input to accurate visual result output. This disclosure avoids AI models guessing based on vague prompt text and provides direct control through visualization and interactive 3D perspective adjustment. This not only improves the accuracy of composition control but also significantly reduces the number of repeated generation steps, improves the efficiency of AI image generation, and achieves efficient, intuitive, and accurate generation of the expected composition effect. 
Attached Figure Description
[0026] To more clearly illustrate the technical solutions in the embodiments or related technologies of this disclosure, the accompanying drawings used in the description of the embodiments or related technologies of this disclosure will be briefly introduced below. Obviously, the accompanying drawings described below are only some embodiments of this disclosure. For those skilled in the art, other related drawings can be obtained based on these drawings without creative effort.
[0027] Figure 1 is a schematic flowchart of an image generation method provided in an embodiment of the present disclosure;
Figure 2 is a schematic diagram illustrating the generation and display of a low-poly 3D textureless model provided in an embodiment of this disclosure;
Figure 3 is a flowchart illustrating a step for acquiring an original reference image provided in an embodiment of this disclosure;
Figure 4 is an illustration of inserting an original reference image provided in an embodiment of this disclosure;
Figure 5 is a schematic diagram illustrating the adjustment of the angle of a 3D model according to an embodiment of this disclosure;
Figure 6 is a schematic diagram illustrating the generation of an angle reference image provided in an embodiment of this disclosure;
Figure 7 is a schematic diagram of a perspective adjustment provided in an embodiment of this disclosure;
Figure 8 is a schematic diagram illustrating an image-filled canvas generation method provided in an embodiment of this disclosure;
Figure 9 is a schematic diagram of the structure of an image generation apparatus provided in an embodiment of the present disclosure;
Figure 10 is a schematic diagram of the internal structure of a computer device provided in an embodiment of the present disclosure.
Detailed Implementation
[0028] To make the objectives, technical solutions, and advantages of this disclosure clearer, the following detailed description is provided in conjunction with the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative and not intended to limit the scope of this disclosure.
[0029] When using AI image generation technology for concept design, game art asset creation, or product prototype demonstration, users often need to control the content and style of the generated images through text prompts. However, relying solely on text descriptions makes it difficult to precisely control the camera angle of the generated images (such as specific angles like overhead, low-angle, or side views), often leading to discrepancies between the generated results and the design intent. This necessitates repeated generation and filtering, resulting in low efficiency.
[0030] Currently, although some AI image generation tools support generation based on existing images, they still lack an intuitive and interactive control mechanism for camera angles. Users cannot directly preview and adjust the target perspective; they can only try different prompts or use third-party 3D software to pre-create angle reference images. However, third-party 3D software generally does not offer a rich supply of ready-made models from which users can quickly generate angle reference images, and sometimes the subject has to be modeled from scratch. The entire process is cumbersome and technically demanding.
[0031] In one exemplary embodiment, as shown in Figure 1, which is a schematic flowchart of an image generation method provided in an embodiment of the present disclosure, an image generation method is provided. This example illustrates the method's application to a terminal. It is understood that this method can also be applied to a server, or to a system including both a terminal and a server, and is implemented through interaction between the terminal and the server. The terminal can be, but is not limited to, various personal computers, laptops, smartphones, and tablets. The server can be a standalone physical server, a server cluster or distributed system composed of multiple physical servers, or a cloud server providing cloud computing services. In this embodiment, the method includes the following steps S101 to S104:
S101. Obtain the original reference image and convert it into a 3D model.
[0032] In this context, the original reference image refers to a 2D image provided by the user whose compositional perspective needs further adjustment. This original reference image can come from various sources; for example, it could be an AI-generated image or design prototype with the perspective to be adjusted, a real-world photograph of a person or scene, or a hand-drawn concept sketch or line drawing by a designer. Their common characteristic is that they contain visual content from which the user wishes to adjust the perspective. The 3D model refers to a simplified 3D structure generated through image reconstruction technology, such as a low-poly white model, which can be used for interactive perspective adjustment. The terminal can refer to the computing device executing the method, such as a personal computer or mobile device, which can handle user interaction and local processing. The server can refer to remote computing resources used to provide 3D model conversion or AI-generated image services.
[0033] For example, users can add images to a canvas using a plugin for a design tool (such as Figma). The plugin automatically identifies image nodes and triggers the conversion process. The terminal or server can invoke an AI-based image-to-3D service (for example, one using a neural network model) to convert the original reference image into an interactive untextured ("white") 3D model. This 3D model can retain the basic geometry of the original reference image but remove texture details to optimize rendering performance.
[0034] Optionally, users can select an image node in the design tool canvas and then click the "Generate 3D Model" button in the plugin. The terminal will send the image data to the cloud 3D model service, which will return low-polygon 3D model data and render and display it in real time in the terminal's preview window.
[0035] By converting 2D images into 3D models, a visual 3D foundation is provided for users, enabling them to intuitively understand the spatial structure of the image. This lays the foundation for subsequent perspective adjustments and solves the problem of difficulty in accurately controlling the composition perspective based solely on text descriptions.
[0036] S102. In response to the viewpoint adjustment command for interacting with the 3D model, update the rendering viewpoint of the 3D model.
[0037] Here, the viewpoint adjustment command refers to the interactive signal issued by the user through an input device (such as a mouse or touch screen), which can be used to change the viewing angle of the 3D model.
[0038] For example, users can interact with 3D models by dragging with the mouse (such as rotating with the left mouse button and panning with the middle mouse button) or by touch gestures (such as rotating with one finger and zooming with two fingers). The terminal captures these perspective adjustment operation commands in real time, calculates the corresponding camera matrix changes, updates the rendered screen in the preview window, and provides smooth perspective feedback.
[0039] As an example, the plugin can embed a 3D graphics rendering engine, such as a WebGL (Web Graphics Library)-based 3D renderer, listen to the user's mouse events, dynamically adjust the model's rotation angle based on the drag distance and direction, and immediately update the display in the preview window, allowing the user to explore the model from any angle.
[0040] With real-time interactive perspective adjustment, users can directly preview the appearance of the model from different angles, avoiding blindly guessing the effect of the perspective, thus enabling them to more accurately achieve the expected composition and avoiding the shortcomings of unintuitive perspective control and high operation threshold.
[0041] S103. In response to the angle confirmation command, generate a 2D reference image based on the rendering viewpoint.
[0042] The "angle confirmation" command refers to the user triggering the viewpoint lock operation, such as clicking the "Confirm Angle" button. The 2D reference image can be a two-dimensional image captured from the current rendering viewpoint, serving as a viewpoint reference for the AI-generated image. The capture process can refer to using a graphics API to capture the current view of the preview window.
[0043] For example, after adjusting to a satisfactory viewing angle, the user can trigger an angle confirmation command in various ways, including but not limited to: clicking the confirmation button in the preview window, pressing the Enter key on the keyboard, double-clicking the 3D model, or saying a specific voice command such as "confirm angle". In response to this command, the terminal invokes the renderer's screenshot function (such as the Canvas.toDataURL method) to generate a high-resolution 2D reference image (such as in PNG format), which can save the appearance of the 3D model from the current viewing angle.
[0044] Optionally, after the user confirms the angle, the plugin can automatically capture the content of the preview window and save it as a 2D reference image. It can also attach and save metadata (such as camera parameters) for precise control of the subsequent AI image generation process.
[0045] By generating a 2D reference image, the perspective that the user adjusted interactively is concretized into a visual reference, providing accurate perspective conditions for AI image generation and ensuring that the generated images conform to the user's expectations. Compared to text prompts, this allows for a more precise description of the perspective.
[0046] S104. The 2D reference image and the original reference image are used as combined input conditions and input into the AI image generation model to generate the target image.
[0047] The combined input conditions refer to integrating multimodal data (such as images and text) into the input of the AI model. The AI image generation model can refer to a generation model that supports image prompts, such as Stable Diffusion or Midjourney. The target image refers to the generated image; its content can be the same as that of the original reference image, while its viewpoint may differ.
[0048] Optionally, the terminal or server can employ various strategies to combine the original reference image, the 2D reference image, and optional text prompts into input conditions that the AI model can understand. In one embodiment, the two images are stitched side by side into a single wide image before being input. In another embodiment, the deep feature vectors of the two images are calculated separately and then fused by weighted averaging. These combined conditions collectively constrain the generated content: the original reference image maintains consistency in the main content, the 2D reference image serves as a visual conditioning image that precisely specifies the target viewpoint, and the text prompts refine the style or background. By calling an AI image generation service (such as Stable Diffusion, which supports image prompts) via an API, the AI model generates a new image based on the viewpoint and composition information contained in the 2D reference image, so that the viewpoint of the output image matches that of the 2D reference image.
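As a non-limiting illustration, the two combination strategies mentioned above (side-by-side stitching and weighted averaging of feature vectors) can be sketched as follows. Images are represented here as plain nested lists of pixel values and feature vectors as lists of floats; this representation, the function names, and the default weight of 0.5 are assumptions chosen to keep the sketch self-contained, and a practical implementation would likely use an image library and a feature-extraction network.

```python
from typing import List, Sequence

# Rows of grayscale pixel values; a stand-in for real image data (assumption).
Image = List[List[int]]


def stitch_side_by_side(left: Image, right: Image) -> Image:
    """Concatenate two images of equal height into one wide image."""
    if len(left) != len(right):
        raise ValueError("images must have the same height to be stitched")
    return [row_left + row_right for row_left, row_right in zip(left, right)]


def fuse_features(
    original: Sequence[float],
    reference: Sequence[float],
    weight_original: float = 0.5,
) -> List[float]:
    """Fuse two feature vectors by weighted averaging.

    weight_original is the weight of the original image's features; the
    reference image's features receive (1 - weight_original).
    """
    if len(original) != len(reference):
        raise ValueError("feature vectors must have the same length")
    if not 0.0 <= weight_original <= 1.0:
        raise ValueError("weight_original must be between 0 and 1")
    weight_reference = 1.0 - weight_original
    return [
        weight_original * a + weight_reference * b
        for a, b in zip(original, reference)
    ]
```

In this sketch, raising the weight of the original image would favour content consistency, while lowering it would favour the viewpoint carried by the 2D reference image; how the weights are chosen is not specified in the disclosure.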
[0049] For example, the terminal or server packages the original reference image, the 2D reference image, and optional text prompts into a multimodal request, and calls the AI image generation service through the API. The service generates a new image based on these conditions, wherein the viewpoint matches the 2D reference image.
[0050] Optionally, the plugin can construct a structured data request (e.g., a JSON request body) containing the encodings of the original reference image and the 2D reference image (e.g., Base64 encoding), as well as the fused prompt words, send the request body to an AI service interface (such as the DreamStudio API), and receive the generated target image data.
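As a non-limiting illustration, constructing such a structured request body can be sketched as follows. The field names (`original_image`, `reference_image`, `prompt`) are hypothetical and do not correspond to the schema of any particular AI service; sending the request and parsing the response are omitted.

```python
import base64
import json


def build_request_body(original_png: bytes, reference_png: bytes, prompt: str) -> str:
    """Build a JSON request body carrying both images (Base64) and the fused prompt.

    The field names are illustrative assumptions, not a real service schema.
    """

    def encode(data: bytes) -> str:
        # Base64 produces ASCII-safe text that can be embedded in JSON.
        return base64.b64encode(data).decode("ascii")

    body = {
        "original_image": encode(original_png),
        "reference_image": encode(reference_png),
        "prompt": prompt,
    }
    return json.dumps(body)
```

The receiving side can recover the image bytes by parsing the JSON and Base64-decoding the two image fields.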
[0051] By using multimodal input conditions, the AI model can simultaneously reference the original content and the user's adjusted perspective to generate images that meet the expected composition, reducing the number of repeated trials and errors, improving generation efficiency and accuracy, and avoiding compositional deviations caused by inaccurate perspective control.
[0052] In this embodiment, by constructing a complete interactive 3D perspective adjustment workflow, and converting the original 2D reference image into a 3D model, an interactive image spatial structure foundation is provided for the user. This allows the user to intuitively understand the three-dimensional composition of the image content, fundamentally changing the vague operation method of imagining spatial relationships based solely on text. Through real-time interactive adjustment and instant visual feedback of the rendering perspective, the user can directly explore and fine-tune the perspective, dynamically approaching the user's ideal composition, reducing the operation process of blind generation and repeated trial and error. By solidifying the confirmed 3D rendering perspective, a 2D reference image can be effectively generated. This achieves a lossless conversion from interactive operation to precise visual conditions, providing a perspective basis for AI model generation. Finally, by fusing the multimodal input of the original reference image and the 2D reference image, the AI model can simultaneously maintain content consistency and perspective accuracy, ultimately generating a target image that highly matches the user's expectations. Compared with related technologies, the image generation method provided in this disclosure can reduce the user's operating threshold by precisely controlling the process from user intent input to accurate visual result output. This disclosure avoids AI models guessing based on vague prompt text and provides direct control through visualization and interactive 3D perspective adjustment. This not only improves the accuracy of composition control but also significantly reduces the number of repeated generation steps, improves the efficiency of AI image generation, and achieves efficient, intuitive, and accurate generation of the expected composition effect.
[0053] In one exemplary embodiment, converting the original reference image into a 3D model includes: In response to the 3D generation command, a 3D model corresponding to the original reference image is generated and displayed in the preview window.
[0054] Here, "3D generation command" refers to a user-triggered action to generate a 3D model, such as clicking a specific button. "Preview window" refers to a graphical display area embedded in a plugin or application, used for real-time rendering of the 3D model. "Rendering perspective" refers to the position and orientation parameters of the virtual camera in 3D space, determining the user's viewing angle of the model. "Preview window display" refers to rendering and displaying the 3D model within a graphical interface.
[0055] For example, a user can issue the 3D generation command in at least one of several ways, including but not limited to clicking the "Generate 3D Model" button, pressing a keyboard shortcut, or using voice input. The plugin can then start the conversion process, initialize the 3D rendering environment in the preview window, and display the 3D model once the generated model data has loaded.
[0056] As an example, the plugin can create a 3D scene using libraries such as Three.js, load the converted model data as a mesh object, set the initial lighting and camera position, and render the 3D model in real time in the preview window.
[0057] In this embodiment, 3D models can be generated and displayed instantly with one click, allowing users to quickly enter the interactive adjustment stage, simplifying the operation process and improving the real-time nature and intuitiveness of composition control.
[0058] In an exemplary embodiment, updating the rendering view of the 3D model in response to a viewpoint adjustment command includes: In response to a command to adjust the viewpoint of the 3D model displayed in the preview window, update the rotation angle of the 3D model to update the rendering viewpoint.
[0059] The rotation angle refers to the parameters (such as Euler angles) by which a 3D model rotates around the X, Y, and / or Z axes in a three-dimensional coordinate system. It can be used to characterize changes in the model's orientation. The rendering viewpoint corresponds to the viewing direction of the virtual camera in the 3D scene. When the rotation angle of the 3D model changes, the image in the rendering window updates accordingly.
[0060] For example, users can interact with the 3D model in the preview window by dragging with the mouse, swiping on a touchscreen, or pressing keyboard keys. The terminal captures the user's interaction in real time, calculates the rotation increment of the 3D model about the X and Y axes based on the displacement of the input device or the direction of the operation, and cumulatively updates the model's rotation angle parameters. After each rotation angle update, the rendering engine immediately redraws the preview window based on the new model pose, allowing users to intuitively see the effect of the viewpoint change and achieving a WYSIWYG interactive experience.
[0061] As an example, in a WebGL renderer implemented using Three.js, the plugin controls rotation by listening to mouse events. When the user holds down the left mouse button and moves horizontally, the system records the pixel displacement difference (deltaX) and calculates the rotation increment around the Y-axis based on a preset rotation sensitivity coefficient (e.g., 0.5 degrees rotation for every 1 pixel movement). When the mouse moves vertically, the rotation increment around the X-axis is calculated. The system applies the accumulated rotation increment to the model's Euler angles and calls the `object.rotation.set` method to update the model's transformation matrix. The renderer then automatically redraws the scene in the next frame. To maintain smooth rotation, the terminal can update frame-by-frame with each mouse movement event, rather than calculating the final result only when the mouse is released.
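As a non-limiting illustration, the mapping from mouse displacement to accumulated rotation described above can be sketched independently of any rendering engine. The class and method names are hypothetical; the sensitivity of 0.5 degrees per pixel is the example value from the text. The conversion to radians is included because Three.js expresses Euler angles in radians.

```python
import math
from dataclasses import dataclass
from typing import Tuple

# Example value from the text: 0.5 degrees of rotation per pixel of movement.
ROTATION_SENSITIVITY_DEG_PER_PX = 0.5


@dataclass
class ModelRotation:
    """Accumulated rotation of the model about the X and Y axes, in degrees."""

    x_deg: float = 0.0
    y_deg: float = 0.0

    def apply_drag(self, delta_x_px: float, delta_y_px: float) -> None:
        """Apply one mouse-move event.

        Horizontal movement rotates about the Y axis, vertical movement
        rotates about the X axis; increments are accumulated per event.
        """
        self.y_deg += delta_x_px * ROTATION_SENSITIVITY_DEG_PER_PX
        self.x_deg += delta_y_px * ROTATION_SENSITIVITY_DEG_PER_PX

    def as_radians(self) -> Tuple[float, float]:
        """Return (x, y) rotation in radians, e.g. for passing to a renderer."""
        return math.radians(self.x_deg), math.radians(self.y_deg)
```

Calling `apply_drag` on every mouse-move event, rather than once on release, corresponds to the frame-by-frame update described above.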
[0062] Optionally, in addition to incremental rotation based on mouse drag, the absolute rotation angle of the model can be set directly through preset view buttons (such as front view and top view); or a precise rotation angle value can be input through a slider control; for touch screen devices, a multi-touch interaction method can be used, such as single-finger swipe to rotate the model and two-finger rotation to adjust zoom simultaneously. In another implementation scheme, the terminal can keep the model stationary while rotating the camera (i.e., using a method where the camera rotates around the model), which can also achieve the visual effect of adjusting the viewpoint, while the internal implementation is reflected in updating the camera position rather than the model rotation angle.
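As a non-limiting illustration, setting an absolute rotation from a preset view button or a slider can be sketched as follows. The preset names, their angle values, and the slider range are assumptions for illustration only.

```python
from typing import Dict, Tuple

# Absolute (x_deg, y_deg) rotations per preset view. The view names and
# angle values are illustrative assumptions, not taken from the disclosure.
PRESET_VIEWS: Dict[str, Tuple[float, float]] = {
    "front": (0.0, 0.0),
    "side": (0.0, 90.0),
    "top": (90.0, 0.0),
}


def rotation_for_preset(name: str) -> Tuple[float, float]:
    """Return the absolute rotation for a preset view button."""
    try:
        return PRESET_VIEWS[name]
    except KeyError:
        raise ValueError(f"unknown preset view: {name!r}") from None


def rotation_from_slider(
    value_deg: float, min_deg: float = -180.0, max_deg: float = 180.0
) -> float:
    """Clamp a slider value to the allowed rotation range (range is an assumption)."""
    return max(min_deg, min(max_deg, value_deg))
```

Unlike drag input, both functions return absolute angles, so the result replaces the current rotation instead of being added to it.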
[0063] In this embodiment, by directly mapping user interactions to changes in the rotation angle of the 3D model, an intuitive and smooth viewpoint control method is provided. Users do not need professional 3D software operation knowledge; they can freely explore various angles of the model simply by dragging or swiping, receiving real-time visual feedback and quickly approaching the ideal composition. This direct rotation control mechanism effectively avoids the indirectness of repeatedly trying to describe the viewpoint through text, thereby improving the efficiency and accuracy of viewpoint adjustment.
[0064] In one exemplary embodiment, in response to a viewpoint adjustment command on a 3D model displayed in a preview window, the rotation angle of the 3D model is updated to update the rendering viewpoint, including: In response to the viewpoint adjustment command, the corresponding viewpoint transformation parameters are calculated, and the rotation angle of the virtual camera viewpoint is updated according to the viewpoint transformation parameters to update the rendering viewpoint of the 3D model in the preview window.
[0065] The perspective transformation parameter refers to the change in camera angle calculated based on the displacement or operation direction of user interaction. This can include changes in the camera's pitch angle around the horizontal axis (X-axis), yaw angle around the vertical axis (Y-axis), and the distance between the camera and the model's center point. The virtual camera perspective refers to the position coordinates and orientation parameters of the virtual camera in the 3D scene, determining the viewing angle when the scene is rendered to the preview window. The rendering perspective refers to the visual effect corresponding to the final image presented in the preview window.
[0066] For example, when a user adjusts the viewing angle in the preview window by dragging with the mouse or using touch gestures, the terminal captures the user's interactive displacement in real time and converts the displacement into angle change parameters for the virtual camera based on preset mapping rules. The terminal then recalculates the camera's new position and orientation in 3D space based on the current camera position and orientation, combined with the user-inputted transformation parameters, and subsequently updates the camera matrix in the rendering engine. The rendering engine redraws the scene based on the updated camera parameters, allowing the user to observe a continuous change in the model's viewing angle within the image, achieving an interactive experience of panoramic or top-down observation.
[0067] As an example, in a WebGL renderer based on Three.js, the plugin uses the OrbitControls library to update the camera's viewpoint. When the user holds down the left mouse button and moves horizontally in the preview window, OrbitControls listens for mouse movement events, calculates the change in yaw angle corresponding to the mouse displacement, and adds this change to the current camera's azimuth angle. When the mouse moves vertically, it calculates the change in pitch angle and limits the pitch angle to a preset range (e.g., between -85 degrees and 85 degrees) to prevent the camera from flipping. After each angle change, the controller calls the `camera.position.set` method to recalculate the camera's position in space according to the spherical coordinate system formulas: (x = radius × sin(yaw) × cos(pitch), y = radius × sin(pitch), z = radius × cos(yaw) × cos(pitch)), and calls the `camera.lookAt` method to ensure the camera is always focused on the model's center point. This process continues with each frame of mouse movement, ensuring real-time and smooth viewpoint adjustments.
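The drag-to-orbit mapping described above can be sketched without any rendering library. The following is a minimal illustration only, not the OrbitControls implementation: the `OrbitState` type, the `applyDrag` and `orbitPosition` names, the drag sign convention, and the sensitivity of 0.5 degrees per pixel are all assumptions made for this example. The pitch limit of 85 degrees and the spherical formulas are taken from the paragraph above.

```typescript
// Orbit state in degrees; the names are illustrative, not taken from OrbitControls.
interface OrbitState {
  yawDeg: number;
  pitchDeg: number;
  radius: number;
}

const PITCH_LIMIT_DEG = 85; // example limit quoted in the text

function clampValue(value: number, min: number, max: number): number {
  return Math.min(max, Math.max(min, value));
}

// Map a pointer displacement in pixels to a new orbit state.
// degreesPerPixel is an assumed sensitivity constant.
function applyDrag(
  state: OrbitState,
  dxPx: number,
  dyPx: number,
  degreesPerPixel: number = 0.5
): OrbitState {
  return {
    yawDeg: state.yawDeg + dxPx * degreesPerPixel,
    pitchDeg: clampValue(
      state.pitchDeg + dyPx * degreesPerPixel,
      -PITCH_LIMIT_DEG,
      PITCH_LIMIT_DEG
    ),
    radius: state.radius,
  };
}

// Spherical-to-Cartesian conversion using the formulas given in the text:
// x = r sin(yaw) cos(pitch), y = r sin(pitch), z = r cos(yaw) cos(pitch).
function orbitPosition(state: OrbitState): { x: number; y: number; z: number } {
  const yaw = (state.yawDeg * Math.PI) / 180;
  const pitch = (state.pitchDeg * Math.PI) / 180;
  return {
    x: state.radius * Math.sin(yaw) * Math.cos(pitch),
    y: state.radius * Math.sin(pitch),
    z: state.radius * Math.cos(yaw) * Math.cos(pitch),
  };
}
```

In a Three.js scene, the returned coordinates would typically be passed to `camera.position.set(x, y, z)`, followed by `camera.lookAt` on the model's center point.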
[0068] Alternatively, besides rotating the camera around the 3D model, the 3D model can rotate while the camera remains stationary. Furthermore, the calculation of the viewpoint transformation parameters can support different interaction modes: for example, touchscreen devices can rotate the camera with a single finger swipe and zoom in / out on the distance between the camera and the model with a two-finger pinch; for keyboard operation, the azimuth or pitch angle of the camera can be adjusted incrementally using the directional keys in fixed increments (e.g., 5 degrees at a time). In another implementation, the historical state of the camera transformation can be recorded, allowing users to undo operations to return to a previous viewpoint or restore the default viewing angle with a single click using the reset viewpoint button.
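A possible shape for the keyboard stepping and the undo/reset behavior mentioned above is sketched below. The `CameraView` fields, the default view values, the `ViewHistory` class, and the choice of which arrow key raises or lowers each angle are assumptions for illustration. Only the 5-degree increment comes from the text.

```typescript
interface CameraView {
  yawDeg: number;
  pitchDeg: number;
  distance: number;
}

// Assumed default view; the source does not specify concrete values.
const DEFAULT_VIEW: CameraView = { yawDeg: 0, pitchDeg: 0, distance: 5 };
const KEY_STEP_DEG = 5; // fixed increment quoted in the text

type ArrowKey = "ArrowLeft" | "ArrowRight" | "ArrowUp" | "ArrowDown";

// Return a new view nudged by one keyboard step.
function stepView(view: CameraView, key: ArrowKey): CameraView {
  switch (key) {
    case "ArrowLeft":
      return { ...view, yawDeg: view.yawDeg - KEY_STEP_DEG };
    case "ArrowRight":
      return { ...view, yawDeg: view.yawDeg + KEY_STEP_DEG };
    case "ArrowUp":
      return { ...view, pitchDeg: view.pitchDeg + KEY_STEP_DEG };
    case "ArrowDown":
      return { ...view, pitchDeg: view.pitchDeg - KEY_STEP_DEG };
  }
}

// Records previous views so the user can undo or reset.
class ViewHistory {
  private past: CameraView[] = [];
  current: CameraView = { ...DEFAULT_VIEW };

  apply(next: CameraView): void {
    this.past.push(this.current);
    this.current = next;
  }

  undo(): void {
    const previous = this.past.pop();
    if (previous !== undefined) {
      this.current = previous;
    }
  }

  // Reset is itself recorded, so it can be undone as well.
  reset(): void {
    this.apply({ ...DEFAULT_VIEW });
  }
}
```

Recording the reset as an ordinary history entry is a design choice made here so that an accidental reset can be reverted. An implementation could equally clear the history on reset.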
[0069] In this embodiment, by mapping user interactions to updates in the rotation angle of the virtual camera's viewpoint, a model-centric surround viewing effect is achieved. Compared to directly rotating the model, this camera viewpoint update method better aligns with the cognitive habits of observing objects in the real world, allowing users to more intuitively understand the relationship between the current viewpoint and 3D space. Simultaneously, the parameterized adjustment of the camera facilitates precise control of the observation angle, supports recording and reusing preset viewpoints, and provides stable and reproducible viewpoint conditions for subsequent 2D reference image generation. This interactive mechanism effectively lowers the barrier to spatial imagination for users and improves the flexibility and accuracy of viewpoint adjustments.
[0070] In one exemplary embodiment, in response to an angle confirmation command, a 2D reference image is generated based on the rendering viewpoint, including: Capture the 3D model rendering screen displayed in the preview window under the current rendering view, and convert the captured screen into a 2D reference image.
[0071] The current rendering viewpoint refers to the position, orientation, and field of view parameters of the virtual camera in the preview window at the current moment, which determines the angle and composition of the 3D model in the image. Capturing refers to capturing the pixel data of the current frame in the rendering buffer through a graphics programming interface (such as the Canvas API or WebGL readPixels).
[0072] For example, after adjusting the 3D model to a satisfactory viewing angle in the preview window, the user can issue an angle confirmation command through various interaction methods, including but not limited to clicking the confirm angle adjustment button in the plugin interface, using keyboard shortcuts (such as pressing the Enter key or space bar), double-clicking the 3D model, clicking the confirmation floating window outside the preview window, inputting specific commands such as "confirm angle" or "lock view" via voice, performing specific gestures on a touchscreen device (such as long-pressing with two fingers or clicking with three fingers), or using designated buttons on an external input device (such as a game controller). Upon receiving the angle confirmation command, the terminal can invoke the renderer's image capture function, read the rendering buffer data of the current frame, convert the rendering buffer data into image data of a specified format (e.g., a Base64-encoded PNG image), and save the image as a 2D reference image. The generated 2D reference image can be temporarily stored in memory for subsequent steps or simultaneously displayed in the plugin interface for user preview and confirmation.
[0073] As an example, in a WebGL renderer based on Three.js and HTML Canvas, the plugin captures the image as follows: When the user triggers an angle confirmation command, the terminal first obtains the Canvas element bound to the current WebGL renderer and calls the `toDataURL` method of the Canvas native API. This method encodes the current pixel content on the Canvas into a Base64 string in PNG format. To obtain a higher resolution reference image, the plugin can pre-set the Canvas size to a preset export resolution (e.g., 1024×1024 pixels), and then restore it to the preview window size after taking the screenshot, ensuring that the generated 2D reference image has the required clarity for the AI model's input. Subsequently, the plugin encapsulates the obtained Base64 data into an image file object and sends it, along with the original reference image, to the AI image generation service as part of the combined input conditions.
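The string returned by `toDataURL` is a data URL whose payload is Base64 text, so turning it into an image file object requires decoding that payload into bytes. The sketch below shows one dependency-free way to do this. The function names `decodeBase64` and `dataUrlToBytes` are illustrative, and a real plugin might instead rely on a decoder provided by its host environment.

```typescript
const BASE64_ALPHABET =
  "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

// Decode standard (non-URL-safe) Base64 text into raw bytes.
function decodeBase64(base64: string): Uint8Array {
  const clean = base64.replace(/=+$/, "");
  const out = new Uint8Array(Math.floor((clean.length * 3) / 4));
  let buffer = 0;
  let bits = 0;
  let index = 0;
  for (let i = 0; i < clean.length; i++) {
    const value = BASE64_ALPHABET.indexOf(clean.charAt(i));
    if (value < 0) {
      throw new Error("Invalid Base64 character at position " + i);
    }
    // Keep only the low 16 bits; at most 13 bits are pending at any time.
    buffer = ((buffer << 6) | value) & 0xffff;
    bits += 6;
    if (bits >= 8) {
      bits -= 8;
      out[index++] = (buffer >> bits) & 0xff;
    }
  }
  return out;
}

// Split a data URL such as "data:image/png;base64,...." into type and bytes.
function dataUrlToBytes(dataUrl: string): { mimeType: string; bytes: Uint8Array } {
  const match = /^data:([^;,]+);base64,(.*)$/.exec(dataUrl);
  if (match === null) {
    throw new Error("Expected a Base64 data URL");
  }
  return { mimeType: match[1], bytes: decodeBase64(match[2]) };
}
```

Note that with a WebGL canvas the drawing buffer may already be cleared by the time `toDataURL` is called. Rendering a frame immediately before the capture, or creating the context with `preserveDrawingBuffer` enabled, are the usual ways to avoid a blank image.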
[0074] Alternatively, besides using the Canvas's toDataURL method, the WebGL's readPixels method can be used to directly read pixel data from the framebuffer and manually encode it to generate PNG or JPEG images. This approach provides more low-level pixel control, such as removing UI overlay elements or capturing only the 3D model rendering area when taking a screenshot. In another implementation, the plugin can use the current camera parameters (position, rotation angle, field of view, etc.) instead of the captured image as the viewpoint reference condition, and the AI model can directly parse the camera parameters to generate an image from the corresponding viewpoint. Furthermore, for applications requiring higher precision viewpoint control, both the screenshot and camera metadata can be saved simultaneously, enabling the AI model to more accurately understand the viewpoint intent.
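One detail of the `readPixels` route is that WebGL returns rows starting from the bottom of the framebuffer, whereas PNG and JPEG encoders generally expect the top row first. A small helper such as the following (the name `flipRowsRGBA` is illustrative) can reorder the rows before encoding. It assumes tightly packed RGBA data with 4 bytes per pixel.

```typescript
// Flip an RGBA pixel buffer vertically, returning a new buffer.
function flipRowsRGBA(pixels: Uint8Array, width: number, height: number): Uint8Array {
  const rowBytes = width * 4;
  if (pixels.length !== rowBytes * height) {
    throw new Error("Pixel buffer size does not match width and height");
  }
  const flipped = new Uint8Array(pixels.length);
  for (let row = 0; row < height; row++) {
    const src = row * rowBytes;
    const dst = (height - 1 - row) * rowBytes;
    // Copy one full row to its mirrored position.
    flipped.set(pixels.subarray(src, src + rowBytes), dst);
  }
  return flipped;
}
```

Cropping to the 3D model rendering area, as mentioned above, could be done in the same pass by copying only a sub-range of each row.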
[0075] In this embodiment, by directly capturing the current frame of the preview window as a 2D reference image, a precise conversion from interactive perspective adjustment to visual conditional input is achieved. What the user sees is what they get; the adjusted perspective effect, immediately confirmed, can be losslessly transferred to the AI image generation model, avoiding ambiguity and bias that might arise from describing the perspective through text. This capture method is simple to operate and responds quickly, requiring no additional image export or format conversion skills from the user, significantly lowering the operational threshold for perspective control. Simultaneously, the generated 2D reference image, together with the original reference image, forms a multimodal input, providing the AI model with clear and accurate perspective constraints. This ensures that the final generated target image is highly consistent with the user's expectations in terms of composition angle, effectively reducing the workload of repeated generation and filtering.
[0076] In one exemplary embodiment, generating a 3D model corresponding to the original reference image includes: Send the original reference image to the 3D model service and receive the low-poly 3D untextured model returned by the 3D model service.
[0077] Converting original reference images into 3D models can be achieved in several ways. One approach relies on cloud-based 3D model services, which are AI services deployed on cloud servers that reconstruct 3D structures from 2D images using pre-trained deep learning models (such as PIFuHD). The terminal uploads the image via a network request, and the service processes it and returns the model data. Another approach is to perform real-time 3D reconstruction locally on the terminal using a built-in lightweight neural network model, suitable for latency-sensitive or offline scenarios. Low-polygon 3D textureless models refer to simplified models with significantly reduced triangle counts and no complex color or material textures, suitable for fast rendering and interaction.
[0078] Employing a low-polygon, textureless 3D model has two benefits. First, the amount of 3D model data is significantly reduced, lowering the network bandwidth and time required to transmit the model from the cloud service to the terminal, thus improving the overall process response speed. Second, the simplified geometry allows the graphics processing unit (GPU) of terminal devices (especially those with limited performance) to render the model in real time at a higher frame rate. This ensures smooth and immediate interaction when users drag and rotate the model to adjust the viewpoint, avoiding stuttering and providing a superior user experience.
[0079] For example, a terminal can upload an image to a 3D model service via an HTTP request. The service uses a pre-trained deep learning model (such as PIFuHD) to perform 3D reconstruction and returns a model file containing vertex and face data.
[0080] As an example, the plugin can call a custom REST API to send image data to a 3D reconstruction service deployed on an AWS EC2 instance. The 3D reconstruction service can return model files in OBJ or GLB format, which the plugin can then parse and load into a preview window.
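Because the service may return either OBJ or GLB, the plugin needs to decide which parser to use. One possible check is sketched below. A binary glTF (GLB) file begins with a 12-byte header whose first four bytes are the ASCII characters "glTF", while OBJ is plain text. The OBJ test used here (looking for a vertex line beginning with "v ") is only a heuristic assumed for this example, and the `detectModelFormat` name is illustrative.

```typescript
type ModelFormat = "glb" | "obj" | "unknown";

function detectModelFormat(bytes: Uint8Array): ModelFormat {
  // GLB header: magic "glTF" (0x67 0x6C 0x54 0x46), then version and length.
  const isGlb =
    bytes.length >= 12 &&
    bytes[0] === 0x67 &&
    bytes[1] === 0x6c &&
    bytes[2] === 0x54 &&
    bytes[3] === 0x46;
  if (isGlb) {
    return "glb";
  }

  // Heuristic for OBJ: inspect the first kilobyte as text and look for
  // a line that starts with a vertex record.
  let head = "";
  const limit = Math.min(bytes.length, 1024);
  for (let i = 0; i < limit; i++) {
    head += String.fromCharCode(bytes[i]);
  }
  const lines = head.split("\n");
  for (const line of lines) {
    if (line.indexOf("v ") === 0) {
      return "obj";
    }
  }
  return "unknown";
}
```

In practice the response's content type or the file extension, if the service provides one, would be a more reliable signal than content sniffing.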
[0081] Optionally, Figure 2 is a schematic diagram showing the generation and display of a low-poly 3D textureless model provided in an embodiment of this disclosure. As shown in Figure 2, after clicking the "Generate 3D White Model" button, the plugin can call the image-to-3D model service to convert the original reference image into a low-polygon 3D white model, and render and display it in the plugin preview interface.
[0082] In this embodiment, processing complex 3D reconstruction through cloud services can reduce the computational burden on the terminal. At the same time, the low-poly model can ensure smooth interaction, enabling users to quickly adjust the perspective and accurately realize their compositional intentions.
[0083] In one exemplary embodiment, Figure 3 is a flowchart illustrating a step for acquiring an original reference image, as provided in an embodiment of this disclosure. As shown in Figure 3, obtaining the original reference image whose viewpoint is to be adjusted can specifically include: S301. In response to an image addition command executed in the design tool plugin, obtain the image to be processed added to the canvas; S302. In response to an image node being selected on the canvas, determine the original reference image.
[0084] In this context, "design tool plugin" refers to an extension program integrated into design software (such as Figma or Sketch). "Image adding command" refers to the user's action of dragging and dropping an image or importing it onto the canvas. "Image node" refers to an abstract object representing an image element within the design tool.
[0085] By implementing this disclosed embodiment as a plugin for design tools (such as Figma), the perspective adjustment function can be seamlessly integrated into the designer's existing workflow. Users no longer need to export images to standalone 3D software or AI image generation tools, nor do they need to switch and transfer files between different applications, thus completely eliminating the cumbersome operations and time-consuming processes associated with intermediate steps. This deep integration ensures the continuity of the design context (such as canvas layout and layer relationships), enabling a one-stop operation from perspective adjustment to result application, significantly improving the overall efficiency of concept design, iteration, and presentation.
[0086] For example, users can add images by dragging and dropping files onto the canvas or using the import menu. The plugin listens for canvas change events, automatically identifies newly added image nodes, and marks them as images to be processed.
[0087] As an example, in the Figma plugin, the figma.currentPage.selection API can be used to get the node selected by the user, filter out the image type nodes, and extract their image data as the original reference image.
[0088] Optionally, Figure 4 is a schematic diagram of inserting an original reference image provided in an embodiment of this disclosure. As shown in Figure 4, the user places the AI image whose angle needs to be adjusted into the Figma canvas and selects the image node through the plugin. In this embodiment, by deeply integrating design tools through plugins, users can start the perspective adjustment process without switching platforms, improving work efficiency and ensuring accurate acquisition of image data, thus providing a foundation for precise composition control.
[0089] In one exemplary embodiment, in response to an image node selected on the canvas, determining the original reference image includes: In response to selecting an image node on the canvas, the type attribute of the selected node is determined through the design tool plugin; If the type attribute is image type, then the image contained in the node is determined as the original reference image.
[0090] In this context, an image node refers to an abstract object representing an image element in the design tool's canvas. This object possesses specific type attributes, positional information, size parameters, and image fill data. The type attribute refers to a metadata field in the design tool used to identify the type of node, distinguishing between different element types such as image nodes, text nodes, vector graphics nodes, and group nodes.
[0091] For example, users can select images on the design tool's canvas using mouse clicks, selection boxes, keyboard-assisted selection (e.g., holding down the Shift key and clicking multiple nodes, holding down Ctrl / Cmd to select multiple nodes), the node list in the Layers panel, locating and selecting specific image nodes using search or filtering functions, selecting all image nodes in the current canvas with a single click using the plugin's "auto-recognition" function, selecting with a single-finger tap or long press gesture on a touchscreen device, or quickly locating and selecting an image node using an external input device (e.g., a stylus, keyboard shortcuts). The plugin then uses the design tool's API to obtain the type attribute of the currently selected node. If the node's type attribute is an image type (e.g., a RECTANGLE node in Figma with its fills attribute containing image fill), the plugin extracts image data from the node (e.g., the image's URL or Base64 encoding), identifies this image data as the original reference image for adjusting the viewpoint, and enables subsequent 3D transformation functions. If the selected node type is not an image type (e.g., a text node or a vector graphics node), the plugin provides a prompt message reminding the user to select a valid image node.
[0092] As an example, in the Figma plugin, the plugin listens for changes in the canvas selection state (for example, through the `selectionchange` event) and reads the selected nodes from `figma.currentPage.selection`. When the user selects a node, the plugin retrieves the node's `type` attribute for judgment. If the `type` is a shape node that supports image filling, such as `RECTANGLE` or `ELLIPSE`, the plugin further reads the node's `fills` array and checks if there is a fill item of type `IMAGE`. If so, the plugin obtains the unique identifier of the image from the fill's `imageHash` property and calls the `figma.getImageByHash` method to retrieve the image data, ultimately converting it to Base64 format and storing it as the original reference image. If the selected node is a container type, such as `GROUP` or `FRAME`, the plugin recursively traverses its child nodes, attempting to find and automatically select the first node that matches the image type condition.
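The type check and recursive search described above can be sketched against a simplified node shape. The `NodeLike` and `PaintLike` interfaces below are reduced stand-ins defined for this example, not the Figma typings themselves. They assume, in line with the published Figma plugin typings, that an image paint carries its hash in an `imageHash` field. The function names are illustrative.

```typescript
// Simplified stand-ins for design-tool node and paint objects.
interface PaintLike {
  type: string;
  imageHash?: string | null;
}

interface NodeLike {
  id: string;
  type: string;
  fills?: PaintLike[];
  children?: NodeLike[];
}

// Return the hash of the first image fill on a shape node, or null.
function imageHashOf(node: NodeLike): string | null {
  if (node.type !== "RECTANGLE" && node.type !== "ELLIPSE") {
    return null;
  }
  for (const fill of node.fills ?? []) {
    if (fill.type === "IMAGE" && fill.imageHash) {
      return fill.imageHash;
    }
  }
  return null;
}

// Depth-first search: the node itself first, then container children.
function findFirstImageNode(
  node: NodeLike
): { nodeId: string; imageHash: string } | null {
  const hash = imageHashOf(node);
  if (hash !== null) {
    return { nodeId: node.id, imageHash: hash };
  }
  if (node.type === "GROUP" || node.type === "FRAME") {
    for (const child of node.children ?? []) {
      const found = findFirstImageNode(child);
      if (found !== null) {
        return found;
      }
    }
  }
  return null;
}
```

A `null` result corresponds to the case where the plugin prompts the user to select a valid image node.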
[0093] Optionally, in addition to determining based on node type attributes, the plugin can also allow users to manually mark any node as a convertible node, or directly specify the original reference image by dragging and dropping an image file to the plugin panel. In another implementation, the plugin can automatically identify all image nodes in the canvas and present them as a thumbnail list in the plugin interface, allowing users to select the image for which they need to adjust the viewpoint, without requiring users to pre-select nodes in the canvas. Furthermore, for vector graphics nodes, the plugin can support rasterizing them into images before using them as the original reference image, thereby expanding the range of supported nodes.
[0094] In this embodiment, by automatically determining node type attributes and extracting image content, accurate identification and acquisition of image elements in the design tool are achieved. This automatic filtering mechanism based on node type avoids process interruptions or error messages caused by users accidentally selecting non-image nodes, improving the fault tolerance and smoothness of the operation. Simultaneously, the deep integration of the plugin with the design tool eliminates the need for additional import / export operations to acquire the original reference image; users only need to select the target image on the canvas to trigger the subsequent 3D conversion process, significantly simplifying the operation steps, maintaining the consistency and coherence of the design context, and laying a data foundation for an efficient and accurate perspective adjustment workflow.
[0095] In one exemplary embodiment, the method further includes: During the process of converting the original reference image into a 3D model, a loading status indicator showing the conversion progress is displayed in the preview window.
[0096] Loading status indicators can refer to visual feedback elements, such as progress bars, rotation animations, or text prompts, used to indicate the status of the conversion process.
[0097] For example, after sending a conversion request, the plugin displays a dynamic loading icon or progress percentage in the preview window, and automatically updates to 3D model rendering after the conversion is complete.
[0098] As an example, the plugin uses HTML / CSS to create a progress bar component, which updates the display based on progress events returned by the service (such as "uploading" or "processing") to ensure that the user is aware of the current status.
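As a rough illustration of driving the progress bar from service events, the mapping below converts a stage name into a bar width and label. The stage names "uploading" and "processing" come from the text. The "done" stage, the percentages, and the label wording are assumptions chosen for this sketch.

```typescript
type ConversionStage = "uploading" | "processing" | "done";

interface ProgressView {
  percent: number;
  label: string;
}

// Assumed display values for each stage reported by the service.
const STAGE_VIEWS: Record<ConversionStage, ProgressView> = {
  uploading: { percent: 25, label: "Uploading image..." },
  processing: { percent: 75, label: "Reconstructing 3D model..." },
  done: { percent: 100, label: "Model ready" },
};

// Produce the CSS width and caption for the progress bar component.
function progressBarStyle(stage: ConversionStage): { width: string; text: string } {
  const view = STAGE_VIEWS[stage];
  return { width: `${view.percent}%`, text: view.label };
}
```

If the service reported a numeric progress value instead of named stages, the same function could pass that value through directly.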
[0099] In this embodiment, by providing real-time loading feedback, the uncertainty of user waiting is reduced, the user experience is enhanced, and users are kept informed of the conversion progress in a timely manner, avoiding operation interruption and improving the continuity of composition adjustment.
[0100] In one exemplary embodiment, the interactive view adjustment command includes: Rotate the 3D model by dragging with the mouse and / or by touch gestures.
[0101] Mouse dragging refers to the operation of holding down a mouse button while moving the cursor. Touch gestures refer to operations such as swiping and pinching with fingers on a touchscreen.
[0102] For example, the plugin can listen for mouse or touch events, calculate the displacement and map it to the rotation angle of the 3D model, and update the rendered 3D model in real time.
[0103] As an example, in the renderer, you can use libraries such as OrbitControls to implement mouse drag-and-drop model rotation and support multi-touch gestures on touch devices.
[0104] Optionally, Figure 5 is a schematic diagram illustrating the adjustment of the angle of a 3D model according to an embodiment of the present disclosure. As shown in Figure 5, users can adjust the camera angle of the 3D white model in real time by dragging, and the plugin provides real-time feedback on the changes in perspective.
[0105] In this embodiment, through a natural and intuitive interaction method, users can easily adjust the viewing angle without the need for professional 3D operation knowledge, thus focusing more on the composition effect and achieving precise viewing angle control.
[0106] In one exemplary embodiment, generating a 2D reference image based on the current rendering viewpoint includes: Capture the current view of the preview window and generate a 2D reference image.
[0107] "Capturing the preview window" can refer to capturing the rendered image data using a graphics API. "Current view" can refer to the 3D model rendering result displayed in real time in the preview window.
[0108] For example, the plugin calls the renderer's DOM element capture method (such as canvas.toDataURL) to export the current visual state as an image file and save it as a 2D reference image.
[0109] As an example, in a WebGL renderer, you can use the readPixels method of WebGLRenderingContext to obtain pixel data and encode it into images in formats such as PNG and JPG.
[0110] Optionally, Figure 6 is a schematic diagram illustrating the generation of an angle reference view provided in an embodiment of this disclosure. As shown in Figure 6, after the user clicks the "Confirm Angle Adjustment" button, the plugin can capture a rendered image of the 3D model in the current viewport as a reference image for the viewpoint.
[0111] In this embodiment, high-fidelity 2D reference images can be quickly generated by capturing the rendered screen in real time, ensuring that the perspective information is accurately transmitted to the AI image model and improving the accuracy of composition control.
[0112] In one exemplary embodiment, using the 2D reference image and the original reference image as combined input conditions further includes: Receive input text prompts; The original reference image, the 2D reference image, and the text prompt are used together as input conditions.
[0113] Here, text prompts can refer to user-inputted text descriptions, which can be used to supplement the style or content details of the generated image. "Combined as input conditions" refers to concatenating or encoding multimodal data into an input format acceptable to the AI model.
[0114] For example, the plugin can provide a text input box for users to enter prompts, and encapsulate the text and image data together into a multimodal request, which is then sent to the AI-generated image service.
[0115] As an example, the plugin can construct a JSON request containing an array of images (original reference images and 2D reference images) and a prompt field (user prompt word), and send it via HTTP POST to the AI image generation programming interface.
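A minimal sketch of assembling such a request follows. The `images` and `prompt` field names come from the text. The function name, the ordering of the two images, the use of Base64 strings for the image payloads, and the JSON content type header are assumptions, since the actual schema depends on the chosen AI image generation service.

```typescript
interface GenerationRequestBody {
  images: string[]; // [original reference image, 2D reference image]
  prompt: string;
}

interface HttpRequestInit {
  method: "POST";
  headers: { "Content-Type": string };
  body: string;
}

// Build the options for an HTTP POST carrying both images and the prompt.
function buildGenerationRequest(
  originalImageBase64: string,
  viewpointImageBase64: string,
  userPrompt: string
): HttpRequestInit {
  const payload: GenerationRequestBody = {
    images: [originalImageBase64, viewpointImageBase64],
    prompt: userPrompt.trim(),
  };
  return {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(payload),
  };
}
```

The returned object has the shape accepted by the standard `fetch` function, so it could be passed as the second argument together with the service URL.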
[0116] Optionally, Figure 7 is a schematic diagram of a perspective adjustment provided in an embodiment of this disclosure. As shown in Figure 7, the original reference image, the angle reference image (2D reference image), and the user's prompt can be used as input to call the image generation AI model selected by the user to generate a new image that retains the original content but adjusts the perspective. In this embodiment, by combining image and text input, the AI model can more comprehensively understand the user's intent and generate images that meet both the perspective requirements and the content details, thereby improving the integrity and accuracy of the composition.
[0117] In one exemplary embodiment, the original reference image, the 2D reference image, and the text prompt are used together as input conditions, including: Generate text describing the current rendering viewpoint; The text describing the current rendering perspective is merged with text prompts to form optimized prompts; The original reference image and the optimized prompt words are used together as input conditions.
[0118] The text describing the current rendering viewpoint can be an automatically generated viewpoint description. Generating this description can be achieved through predefined rules. For example, based on calculated camera pitch and yaw angles, a predefined viewpoint name mapping table can be consulted (e.g., pitch > 60 degrees corresponds to "bird's-eye view," yaw angle around 45 degrees corresponds to "side-front view"), or a rule engine can combine parameters to generate a natural language description such as "slightly overhead side-front view." Fusion refers to concatenating or semantically integrating text to create richer prompts.
[0119] For example, the plugin has pre-defined viewpoint description mapping rules to generate natural language descriptions based on current camera parameters (such as pitch and yaw angles). As one implementation, the pitch angle of the virtual camera can be used for determination: when the pitch is greater than 30 degrees, it is described as a "top-down view"; when the pitch is between -30 and 30 degrees, it is described as a "level-on view"; and when the pitch is less than -30 degrees, it is described as a "bottom-up view". For instance, if the current pitch is 45 degrees, the description text "top-down view" is generated; if the current pitch is -10 degrees, the description text "level-on view" is generated. The plugin then appends this language description to the user-input prompt, forming an optimized prompt.
[0120] As an example, the plugin can use a rules engine to map camera angles to text (such as "top view"), combine it with the original prompt word to form "original prompt word, top view", and then input it along with the image into the AI model.
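The pitch-based rule and the prompt merging described above can be written as two small functions. The thresholds and description strings are those given in the preceding paragraphs. The function names and the choice to return only the description when the user prompt is empty are assumptions made for this sketch.

```typescript
// Map the camera pitch (in degrees) to a viewpoint description.
function describePitch(pitchDeg: number): string {
  if (pitchDeg > 30) {
    return "top-down view";
  }
  if (pitchDeg < -30) {
    return "bottom-up view";
  }
  return "level-on view";
}

// Append the viewpoint description to the user's prompt.
function buildOptimizedPrompt(userPrompt: string, pitchDeg: number): string {
  const description = describePitch(pitchDeg);
  const trimmed = userPrompt.trim();
  return trimmed.length > 0 ? `${trimmed}, ${description}` : description;
}
```

For instance, `buildOptimizedPrompt("a red robot", 45)` yields `"a red robot, top-down view"`. A fuller rule engine could combine yaw and pitch in the same way to produce descriptions such as "slightly overhead side-front view".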
[0121] In this embodiment, by automatically generating perspective descriptions and optimizing prompts, the user's workload of manually writing text is reduced, while the accuracy of AI-generated image instructions is improved, ensuring that the generated image is highly consistent with the expected composition.
[0122] In one exemplary embodiment, after generating the target image, the process further includes: In the design tool plugin, replace the original reference image with the target image.
[0123] Here, replacing the original reference image can refer to setting the newly generated target image as the content of the original image nodes, thus overwriting or updating the initial image.
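[0124] For example, the plugin can use the API of a design tool to apply the target image data to the original image node and update the canvas display (in Figma, for instance, by registering the image data with `figma.createImage` and assigning an `IMAGE` paint that references the resulting image hash to the node's `fills`).
[0125] As an example, the plugin can receive the image URL or Base64 data returned by the AI service, convert it into image bytes, register the bytes with the Figma Plugin API's `figma.createImage` method, and assign an `IMAGE` paint referencing the returned image hash to the original node's `fills`, so that the user can immediately see the updated result.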
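The fill replacement step can be sketched as a pure function over a simplified paint list. It assumes the new image has already been registered with the design tool and that its hash is known (in Figma, `figma.createImage(bytes)` returns an image object exposing a `hash`). The `FillPaint` interface is a reduced stand-in defined for this example, and the function name and the `"FILL"` default scale mode are illustrative choices.

```typescript
// Reduced stand-in for a design-tool paint object.
interface FillPaint {
  type: string;
  imageHash?: string | null;
  scaleMode?: string;
}

// Return a new fills array in which the first image paint points at the
// new image. The input array is left untouched, because design-tool fill
// arrays are typically read-only and must be reassigned as a whole.
function withReplacedImageFill(
  fills: ReadonlyArray<FillPaint>,
  newImageHash: string
): FillPaint[] {
  let replaced = false;
  const next: FillPaint[] = fills.map((fill): FillPaint => {
    if (!replaced && fill.type === "IMAGE") {
      replaced = true;
      return { ...fill, imageHash: newImageHash };
    }
    return { ...fill };
  });
  if (!replaced) {
    // No image paint existed, so add one on top of the other fills.
    next.push({ type: "IMAGE", imageHash: newImageHash, scaleMode: "FILL" });
  }
  return next;
}
```

The result would then be assigned back to the node's `fills` property, which updates the canvas display.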
[0126] Optionally, Figure 8 is a schematic diagram of generating an image to fill a canvas, as provided in an embodiment of this disclosure. As shown in Figure 8, the newly generated image can be applied as the fill of the original Figma node, thereby replacing the original image and completing the perspective adjustment.
[0127] In this embodiment, by seamlessly replacing images, users can complete the entire process of adjusting the perspective without manual operation, improving work efficiency and ensuring that the composition effect is visualized in real time.
[0128] In one exemplary embodiment, updating the rendering perspective of a 3D model includes: In response to the selection command of the preset viewpoint, the rendering viewpoint of the 3D model is updated in the preview window according to the preset viewpoint.
[0129] The preset viewpoint can refer to a predefined common viewpoint, such as "front view," "side view," or "top view." The selection command can refer to the user's operation of selecting a preset option, such as clicking a drop-down menu item.
[0130] For example, the plugin can provide a preset view menu. After the user selects the desired rendering view, the plugin can automatically calculate the corresponding camera parameters and smoothly transition to the target view to update the rendering.
[0131] As an example, the plugin can have a built-in array of view presets, and when the user selects a view, the camera can be smoothly moved to the preset position using the Tween.js animation library or similar methods.
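The preset table and the smooth transition can be illustrated without an animation library. In the sketch below, the preset angles are assumed values (the top view uses 85 degrees rather than 90 to stay within the pitch limit mentioned earlier in the description), and a smoothstep curve stands in for whatever easing Tween.js or a similar library would apply.

```typescript
type PresetName = "front" | "side" | "top";

interface PresetView {
  yawDeg: number;
  pitchDeg: number;
}

// Assumed preset angles; the source names the views but not their values.
const VIEW_PRESETS: Record<PresetName, PresetView> = {
  front: { yawDeg: 0, pitchDeg: 0 },
  side: { yawDeg: 90, pitchDeg: 0 },
  top: { yawDeg: 0, pitchDeg: 85 },
};

// Smoothstep easing: slow start, slow end, clamped to [0, 1].
function smoothstep(t: number): number {
  const c = Math.min(1, Math.max(0, t));
  return c * c * (3 - 2 * c);
}

// Interpolate between two views; t runs from 0 (start) to 1 (target).
function interpolateView(from: PresetView, to: PresetView, t: number): PresetView {
  const k = smoothstep(t);
  return {
    yawDeg: from.yawDeg + (to.yawDeg - from.yawDeg) * k,
    pitchDeg: from.pitchDeg + (to.pitchDeg - from.pitchDeg) * k,
  };
}
```

Calling `interpolateView` once per animation frame with an increasing `t`, and feeding the result into the camera update, would produce the smooth move to the preset position.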
[0132] In this embodiment, by using preset viewpoint options, users can quickly switch to commonly used angles, reducing manual adjustment time, improving composition efficiency, and ensuring viewpoint standardization.
[0133] In one exemplary embodiment, the image generation method provided by this disclosure may specifically include: Image input and node recognition include: users can drag and drop the AI-generated image whose perspective is to be adjusted into the Figma canvas, and then select the image node in the plugin panel. The plugin can obtain the currently selected node through the figma.currentPage.selection interface of the Figma API and automatically filter out non-image type nodes (such as text, vector graphics, etc.) to ensure that subsequent functions are enabled only for image nodes that meet the conditions.
[0134] When the user selects the AI-generated character image in the canvas, the plugin recognizes that the node type is RECTANGLE and has the fills image fill attribute, and then the "Generate 3D White Model" button can be activated to prepare for the conversion process.
[0135] Image-to-3D white model conversion includes: After the user clicks the "Generate 3D White Model" button in the plugin, the plugin can send the selected image data to a cloud-based 3D reconstruction service via a RESTful API. The service can use a pre-trained neural network model to analyze the image's depth information, generate the corresponding low-polygon 3D white model, and return it to the plugin in GLB format.
[0136] Interactive view adjustment includes: the plugin's embedded WebGL renderer can continuously listen to the user's mouse events, and use 3D interactive libraries such as OrbitControls to map the 2D mouse displacement to the camera matrix transformation in 3D space, updating the rendered screen every frame to achieve a smooth view adjustment experience.
[0137] Optionally, when the user holds down the left mouse button and drags horizontally, the 3D white model can rotate about the Y-axis; when the user drags vertically, it can rotate about the X-axis. At the same time, the preview window displays the effect of the viewpoint change in real time, allowing the user to precisely control the viewing angle.
[0138] Angle reference image generation includes: after the user adjusts to a satisfactory viewpoint and clicks the "Confirm Angle Adjustment" button, the plugin can call the renderer's image export method to export the current canvas content as a high-resolution image in PNG, JPG, or other formats, and save it as an angle reference image.
[0139] AI image regeneration includes: the plugin can package the original reference image, the angle reference image, and the text prompt entered by the user into a multimodal request and send it to the specified AI image generation service via an API. At the same time, the plugin can automatically generate angle description text based on the current viewpoint and merge it with the user's prompt to form an optimized generation instruction.
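One way to derive the angle description and merge it with the user's prompt is sketched below. The angle thresholds, the wording of the description, the convention that a yaw of 90 degrees shows the right side, and the request field names are all assumptions. The disclosure does not specify the request format of the AI image generation service.

```python
def describe_view(yaw_deg: float, pitch_deg: float) -> str:
    """Turn yaw and pitch angles into a short description (hypothetical wording)."""
    yaw = yaw_deg % 360.0
    if yaw < 45.0 or yaw >= 315.0:
        side = "front view"
    elif yaw < 135.0:
        side = "right side view"
    elif yaw < 225.0:
        side = "back view"
    else:
        side = "left side view"
    if pitch_deg > 20.0:
        height = "seen from above"
    elif pitch_deg < -20.0:
        height = "seen from below"
    else:
        height = "at eye level"
    return f"{side}, {height}"


def build_request(original_image: bytes, angle_image: bytes,
                  user_prompt: str, yaw_deg: float, pitch_deg: float) -> dict:
    """Bundle both reference images and the merged prompt into one request body."""
    parts = [user_prompt.strip(), describe_view(yaw_deg, pitch_deg)]
    prompt = ", ".join(part for part in parts if part)
    return {
        "reference_images": [original_image, angle_image],  # hypothetical field name
        "prompt": prompt,                                   # hypothetical field name
    }
```

For instance, under these assumptions the prompt "a knight in armor" at a yaw of 180 degrees and a pitch of -45 degrees becomes "a knight in armor, back view, seen from below".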
[0140] Result replacement and display includes: after the plugin receives the new image returned by the AI service, it applies the new image data to the original image node through the setImageFill method of the Figma API, replacing the original fill content, and immediately updates the display on the canvas, which completes the perspective adjustment process.
[0141] As an example, the plugin can retain the original image in the history during replacement, so that users can revert to the previous state with Figma's undo function, keeping the operation safe and reversible.
[0142] This disclosure enables precise and intuitive control over the perspective of AI-generated images, reducing the number of repeated generation attempts; it provides a visual interactive adjustment method, lowering the user's operational threshold; it is deeply integrated with design tools, avoiding cross-platform operations and improving work efficiency; it is especially suitable for scenarios that require multi-angle display, such as game characters, product designs, and scene concepts.
[0143] It should be understood that although the steps in the flowcharts of the above embodiments are shown in the sequence indicated by the arrows, they are not necessarily executed in that order. Unless explicitly stated herein, the execution of these steps is not strictly ordered, and they may be executed in other orders. Moreover, at least some of the steps in the flowcharts may comprise multiple sub-steps or stages. These sub-steps or stages are not necessarily completed at the same moment and may be executed at different times; nor are they necessarily executed sequentially, but may be performed in turn or alternately with other steps, or with at least some of the sub-steps or stages of other steps.
[0144] The image generation apparatus provided in the embodiments of this disclosure is described below. The apparatus is based on the same inventive concept as the image generation method described above and solves the problem in a similar way. Therefore, for the specific limitations of the one or more image generation apparatus embodiments provided below, reference may be made to the limitations of the image generation method above; the apparatus and the method may be read with reference to each other, and the details are not repeated here.
[0145] In one exemplary embodiment, Figure 9 is a schematic structural diagram of an image generation apparatus provided in an embodiment of the present disclosure. As shown in Figure 9, the image generation apparatus 90 includes an original image conversion module 910, a viewpoint interaction module 920, an angle confirmation module 930, and a target image generation module 940, wherein: the original image conversion module 910 is used to acquire the original reference image and convert the original reference image into a 3D model.
[0146] The viewpoint interaction module 920 is used to update the rendering viewpoint of the 3D model in response to a viewpoint adjustment command for interacting with the 3D model.
[0147] The angle confirmation module 930 is used to generate a 2D reference image based on the rendering viewpoint in response to an angle confirmation command.
[0148] The target image generation module 940 is used to take the 2D reference image and the original reference image as combined input conditions and input them into the AI image generation model to generate the target image.
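The cooperation of the four modules can be sketched as a small pipeline. This is one possible arrangement under stated assumptions, not the structure of Figure 9 itself: each module is modeled as an injected callable, and the class name, field names, and the `run` method are hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class ImageGenerationApparatus:
    convert_to_3d: Callable[[bytes], Any]            # original image conversion module 910
    update_view: Callable[[Any, float, float], Any]  # viewpoint interaction module 920
    render_2d: Callable[[Any], bytes]                # angle confirmation module 930
    generate: Callable[[bytes, bytes], bytes]        # target image generation module 940

    def run(self, original: bytes, adjustments: list[tuple[float, float]]) -> bytes:
        """Convert, apply each viewpoint adjustment, render, then generate."""
        model = self.convert_to_3d(original)
        for dx, dy in adjustments:  # one tuple per viewpoint adjustment command
            model = self.update_view(model, dx, dy)
        reference_2d = self.render_2d(model)  # corresponds to the angle confirmation command
        return self.generate(reference_2d, original)
```

Injecting the modules as callables mirrors the statement that each module may be implemented in software, hardware, or a combination, and it allows each stage to be replaced by a stub for testing.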
[0149] In an exemplary embodiment, the original image conversion module 910 is used to generate a 3D model corresponding to the original reference image in response to a 3D generation instruction, and to display the 3D model in a preview window.
[0150] In an exemplary embodiment, the viewpoint interaction module 920 is used to update the rotation angle of the 3D model in response to a viewpoint adjustment command for the 3D model displayed in the preview window, so as to update the rendering viewpoint.
[0151] In an exemplary embodiment, the viewpoint interaction module 920 is used to respond to a viewpoint adjustment command, calculate the corresponding viewpoint transformation parameters, and update the rotation angle of the virtual camera viewpoint according to the viewpoint transformation parameters, so as to update the rendering viewpoint of the 3D model in the preview window.
[0152] In an exemplary embodiment, the angle confirmation module 930 is used to capture the 3D model rendering screen displayed in the preview window under the current rendering view, and convert the captured screen into a 2D reference image.
[0153] In an exemplary embodiment, the original image conversion module 910 is used to send the original reference image to the 3D model service and receive a low-poly 3D textureless model returned by the 3D model service.
[0154] In an exemplary embodiment, the original image conversion module 910 is used to acquire the image to be processed added to the canvas in response to an image addition instruction executed in the design tool plugin; and to determine the original reference image in response to an image node selected on the canvas.
[0155] In an exemplary embodiment, the original image conversion module 910 is used to determine the type attribute of the selected node by means of a design tool plugin in response to selecting an image node on the canvas; if the type attribute is an image type, the image contained in the node is determined as the original reference image.
[0156] In one exemplary embodiment, the original image conversion module 910 is used to display a loading status prompt indicating the conversion progress in a preview window during the process of converting the original reference image into a 3D model.
[0157] In one exemplary embodiment, the interactive viewpoint adjustment instructions include rotating the 3D model by mouse dragging and / or touch gestures.
[0158] In an exemplary embodiment, the target image generation module 940 is used to receive input text prompts; the original reference image, the 2D reference image, and the text prompts are used together as input conditions.
[0159] In an exemplary embodiment, the target image generation module 940 is used to generate text describing the current rendering viewpoint; to fuse the text describing the current rendering viewpoint with text prompts to form optimized prompts; and to use the original reference image and the optimized prompts together as input conditions.
[0160] In one exemplary embodiment, the target image generation module 940 is used to replace the original reference image with a target image in a design tool plugin.
[0161] In an exemplary embodiment, the viewpoint interaction module 920 is used to update the rendering viewpoint of the 3D model in the preview window according to a preset viewpoint in response to a selection command for that preset viewpoint.
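Preset viewpoint selection can be sketched as a lookup from a preset name to a pair of rotation angles. The preset names, the angle values, and the function name `select_preset` are assumptions for illustration; the disclosure does not enumerate specific presets.

```python
# Hypothetical presets: name -> (yaw in degrees about Y, pitch in degrees about X)
PRESET_VIEWS: dict[str, tuple[float, float]] = {
    "front": (0.0, 0.0),
    "right": (90.0, 0.0),
    "back": (180.0, 0.0),
    "left": (270.0, 0.0),
    "top": (0.0, 89.0),
}


def select_preset(name: str) -> tuple[float, float]:
    """Return the (yaw, pitch) pair for a preset, or raise ValueError if unknown."""
    try:
        return PRESET_VIEWS[name]
    except KeyError:
        raise ValueError(f"unknown preset view: {name!r}") from None
```

The returned angles would then be applied to the rendering viewpoint in the preview window in the same way as angles produced by dragging.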
[0162] Each module in the aforementioned image generation device can be implemented entirely or partially through software, hardware, or a combination thereof. These modules can be embedded in or independent of the processor in a computer device, or stored in the memory of a computer device as software, so that the processor can call and execute the operations corresponding to each module.
[0163] In one exemplary embodiment, this disclosure also provides a computer device, including a memory and a processor, wherein the memory stores a computer program, and the processor executes the computer program to implement the steps of any of the image generation methods described above.
[0164] By way of example, Figure 10 is a schematic diagram of the internal structure of a computer device 1000 provided in an embodiment of the present disclosure. The computer device 1000 may be provided as a server. Referring to Figure 10, the computer device 1000 includes a processor 1002, which may further include one or more processors, and memory resources represented by a memory 1001 for storing instructions executable by the processor 1002, such as a computer program. The computer program stored in the memory 1001 may include one or more modules, each corresponding to a set of instructions. The processor 1002 is configured to execute the instructions to perform the image generation method of any of the above embodiments. The computer device 1000 may run an operating system stored in the memory 1001, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, or similar.
[0165] The computer device 1000 may also include a power supply component 1003 configured to perform power management of the computer device 1000, a wired or wireless network interface 1004 configured to connect the computer device 1000 to a network, and an input/output (I/O) interface 1005. The wireless connection may be implemented through Wi-Fi, a mobile cellular network, Near Field Communication (NFC), or other technologies. When executed by the processor, the computer program implements an image generation method. The display unit 1007 of the computer device is used to present a visible image and may be a display screen, a projection device, or a virtual reality imaging device; the display screen may be an LCD screen or an e-ink screen. The input device 1006 of the computer device may be a touch layer covering the display screen; a button, trackball, or touchpad located on the casing of the computer device; or an external keyboard, touchpad, or mouse.
[0166] Those skilled in the art will understand that the structure shown in Figure 10 is merely a block diagram of the part of the structure related to the present disclosure and does not limit the computer device to which the present disclosure is applied. A specific computer device may include more or fewer components than shown in the figure, combine certain components, or have a different arrangement of components.
[0167] The processor of the aforementioned computer device can execute computer-executable instructions to perform the following operations of the aforementioned image generation method: obtain the original reference image and convert it into a 3D model; in response to a viewpoint adjustment command for the 3D model, update the rendering viewpoint of the 3D model; in response to an angle confirmation command, generate a 2D reference image based on the rendering viewpoint; and input the 2D reference image and the original reference image, as combined input conditions, into the AI image generation model to generate the target image.
[0168] Converting the original reference image into a 3D model includes: In response to the 3D generation command, a 3D model corresponding to the original reference image is generated and displayed in the preview window.
[0169] In response to commands adjusting the viewpoint of a 3D model, update the rendering viewpoint of the 3D model, including: In response to a command to adjust the viewpoint of the 3D model displayed in the preview window, update the rotation angle of the 3D model to update the rendering viewpoint.
[0170] In response to commands adjusting the viewpoint of the 3D model displayed in the preview window, update the rotation angle of the 3D model to update the rendering viewpoint, including: In response to the viewpoint adjustment command, the corresponding viewpoint transformation parameters are calculated, and the rotation angle of the virtual camera viewpoint is updated according to the viewpoint transformation parameters to update the rendering viewpoint of the 3D model in the preview window.
[0171] In response to the angle confirmation command, a 2D reference image is generated based on the rendering viewpoint, including: Capture the 3D model rendering screen displayed in the preview window under the current rendering view, and convert the captured screen into a 2D reference image.
[0172] Generate a 3D model corresponding to the original reference image, including: Send the original reference image to the 3D model service and receive the low-poly 3D untextured model returned by the 3D model service.
[0173] Obtain the original reference image of the viewpoint to be adjusted, including: In response to an image addition command executed in the design tool plugin, retrieve the image to be processed that has been added to the canvas; In response to the selected image node on the canvas, determine the original reference image from which the viewpoint is to be adjusted.
[0174] In response to the selected image node on the canvas, determine the original reference image, including: In response to selecting an image node on the canvas, the type attribute of the selected node is determined through the design tool plugin; If the type attribute is image type, then the image contained in the node is determined as the original reference image.
[0175] The method also includes: During the process of converting the original reference image into a 3D model, a loading status indicator showing the conversion progress is displayed in the preview window.
[0176] Interactive view adjustment commands include: Rotate the 3D model by dragging with the mouse and / or by touch gestures.
[0177] Using the 2D reference image and the original reference image as combined input conditions, it also includes: Receive input text prompts; The original reference image, the 2D reference image, and the text prompt are used together as input conditions.
[0178] The original reference image, the 2D reference image, and the text prompts are used together as input conditions, including: Generate text describing the current rendering viewpoint; The text describing the current rendering perspective is merged with text prompts to form optimized prompts; The original reference image and the optimized prompt words are used together as input conditions.
[0179] After generating the target image, the following steps are also included: In the design tool plugin, replace the original reference image with the target image.
[0180] Update the rendering perspective of the 3D model, including: In response to the selection command of the preset viewpoint, the rendering viewpoint of the 3D model is updated in the preview window according to the preset viewpoint.
[0181] In one exemplary embodiment, this disclosure also provides a computer-readable storage medium having a computer program stored thereon that, when executed by a processor, implements the steps of any of the image generation methods described above.
[0182] When executed by a processor, the computer program stored on the computer-readable storage medium performs the following operations of the aforementioned image generation method: obtain the original reference image and convert it into a 3D model; in response to a viewpoint adjustment command for the 3D model, update the rendering viewpoint of the 3D model; in response to an angle confirmation command, generate a 2D reference image based on the rendering viewpoint; and input the 2D reference image and the original reference image, as combined input conditions, into the AI image generation model to generate the target image.
[0183] Converting the original reference image into a 3D model includes: In response to the 3D generation command, a 3D model corresponding to the original reference image is generated and displayed in the preview window.
[0184] In response to commands adjusting the viewpoint of a 3D model, update the rendering viewpoint of the 3D model, including: In response to a command to adjust the viewpoint of the 3D model displayed in the preview window, update the rotation angle of the 3D model to update the rendering viewpoint.
[0185] In response to commands adjusting the viewpoint of the 3D model displayed in the preview window, update the rotation angle of the 3D model to update the rendering viewpoint, including: In response to the viewpoint adjustment command, the corresponding viewpoint transformation parameters are calculated, and the rotation angle of the virtual camera viewpoint is updated according to the viewpoint transformation parameters to update the rendering viewpoint of the 3D model in the preview window.
[0186] Generate a 3D model corresponding to the original reference image, including: Send the original reference image to the 3D model service and receive the low-poly 3D untextured model returned by the 3D model service.
[0187] Obtain the original reference image of the viewpoint to be adjusted, including: In response to an image addition command executed in the design tool plugin, retrieve the image to be processed that has been added to the canvas; In response to the selected image node on the canvas, determine the original reference image from which the viewpoint is to be adjusted.
[0188] In response to the selected image node on the canvas, determine the original reference image, including: In response to selecting an image node on the canvas, the type attribute of the selected node is determined through the design tool plugin; If the type attribute is image type, then the image contained in the node is determined as the original reference image.
[0189] The method also includes: During the process of converting the original reference image into a 3D model, a loading status indicator showing the conversion progress is displayed in the preview window.
[0190] Interactive view adjustment commands include: Rotate the 3D model by dragging with the mouse and / or by touch gestures.
[0191] Using the 2D reference image and the original reference image as combined input conditions, it also includes: Receive input text prompts; The original reference image, the 2D reference image, and the text prompt are used together as input conditions.
[0192] The original reference image, the 2D reference image, and the text prompts are used together as input conditions, including: Generate text describing the current rendering viewpoint; The text describing the current rendering perspective is merged with text prompts to form optimized prompts; The original reference image and the optimized prompt words are used together as input conditions.
[0193] After generating the target image, the following steps are also included: In the design tool plugin, replace the original reference image with the target image.
[0194] Update the rendering perspective of the 3D model, including: In response to the selection command of the preset viewpoint, the rendering viewpoint of the 3D model is updated in the preview window according to the preset viewpoint.
[0195] In one exemplary embodiment, this disclosure also provides a computer program product, including a computer program that, when executed by a processor, implements the steps of any of the image generation methods described above.
[0196] When executed by a processor, the computer program of the computer program product performs the following operations of the aforementioned image generation method: obtain the original reference image and convert it into a 3D model; in response to a viewpoint adjustment command for the 3D model, update the rendering viewpoint of the 3D model; in response to an angle confirmation command, generate a 2D reference image based on the rendering viewpoint; and input the 2D reference image and the original reference image, as combined input conditions, into the AI image generation model to generate the target image.
[0197] Converting the original reference image into a 3D model includes: In response to the 3D generation command, a 3D model corresponding to the original reference image is generated and displayed in the preview window.
[0198] In response to commands adjusting the viewpoint of a 3D model, update the rendering viewpoint of the 3D model, including: In response to a command to adjust the viewpoint of the 3D model displayed in the preview window, update the rotation angle of the 3D model to update the rendering viewpoint.
[0199] In response to commands adjusting the viewpoint of the 3D model displayed in the preview window, update the rotation angle of the 3D model to update the rendering viewpoint, including: In response to the viewpoint adjustment command, the corresponding viewpoint transformation parameters are calculated, and the rotation angle of the virtual camera viewpoint is updated according to the viewpoint transformation parameters to update the rendering viewpoint of the 3D model in the preview window.
[0200] Generate a 3D model corresponding to the original reference image, including: Send the original reference image to the 3D model service and receive the low-poly 3D untextured model returned by the 3D model service.
[0201] Obtain the original reference image of the viewpoint to be adjusted, including: In response to an image addition command executed in the design tool plugin, retrieve the image to be processed that has been added to the canvas; In response to the selected image node on the canvas, determine the original reference image from which the viewpoint is to be adjusted.
[0202] In response to the selected image node on the canvas, determine the original reference image, including: In response to selecting an image node on the canvas, the type attribute of the selected node is determined through the design tool plugin; If the type attribute is image type, then the image contained in the node is determined as the original reference image.
[0203] The method also includes: During the process of converting the original reference image into a 3D model, a loading status indicator showing the conversion progress is displayed in the preview window.
[0204] Interactive view adjustment commands include: Rotate the 3D model by dragging with the mouse and / or by touch gestures.
[0205] Using the 2D reference image and the original reference image as combined input conditions, it also includes: Receive input text prompts; The original reference image, the 2D reference image, and the text prompt are used together as input conditions.
[0206] The original reference image, the 2D reference image, and the text prompts are used together as input conditions, including: Generate text describing the current rendering viewpoint; The text describing the current rendering perspective is merged with text prompts to form optimized prompts; The original reference image and the optimized prompt words are used together as input conditions.
[0207] After generating the target image, the following steps are also included: In the design tool plugin, replace the original reference image with the target image.
[0208] Update the rendering perspective of the 3D model, including: In response to the selection command of the preset viewpoint, the rendering viewpoint of the 3D model is updated in the preview window according to the preset viewpoint.
[0209] It should be noted that the user information (including but not limited to user device information, user personal information, etc.) and data (including but not limited to data used for analysis, data stored, data displayed, etc.) involved in this disclosure are all information and data authorized by the user or fully authorized by all parties, and the collection, use and processing of the relevant data must comply with relevant regulations.
[0210] Those skilled in the art will understand that all or part of the processes in the methods of the above embodiments can be implemented by a computer program instructing related hardware. The computer program can be stored in a non-volatile computer-readable storage medium, and when executed, it can include the processes of the embodiments of the methods described above. Any references to memory, databases, or other media used in the embodiments provided in this disclosure can include at least one of non-volatile memory and volatile memory. Non-volatile memory can include read-only memory (ROM), magnetic tape, floppy disk, flash memory, optical memory, high-density embedded non-volatile memory, resistive random access memory (ReRAM), magnetic random access memory (MRAM), ferroelectric random access memory (FRAM), phase change memory (PCM), graphene memory, etc. Volatile memory can include random access memory (RAM) or external cache memory, etc. By way of illustration and not limitation, RAM can take many forms, such as Static Random Access Memory (SRAM) or Dynamic Random Access Memory (DRAM). The databases involved in the embodiments provided in this disclosure may include at least one type of relational database and non-relational database. Non-relational databases may include, but are not limited to, blockchain-based distributed databases. The processors involved in the embodiments provided in this disclosure may be general-purpose processors, central processing units, graphics processing units, digital signal processors, programmable logic devices, quantum computing-based data processing logic devices, artificial intelligence (AI) processors, etc., and are not limited to these.
[0211] The technical features of the above embodiments can be combined in any way. For the sake of brevity, not all possible combinations of the technical features in the above embodiments are described. However, as long as there is no contradiction in the combination of these technical features, they should be considered to be within the scope of this disclosure.
[0212] The embodiments described above are merely illustrative of several implementations of this disclosure, and while the descriptions are specific and detailed, they should not be construed as limiting the scope of this patent disclosure. It should be noted that those skilled in the art can make various modifications and improvements without departing from the concept of this disclosure, and these all fall within the protection scope of this disclosure. Therefore, the protection scope of this disclosure should be determined by the appended claims.
Claims
1. An image generation method, characterized in that the method includes: obtaining an original reference image and converting the original reference image into a 3D model; in response to a viewpoint adjustment command for the 3D model, updating the rendering viewpoint of the 3D model; in response to an angle confirmation command, generating a 2D reference image based on the rendering viewpoint; and inputting the 2D reference image and the original reference image, as combined input conditions, into an AI image generation model to generate a target image.
2. The method according to claim 1, characterized in that, The step of converting the original reference image into a 3D model includes: In response to the 3D generation command, a 3D model corresponding to the original reference image is generated, and the 3D model is displayed in the preview window.
3. The method according to claim 2, characterized in that, In response to a viewpoint adjustment command for the 3D model, update the rendering viewpoint of the 3D model, including: In response to a viewpoint adjustment command on the 3D model displayed in the preview window, the rotation angle of the 3D model is updated to update the rendering viewpoint.
4. The method according to claim 3, characterized in that, The step of updating the rotation angle of the 3D model in response to a viewpoint adjustment command displayed in the preview window, in order to update the rendering viewpoint, includes: In response to the viewpoint adjustment command, the corresponding viewpoint transformation parameters are calculated, and the rotation angle of the virtual camera viewpoint is updated according to the viewpoint transformation parameters to update the rendering viewpoint of the 3D model in the preview window.
5. The method according to claim 2, characterized in that, The step of generating a 2D reference image based on the rendering viewpoint in response to the angle confirmation command includes: The 3D model rendering screen displayed in the preview window under the current rendering view is captured, and the captured screen is converted into the 2D reference image.
6. The method according to claim 2, characterized in that, The generation of the 3D model corresponding to the original reference image includes: The original reference image is sent to the 3D model service, and a low-poly 3D textureless model returned by the 3D model service is received.
7. The method according to claim 1, characterized in that, The process of obtaining the original reference image of the viewpoint to be adjusted includes: In response to an image addition command executed in the design tool plugin, retrieve the image to be processed that has been added to the canvas; In response to the selected image node on the canvas, the original reference image is determined.
8. The method according to claim 7, characterized in that, The step of determining the original reference image in response to a selected image node on the canvas includes: In response to selecting an image node on the canvas, the type attribute of the selected node is determined by the design tool plugin; If the type attribute is an image type, then the image contained in the node is determined as the original reference image.
9. The method according to claim 2, characterized in that, The method further includes: During the process of converting the original reference image into a 3D model, a loading status indicator showing the conversion progress is displayed in the preview window.
10. The method according to claim 1, characterized in that the viewpoint adjustment command includes: rotating the 3D model by mouse dragging and / or touch gestures.
11. The method according to claim 1, characterized in that, The step of using the 2D reference image and the original reference image as combined input conditions also includes: Receive input text prompts; The original reference image, the 2D reference image, and the text prompt are used together as input conditions.
12. The method according to claim 11, characterized in that, The step of using the original reference image, the 2D reference image, and the text prompt as input conditions includes: Generate text describing the current rendering viewpoint; The text describing the current rendering perspective is merged with the text prompt to form an optimized prompt; The original reference image and the optimized prompt words are used together as input conditions.
13. The method according to claim 7, characterized in that, After generating the target image, the process further includes: In the design tool plugin, the original reference image is replaced with the target image.
14. The method according to claim 2, characterized in that, The updating of the rendering perspective of the 3D model includes: In response to the selection command of the preset viewpoint, the rendering viewpoint of the 3D model is updated in the preview window according to the preset viewpoint.
15. An image generation apparatus, characterized in that the apparatus includes: an original image conversion module, used to acquire an original reference image and convert the original reference image into a 3D model; a viewpoint interaction module, used to update the rendering viewpoint of the 3D model in response to a viewpoint adjustment command for interacting with the 3D model; an angle confirmation module, used to generate a 2D reference image based on the rendering viewpoint in response to an angle confirmation command; and a target image generation module, used to input the 2D reference image and the original reference image, as combined input conditions, into an AI image generation model to generate a target image.
16. A computer device comprising a memory and a processor, wherein the memory stores a computer program, characterized in that, When the processor executes the computer program, it implements the steps of the method according to any one of claims 1 to 14.
17. A computer-readable storage medium having a computer program stored thereon, characterized in that, When the computer program is executed by a processor, it implements the steps of the method according to any one of claims 1 to 14.
18. A computer program product, comprising a computer program, characterized in that, When the computer program is executed by a processor, it implements the steps of the method according to any one of claims 1 to 14.