Voice-driven creation of 3D static assets in computer simulations
The system converts natural language input into 3D assets using neural networks, addressing inefficiencies in existing methods and enhancing the creative process for computer game asset creation.
Patent Information
- Authority / Receiving Office
- JP · JP
- Patent Type
- Patents
- Current Assignee / Owner
- SONY INTERACTIVE ENTERTAINMENT LLC
- Filing Date
- 2025-04-23
- Publication Date
- 2026-06-19
AI Technical Summary
Existing methods for creating computer game assets are inefficient and lack the ability to generate high-quality 3D assets from natural language input, limiting the creative process for content creators.
A system that converts natural language input, either through text or speech, into 3D assets using neural networks, allowing for the generation and modification of 3D assets within computer simulations.
Enables efficient creation of high-quality 3D assets from textual or spoken descriptions, facilitating rapid prototyping and customization by artists, and integrating these assets into computer simulations.
Abstract
Description
Technical Field
[0001] This application relates generally to technically inventive, non-standard solutions that are necessarily rooted in computer technology and that bring about concrete technical improvements.
Background Art
[0002] As understood herein, common computer game assets, such as common background objects, are used to enhance the visual appeal of computer games.
Summary of the Invention
[0003] Present principles enable content creators to describe the assets they desire in natural language, for example by voice, and to create 2D or 3D assets from that input. Present principles also facilitate the creation of initial prototype assets that artists can use repeatedly.
[0004] Accordingly, a method includes receiving text, for example text converted from speech, and processing the text using at least one neural network to render a two-dimensional (2D) image of a computer simulation asset. The method also includes converting the 2D image into a three-dimensional (3D) asset and presenting the 3D asset in at least one computer simulation.
[0005] The text may be entered from a keyboard or derived from speech. The text may indicate at least one position, in which case the 3D asset is consistent with that position, and it may indicate a plurality of objects, in which case the 3D asset is consistent with those objects. The method may include using an artist computer to modify the 3D asset before the 3D asset is presented, and a microphone may be used to input modifications of the 3D asset into the artist computer.
[0006] In another embodiment, a device includes at least one computer memory that is not a transitory signal and that includes instructions executable by at least one processor to receive a photograph of a two-dimensional (2D) image. The instructions are executable to convert the 2D image into a 3D asset and to present the 3D asset in at least one computer simulation.
[0007] In another embodiment, the apparatus comprises at least one processor and at least one computer output device configured to be controlled by the processor. The processor is programmed with instructions for identifying a two-dimensional (2D) image, converting the 2D image into a 3D asset, and using the 3D asset as an object in a computer simulation.
[0008] Details of the present invention, both as to its structure and operation, can best be understood with reference to the accompanying drawings, in which like reference numerals refer to like parts.
Brief Description of the Drawings
[0009]
[Figure 1] A block diagram of an example system consistent with present principles.
[Figure 2] An example screenshot in which a person is prompted to speak so that text describing a computer simulation asset can be recognized.
[Figure 3] Example logic, in flowchart format, for converting speech to text and generating a 3D asset from the text.
[Figure 4] An example screenshot in which a person is prompted to input an image from which a computer simulation asset is generated.
[Figure 5] Example logic, in flowchart format, for converting an image into a 3D asset.
[Figure 6] Example logic, in flowchart format, for converting text derived from speech into the positions and parts of 3D assets.
[Figure 7] An example screenshot related to Figure 6.
[Figure 8] Another example screenshot related to Figure 6.
[Figure 9] An example screenshot related to Figure 6, illustrating modification of part of an asset.
[Figure 10] Example logic, in flowchart format, for modifying a portion of an asset.
[Figure 11] Example logic, in flowchart format, for closed-loop processing between a 3D asset and a physics engine.
[Figure 12] An overview of techniques for generating 3D assets from 2D input.
[Figure 13] A technique for controlled feature transformation.
[Figure 14] A 2D-to-3D reconstruction approach.
[Figure 15] A technique for generating 3D assets without 2D input.
Modes for Carrying Out the Invention
[0010] This disclosure relates generally to computer ecosystems, including aspects of consumer electronics (CE) device networks such as, but not limited to, computer game networks. A system herein may include server and client components that can be connected over a network such that data may be exchanged between the client and server components. The client components may include one or more computing devices, including game consoles such as the Sony PlayStation® or game consoles made by Microsoft®, Nintendo®, or other manufacturers, virtual reality (VR) headsets, augmented reality (AR) headsets, portable televisions (e.g., smart TVs, Internet-enabled TVs), portable computers such as laptops and tablet computers, and other mobile devices, including smartphones and the additional examples discussed below. These client devices may operate in a variety of operating environments. For example, some of the client computers may employ the Linux® operating system, operating systems from Microsoft®, a Unix® operating system, or operating systems produced by Apple® or Google®. These operating environments may be used to execute one or more browsing programs, such as browsers made by Microsoft®, Google®, or Mozilla®, or other browser programs that can access websites hosted by the Internet servers discussed below. Also, an operating environment according to present principles may be used to execute one or more computer game programs.
[0011] Servers and/or gateways may include one or more processors executing instructions that configure the servers to receive and transmit data over a network such as the Internet. Alternatively, a client and a server can be connected over a local intranet or a virtual private network. A server or controller may be instantiated by a game console such as a Sony PlayStation®, a personal computer, etc.
[0012] Information may be exchanged over a network between the clients and servers. To this end and for security, servers and/or clients can include firewalls, load balancers, temporary storage, and proxies, and other network infrastructure for reliability and security. One or more servers may form an apparatus that implements methods of providing a secure community, such as an online social website, to network members.
[0013] A processor can be a single-chip processor or a multi-chip processor capable of executing logic through various lines such as address lines, data lines, and control lines, as well as registers and shift registers.
[0014] Components included in one embodiment can be used in other embodiments in any appropriate combination. For example, any of the various components described herein and/or depicted in the figures may be combined with, interchanged with, or excluded from other embodiments.
[0015] "A system having at least one of A, B, and C" (likewise "a system having at least one of A, B, or C" and "a system having at least one of A, B, C") includes systems that have A alone, B alone, C alone, A and B together, A and C together, B and C together, and/or A, B, and C together, etc.
[0016] Now specifically referring to FIG. 1, an example system 10 is shown, which may include one or more of the example devices mentioned above and described further below in accordance with present principles. The first of the example devices included in the system 10 is a consumer electronics (CE) device such as an audio video device (AVD) 12, for example, but not limited to, an Internet-enabled TV with a TV tuner (equivalently, a set-top box controlling a TV). Alternatively, the AVD 12 may be a computer-controlled Internet-enabled ("smart") telephone, a tablet computer, a notebook computer, an HMD, a wearable computer-controlled device, a computer-controlled Internet-enabled music player, computer-controlled Internet-enabled headphones, an implantable skin device, or another computer-controlled Internet-enabled implantable device, etc. Regardless, it is to be understood that the AVD 12 is configured to undertake present principles (e.g., communicate with other CE devices to undertake present principles, execute the logic described herein, and perform any other functions and/or operations described herein).
[0017] Accordingly, to undertake such principles, the AVD 12 can be established by some or all of the components shown in Figure 1. For example, the AVD 12 may include one or more displays 14, which may be implemented by high-definition or ultra-high-definition "4K" or higher-resolution flat screens and may be touch-enabled to receive user input signals via touches on the displays. The AVD 12 may also include one or more speakers 16 for outputting audio in accordance with present principles, and at least one additional input device 18, such as an audio receiver/microphone, for entering audible commands to the AVD 12 to control the AVD 12. The example AVD 12 may also include one or more network interfaces 20 for communication over at least one network 22, such as the Internet, a WAN, a LAN, etc., under the control of one or more processors 24, which may include a graphics processor. Thus, the interface 20 may be, without limitation, a Wi-Fi® transceiver, which is an example of a wireless computer network interface, such as but not limited to a mesh network transceiver. It is to be understood that the processor 24 controls the AVD 12 to undertake present principles, including the other elements of the AVD 12 described herein, such as controlling the display 14 to present images thereon and receiving input therefrom. Furthermore, note that the network interface 20 may be a wired or wireless modem or router, or another appropriate interface such as a wireless telephony transceiver or the Wi-Fi® transceiver mentioned above.
[0018] In addition to the foregoing, the AVD 12 may include one or more input ports 26, such as a High-Definition Multimedia Interface (HDMI®) port or a USB port, for physically connecting to another CE device, and/or a headphone port for connecting headphones to the AVD 12 to present audio to the user through the headphones. For example, the input port 26 may be connected, by wire or wirelessly, to a cable or satellite source 26a of audio-video content. Thus, the source 26a may be a separate or integrated set-top box, or a satellite receiver. Or, the source 26a may be a game console or disk player containing content. When implemented as a game console, the source 26a may include some or all of the components described below in relation to the CE device 48.
[0019] The AVD 12 may further include one or more computer memories 28, such as disk-based or solid-state storage, that are not transitory signals. In some cases these are embodied in the chassis of the AVD as standalone devices, as a personal video recording device (PVR) or video disk player, either internal or external to the chassis of the AVD, for playing back AV programs, or as removable memory media. Also, in some embodiments, the AVD 12 may include a position or location receiver, such as a cellphone receiver, GPS receiver, and/or altimeter 30, that is configured to receive geographic position information from a satellite or cellphone base station, provide the information to the processor 24, and/or determine, in conjunction with the processor 24, an altitude at which the AVD 12 is disposed. The component 30 may also be implemented by an inertial measurement unit (IMU) that typically includes a combination of accelerometers, gyroscopes, and magnetometers to determine the location and orientation of the AVD 12 in three dimensions.
[0020] Continuing the description of the AVD 12, in some embodiments the AVD 12 may include one or more cameras 32, which may be a thermal imaging camera, a digital camera such as a webcam, and/or a camera integrated into the AVD 12 and controllable by the processor 24 to gather pictures/images and/or video in accordance with present principles. Also included on the AVD 12 may be a Bluetooth® transceiver 34 and other Near Field Communication (NFC) element 36 for communication with other devices using Bluetooth® and/or NFC technology, respectively. An example NFC element can be a radio frequency identification (RFID) element.
[0021] Further still, the AVD 12 may include one or more auxiliary sensors 38 that provide input to the processor 24 (e.g., a motion sensor such as an accelerometer, gyroscope, cyclometer, or magnetic sensor, an infrared (IR) sensor, an optical sensor, a speed and/or cadence sensor, or a gesture sensor (e.g., for sensing gesture commands)). The AVD 12 may also include an over-the-air (OTA) TV broadcast port 40 for receiving OTA TV broadcasts that provide input to the processor 24. In addition to the foregoing, note that the AVD 12 may also include an IR transmitter and/or IR receiver and/or IR transceiver 42, such as an IR data association (IRDA) device. A battery (not shown) may be provided for powering the AVD 12, as may be a kinetic energy harvester that converts kinetic energy into power to charge the battery and/or power the AVD 12. A graphics processing unit (GPU) 44 and a field-programmable gate array (FPGA) 46 also may be included.
[0022] Still referring to Figure 1, in addition to the AVD 12, the system 10 may include one or more other CE device types. In one example, a first CE device 48 may be a computer game console that can be used to send computer game audio and video to the AVD 12 via commands sent directly to the AVD 12 and/or through the server described below, while a second CE device 50 may include components similar to those of the first CE device 48. In the example shown, the second CE device 50 may be configured as a computer game controller manipulated by a player or a head-mounted display (HMD) worn by a player. Only two CE devices are shown in the example, it being understood that fewer or more devices may be used. A device herein may implement some or all of the components shown for the AVD 12, and any of the components shown in the subsequent figures may incorporate some or all of the components shown for the AVD 12.
[0023] Now in reference to the at least one server 52 mentioned above, the server 52 includes at least one server processor 54, at least one tangible computer-readable storage medium 56 such as disk-based or solid-state storage, and at least one network interface 58 that, under control of the server processor 54, allows for communication with the other devices of Figure 1 over the network 22 and indeed may facilitate communication between servers and client devices in accordance with present principles. Note that the network interface 58 may be, for example, a wired or wireless modem or router, a Wi-Fi transceiver, or another appropriate interface such as a wireless telephony transceiver.
[0024] Therefore, in some embodiments, server 52 may be an internet server or an entire server "farm," and may include or perform "cloud" functionality, thereby allowing devices of system 10 to access the "cloud" environment via server 52, for example, in an exemplary embodiment relating to a network gaming application. Alternatively, server 52 may be implemented by one or more game consoles, or other computers in the same room or nearby as the other devices shown in Figure 1.
[0025] The components shown in the following figures may include some or all of the components shown in Figure 1.
[0026] Figures 2 and 3 illustrate techniques that enable game designers to create and/or modify three-dimensional (3D) assets for computer simulations such as computer games, typically non-character assets, either from scratch or by adapting assets pre-stored in an asset library.
[0027] As shown in Figure 2, a user interface (UI) 200 may be presented on a display 202, such as any display described herein, and may prompt the designer at 204 to speak the name of a desired asset, in the example shown, a chair.
[0028] Figure 3 shows that at block 300, the designer's subsequent speech (e.g., "a brown chair with armrests, four legs, a cushioned surface, and a backrest") is received, and at block 302 it is converted to text. At block 303, keywords are extracted from the text using a text processing module. In this example, the keyword extraction output may look like this:
Object: Chair
Color: Brown
Legs: 4 legs
Surface: Cushion
Back: With backrest
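The keyword extraction at block 303 could be prototyped with simple pattern matching before a trained text processing module is available. The sketch below is a minimal illustration only: the vocabularies (`OBJECTS`, `COLORS`, `NUMBER_WORDS`) and the function name `extract_keywords` are hypothetical, and the rules are chosen merely to reproduce the example output for the chair description.

```python
import re

# Hypothetical vocabularies for illustration only; the text processing
# module used at block 303 is not specified.
OBJECTS = {"chair", "couch", "table", "lamp"}
COLORS = {"brown", "black", "white", "red", "blue", "green"}
NUMBER_WORDS = {"one": 1, "two": 2, "three": 3, "four": 4, "five": 5, "six": 6}


def extract_keywords(text: str) -> dict:
    """Map a free-form asset description to a flat attribute dictionary."""
    lowered = text.lower()
    words = re.findall(r"[a-z]+", lowered)
    result = {}

    # The first recognized object noun and color adjective win.
    obj = next((w for w in words if w in OBJECTS), None)
    if obj:
        result["Object"] = obj.capitalize()
    color = next((w for w in words if w in COLORS), None)
    if color:
        result["Color"] = color.capitalize()

    # Leg count written as a digit ("4 legs") or as a word ("four legs").
    count_pattern = r"\b(\d+|" + "|".join(NUMBER_WORDS) + r")\s+legs?\b"
    legs = re.search(count_pattern, lowered)
    if legs:
        token = legs.group(1)
        count = int(token) if token.isdigit() else NUMBER_WORDS[token]
        result["Legs"] = f"{count} legs"

    # Simple substring cues for the surface and back attributes.
    if "cushion" in lowered:
        result["Surface"] = "Cushion"
    if "backrest" in lowered:
        result["Back"] = "With backrest"
    return result


print(extract_keywords(
    "a brown chair with armrests, four legs, a cushioned surface, and a backrest"))
# {'Object': 'Chair', 'Color': 'Brown', 'Legs': '4 legs', 'Surface': 'Cushion', 'Back': 'With backrest'}
```

Hand-written rules like these break down quickly on paraphrases, which is why the description relies on a text processing module rather than fixed vocabularies.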
[0029] At block 304, the text may be input to one or more artificial intelligence (AI) engines, such as neural networks, to generate a 2D image of the requested asset. The image may be generated from scratch, or it may be selected by accessing a library of assets. For example, the library may first be searched for images matching the keywords, and only if no match is found does the AI engine generate an image of the asset from the text, using a 2D or 3D generative model trained on human language in a supervised or unsupervised fashion.
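The library-first lookup with a generative fallback can be sketched as follows. The library entry layout (`tags`, `image`), the rule that every requested keyword must match exactly, and the `generate` callback, which merely stands in for the AI engine, are all assumptions for illustration.

```python
from typing import Callable, Dict, List


def find_or_generate(keywords: Dict[str, str],
                     library: List[dict],
                     generate: Callable[[Dict[str, str]], str]) -> str:
    """Return a library image matching every keyword, else a generated one.

    Each library entry is assumed to look like {"tags": {...}, "image": ...}.
    """
    for entry in library:
        tags = entry["tags"]
        # A hit requires every requested attribute to be present and equal;
        # extra tags on the entry are allowed.
        if all(tags.get(key) == value for key, value in keywords.items()):
            return entry["image"]
    # No library hit: fall back to generating an image from the description.
    return generate(keywords)


library = [
    {"tags": {"Object": "Chair", "Color": "Brown", "Legs": "4 legs"}, "image": "chair_brown.png"},
    {"tags": {"Object": "Table", "Color": "Black"}, "image": "table_black.png"},
]
generated = []  # records every fallback call to the hypothetical generator


def fake_generate(keywords: Dict[str, str]) -> str:
    """Hypothetical stand-in for the text-conditioned generative model."""
    generated.append(dict(keywords))
    return "generated_" + keywords.get("Object", "asset").lower() + ".png"


print(find_or_generate({"Object": "Chair", "Color": "Brown"}, library, fake_generate))  # chair_brown.png
print(find_or_generate({"Object": "Chair", "Color": "Red"}, library, fake_generate))    # generated_chair.png
```

Checking the library first avoids the cost of running a generative model when a suitable asset already exists.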
[0030] Moving from block 304 to block 306, the 2D image is converted into a 3D asset using a 2D-to-3D conversion system, which may employ techniques such as layer stacking, creation of 3D anaglyph stereoscopic images, or false height resolution. A 2D-to-3D reconstruction model may be used. This may include an encoder-decoder neural architecture, in which the encoder takes a 2D image as input and generates an encoding, and a 3D decoder generates a 3D object from that encoding. Thus, a 3D object or asset may be produced by 2D-to-3D reconstruction, by generating a 3D object with a generative neural model and then transforming it to meet the specifications, or by transforming an existing 3D model according to the desired specifications. Further details are shown in Figures 5 and 12-15.
[0031] The 3D asset may be presented on a display, for example as shown in Figure 2, and at block 308, artist modifications to the asset may be received via audio or other input, such as graphical manipulation using a point-and-click device. These modifications may include changes to the size, shape, color, or style of specific parts of the asset (rather than of the whole asset), or to the texture of the asset's surface. At block 310, the final modified 3D asset is generated for use in a computer simulation.
[0032] Figure 4 shows a UI 400 that may be presented on a display 402, such as any display disclosed herein, to prompt the user at 404 to input a photograph of the desired asset. The photograph is depicted in 2D format at 406 and can be uploaded for processing according to Figure 5 by selecting the upload selector 408.
[0033] Figure 5 shows that in block 500, a 2D image of the asset in the photograph is received. Moving to block 502, the 2D image is converted into a 3D asset. Proceeding to block 504, the 3D asset may be modified by an artist or other user for use in a computer simulation, as described herein. Additional details of 3D asset generation are shown in Figures 12–15, which are described below.
[0034] Figure 6 shows exemplary logic for specifying multiple assets and their desired relative positions in a computer simulation. Starting with block 600, text is received from direct text input or speech-to-text conversion, and this text describes the assets by name and their desired relative positions to each other.
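One hypothetical way to turn relative-position words into placements is to map each direction word to a unit offset on the ground plane and sum the offsets. The sketch assumes a coordinate convention (+x to the viewer's right, +z toward the viewer), a fixed `spacing`, and the helper name `place_relative`; none of these come from the description.

```python
from typing import Tuple

# Hypothetical mapping from relation words to unit offsets on the ground
# plane: +x is the viewer's right, +z is toward the viewer ("front").
DIRECTION_OFFSETS = {
    "left": (-1.0, 0.0),
    "right": (1.0, 0.0),
    "front": (0.0, 1.0),
    "behind": (0.0, -1.0),
}


def place_relative(anchor: Tuple[float, float], relation: str,
                   spacing: float = 1.5) -> Tuple[float, float]:
    """Position an asset relative to an anchor asset from words like 'left front'."""
    dx = dz = 0.0
    for word in relation.lower().split():
        if word in DIRECTION_OFFSETS:
            ox, oz = DIRECTION_OFFSETS[word]
            dx += ox
            dz += oz
    if dx == 0.0 and dz == 0.0:
        raise ValueError(f"no recognized direction in {relation!r}")
    # Compound relations add their unit offsets, then scale by the spacing.
    return (anchor[0] + spacing * dx, anchor[1] + spacing * dz)


chair = (0.0, 0.0)
couch = place_relative(chair, "left front")
print(couch)  # (-1.5, 1.5)
```

A generative model conditioned on the full description would handle far richer spatial language; the sketch only shows how a stated relation can constrain where an asset ends up.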
[0035] The logic proceeds to block 602, at which a description of only a portion of an asset, rather than of the entire asset, may optionally be received. If the description is received as audio input, it is converted to text at block 604. At block 606, an AI engine such as a generative adversarial network (GAN) may be used to generate a 2D image based on the previously received asset descriptions and positions, and at block 608 the image is converted into a 3D scene according to the principles described herein. Alternatively, 3D assets may be generated directly, without an intermediate 2D stage.
[0036] Figure 7 shows a UI 700 that may be presented on a display 702, such as any display described herein. The UI 700 may include a prompt 704 for a person to speak a description of a desired asset scene, which may be presented in text form at 706 after speech-to-text conversion. In the example shown, the person specifies a scene in which a couch is located to the left front of a Gaudi-style chair.
[0037] Figure 8 shows an example result of the process of Figure 7. Continuing the example of Figure 7, a 3D model of a couch 800 is shown to the left front of a 3D asset 802 of a chair, the back 804 of which is Gaudi-style, drawn with frills 806. A label 808 may also be presented next to each image indicating what the image is intended to depict, so that the artist can verify whether the GAN has correctly performed the desired task.
[0038] One way to validate the labels is to render the 3D model into a 2D image and use a similarity metric to compare the 2D image generated from the text with the 2D image rendered from the 3D model.
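As a minimal sketch of such a check, assuming both views are grayscale images of equal size, cosine similarity of the flattened pixel values can serve as the metric. The `threshold` of 0.9 and the function names are hypothetical; a deployed system would more likely compare learned image embeddings than raw pixels.

```python
import math
from typing import Sequence

Image = Sequence[Sequence[float]]  # grayscale image as rows of pixel values


def cosine_similarity(img_a: Image, img_b: Image) -> float:
    """Cosine similarity between two equally sized grayscale images."""
    a = [float(p) for row in img_a for p in row]
    b = [float(p) for row in img_b for p in row]
    if len(a) != len(b):
        raise ValueError("images must have the same number of pixels")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # An all-black image carries no comparable signal.
    return dot / (norm_a * norm_b)


def label_is_valid(text_image: Image, rendered_image: Image,
                   threshold: float = 0.9) -> bool:
    """Accept a label when the text-generated and rendered views are similar enough."""
    return cosine_similarity(text_image, rendered_image) >= threshold


print(label_is_valid([[1, 0], [0, 0]], [[0, 1], [0, 0]]))  # False
```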
[0039] Figure 9 shows a UI 900 that may be presented on a display 902, such as any display described herein. The UI 900 may include text 904, which may be converted from an artist's voice input, requesting a modification of the chair shown in Figure 8, in the example shown from Gaudi style to Louis XIV style. As a result, the frills on the back of the chair shown in Figure 8 are changed to a more ornate and elegant style, as shown in the example.
[0040] Figure 10 illustrates another principle related to the above disclosure. At block 1000, text indicating a desired modification to an asset is received, for example text converted from speech. At block 1002, based on the desired modification, the relevant parts of the asset are combined as appropriate to satisfy the requested modification. This can be done by changing the weights of pixels interpolated along the boundary region of the asset that is identified as related to the desired modification.
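The boundary-weighted combination can be illustrated in one dimension. Pixels inside the region identified for modification take the modified value, and pixels within a feathered band around it are linearly interpolated between the original and modified values. This is a simplified sketch over a single pixel row; the linear ramp, the `feather` width, and the function names are assumptions, not details from the description.

```python
from typing import List, Sequence


def boundary_weights(length: int, start: int, end: int, feather: int) -> List[float]:
    """Blend weight per pixel: 1 inside [start, end), ramping to 0 over `feather` pixels."""
    weights = []
    for i in range(length):
        if start <= i < end:
            weights.append(1.0)
            continue
        # Distance in pixels from the nearest pixel of the modified region.
        dist = start - i if i < start else i - (end - 1)
        weights.append(max(0.0, 1.0 - dist / (feather + 1)))
    return weights


def blend_row(original: Sequence[float], modified: Sequence[float],
              start: int, end: int, feather: int) -> List[float]:
    """Linearly interpolate between original and modified pixels using the weights."""
    w = boundary_weights(len(original), start, end, feather)
    return [(1.0 - wi) * o + wi * m for o, m, wi in zip(original, modified, w)]


row = blend_row([0.0] * 8, [9.0] * 8, start=3, end=5, feather=2)
print([round(v, 3) for v in row])  # [0.0, 3.0, 6.0, 9.0, 9.0, 6.0, 3.0, 0.0]
```

The ramp prevents a visible seam where the restyled part meets the unchanged remainder of the asset; the same idea extends to 2D masks or mesh regions.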
[0041] In addition to assets, artists can verbally describe a desired background terrain, such as "mud" or a "marble chamber," or other types of terrain. Also, as mentioned above, the artist can specify the size of an asset; for example, an artist could specify a chair that is 20 feet tall. When such an asset is incorporated into the game space of the simulation and its highest point interferes with other assets (for example, with the highest point of another object), the assets may automatically deform so that space appears for the chair. This may require a collaborative approach between humans and AI, whereas more qualitative requirements, such as a wide seat or a tall chair, can be met using an AI-only approach.
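The interference check can be approximated with axis-aligned bounding boxes (AABBs), a common collision-detection primitive; this is a swapped-in technique for illustration, not one named in the description. The sketch scales a chair's box to the requested height and reports which scene objects it would intersect. The deformation that makes room is not shown, and the class, function, and object names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class AABB:
    """Axis-aligned bounding box; y is the vertical axis."""
    lo: Vec3
    hi: Vec3

    def overlaps(self, other: "AABB") -> bool:
        # Boxes intersect only if their intervals overlap on every axis.
        return all(self.lo[i] < other.hi[i] and other.lo[i] < self.hi[i]
                   for i in range(3))


def scale_to_height(box: AABB, height: float) -> AABB:
    """Uniformly scale a box about its base center so its height equals `height`."""
    factor = height / (box.hi[1] - box.lo[1])
    cx = (box.lo[0] + box.hi[0]) / 2
    cz = (box.lo[2] + box.hi[2]) / 2
    hx = (box.hi[0] - box.lo[0]) / 2 * factor
    hz = (box.hi[2] - box.lo[2]) / 2 * factor
    return AABB((cx - hx, box.lo[1], cz - hz), (cx + hx, box.lo[1] + height, cz + hz))


def conflicts(asset: AABB, scene: Dict[str, AABB]) -> List[str]:
    """Names of scene objects the asset would interfere with."""
    return [name for name, other in scene.items() if asset.overlaps(other)]


chair = AABB((0.0, 0.0, 0.0), (2.0, 3.0, 2.0))  # units: feet
giant = scale_to_height(chair, 20.0)
scene = {
    "ceiling_beam": AABB((-10.0, 12.0, -10.0), (10.0, 13.0, 10.0)),
    "lamp": AABB((30.0, 0.0, 30.0), (31.0, 6.0, 31.0)),
}
print(conflicts(giant, scene))  # ['ceiling_beam']
```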
[0042] Figure 11 shows an additional embodiment. Once a 3D asset has been created as described herein, it may be input into a physics engine at block 1100. Moving to block 1102, the geometry of the asset may be modified, for example by a GAN, so as to maintain a constant inertia tensor, as calculated by the physics engine, which governs how the asset tends to move or deform. The physics engine can thus solve for the inertia tensor to describe the behavior of the asset in response to forces. For example, the physics engine can determine, based on the asset's current structural characteristics, whether the generated 3D asset will topple when pushed with a particular force.
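Two quantities mentioned above can be computed directly for a simple case. The sketch assumes the asset is approximated by a uniform solid box: `box_inertia_tensor` returns the standard inertia tensor of a cuboid about its center of mass, and `will_topple` applies a quasi-static tipping test, in which a horizontal push F at height h tips the object about its base edge when F·h > m·g·d, where d is the horizontal distance from the center of mass to that edge. It further assumes enough friction that the object tips rather than slides. Both functions are illustrative stand-ins, not a physics engine's API.

```python
from typing import Tuple

G = 9.81  # gravitational acceleration, m/s^2


def box_inertia_tensor(mass: float, w: float, h: float, d: float) -> Tuple[Tuple[float, ...], ...]:
    """Inertia tensor of a uniform solid cuboid (width x, height y, depth z) about its center."""
    k = mass / 12.0
    return (
        (k * (h * h + d * d), 0.0, 0.0),
        (0.0, k * (w * w + d * d), 0.0),
        (0.0, 0.0, k * (w * w + h * h)),
    )


def will_topple(mass: float, push_force: float, push_height: float,
                com_to_pivot: float) -> bool:
    """Quasi-static tipping test for a rigid object pushed horizontally.

    Assumes the object does not slide, so it tips about the far base edge
    once the push torque exceeds the restoring torque of gravity acting at
    the center of mass.
    """
    overturning = push_force * push_height
    restoring = mass * G * com_to_pivot
    return overturning > restoring


# A 10 kg chair with its center of mass 0.25 m from the tipping edge.
print(will_topple(10.0, 30.0, 1.0, 0.25))  # True
```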
[0043] In other words, the AI engine can examine the physical properties of the asset's structure, predict how the structure will respond to physical forces, and determine how to maintain the physical proportions of the previous object. Constraints may be imposed. For example, if the asset is furniture, it may be required to be generated with attributes that prevent it from tipping over, regardless of the weight with which the 3D asset is simulated. This can be achieved, for example, by appropriately changing the dimensions and weights of parts of the asset, or by keeping the total torque of the asset's parts at zero. In other words, a rule-based approach can be combined with AI to generate the object itself. At block 1104, the updated asset (or the physics engine's determination regarding it) is fed back to the AI engine.
[0044] In addition to visual properties, the acoustic and material properties of an asset can be modified by separate AI engines, such as GANs, using the techniques described herein. For example, a GAN may be used to determine how an asset absorbs forces, for instance whether, on impact, the asset shatters or breaks or instead absorbs a bullet. An asset representing a grenade may be designed to produce different types of explosions in the presence of different assets.
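The no-tipping rule can be expressed as a rule-based check that a generator might enforce: the combined center of mass of the parts must lie over the support footprint, in which case the support reactions can balance the gravity torques of the parts so that the net torque is zero. The part representation `(mass, x, z)`, the rectangular footprint, and the function names below are assumptions for illustration.

```python
from typing import List, Tuple

Part = Tuple[float, float, float]  # (mass, x, z) of a part's center of mass
Point = Tuple[float, float]        # (x, z) on the ground plane


def center_of_mass(parts: List[Part]) -> Point:
    """Mass-weighted average position of all parts on the ground plane."""
    total = sum(m for m, _, _ in parts)
    x = sum(m * px for m, px, _ in parts) / total
    z = sum(m * pz for m, _, pz in parts) / total
    return x, z


def is_statically_stable(parts: List[Part], footprint_min: Point,
                         footprint_max: Point) -> bool:
    """True when the combined center of mass lies over the rectangular support footprint."""
    x, z = center_of_mass(parts)
    return (footprint_min[0] <= x <= footprint_max[0]
            and footprint_min[1] <= z <= footprint_max[1])


# Hypothetical chair: seat at the origin, backrest slightly behind it.
chair_parts = [(5.0, 0.0, 0.0), (3.0, 0.0, -0.2)]
print(is_statically_stable(chair_parts, (-0.25, -0.25), (0.25, 0.25)))  # True
```

If the check fails, a generator could adjust part dimensions or masses and re-test, which mirrors the combination of rules and AI described above.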
[0045] Referring to Figure 12, an overview of techniques for generating 3D graphic assets from 2D input is shown. The techniques of Figure 12 are useful for new assets or when existing 3D models cannot be converted, and they support both generation and conversion.
[0046] Starting at block 1200, to implement the example described above, a 2D representation 1202 of a real object such as a chair, for example a photograph, is input to a conditional generative neural model for 2D synthesis. The resulting output 1204 is a 2D representation of a synthesized chair. The output 1204 is sent to an optional 2D transformation model 1206 for interpolation and feature editing. The model 1206 may be entirely AI-based, or it may be interactive between an AI model and a human operator.
[0047] The 2D transformation model 1206 outputs a transformed synthesized 2D representation 1208 of a chair, as shown in the example. The representation 1208 can be added to the asset library, used as artist input, and used for 3D reconstruction.
[0048] In practice, the transformed synthesized 2D representation 1208 of a chair or similar object, and/or the 2D representation 1202 of a real asset, can be input to a neural model 1210. The neural model 1210 converts the 2D representation into a 3D shape and outputs a reconstructed mesh 1212 of the asset. The neural model 1210 may include implicit functions and mesh deformation, as appropriate. If desired, the reconstructed mesh 1212 can be input to a texture conversion model 1214 for neural rendering of the 3D asset's texture.
[0049] Figure 13 illustrates the control of feature transformation. Starting with block 1300, a 2D generative model (such as a generative adversarial network (GAN)) is trained to generate assets for each asset class, such as tables and chairs. Training can be supervised, semi-supervised, or unsupervised.
[0050] When an asset is requested, the appropriate trained model is selected for the specified asset. For example, if separate models exist for generating chairs, tables, etc., the model is selected according to the specified asset.
[0051] The artist typically specifies the characteristics of the asset to be transformed, such as texture, color, and shape (geometry). In block 1302, to transform the generated asset to meet the specifications in the input description, the generation is adjusted based on keywords (e.g., attributes) extracted from the description, which can be considered annotated features (y-labels). In one example, five features of a chair could be used: armrests, legs, back, surface, and landscape (e.g., front or back).
[0052] Moving to block 1304, encodings of annotated chairs can be generated using different weights, which can be interpolated to best fit the artist's specifications. The encodings are used to train a supervised classifier 1306 to discover a feature axis F(i). At block 1308, the features of a new chair can be edited along the feature axis, thereby interactively controlling unique features and transforming attributes (human-AI collaboration), for example, changing an existing chair asset into a chair with a backrest. Thus, the encoding W' of the new chair is the existing chair encoding W plus the product of alpha and the feature axis F(i), that is, W' = W + α·F(i), where alpha can be determined or discovered empirically.
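The edit W' = W + α·F(i) can be sketched in a few lines. Here the supervised classifier 1306 is replaced by a simpler stand-in, the normalized difference between the mean encodings with and without a feature; the toy two-dimensional encodings and the function names are hypothetical.

```python
import math
from typing import List, Sequence

Vector = List[float]


def feature_axis(with_feature: Sequence[Sequence[float]],
                 without_feature: Sequence[Sequence[float]]) -> Vector:
    """Unit direction separating encodings with and without a feature.

    Stand-in for the supervised classifier: the normalized difference of
    the two class means.
    """
    dim = len(with_feature[0])
    mean_pos = [sum(v[j] for v in with_feature) / len(with_feature) for j in range(dim)]
    mean_neg = [sum(v[j] for v in without_feature) / len(without_feature) for j in range(dim)]
    diff = [p - n for p, n in zip(mean_pos, mean_neg)]
    norm = math.sqrt(sum(c * c for c in diff))
    if norm == 0.0:
        raise ValueError("classes have identical means; no axis found")
    return [c / norm for c in diff]


def edit_encoding(w: Sequence[float], axis: Sequence[float], alpha: float) -> Vector:
    """W' = W + alpha * F(i): move an encoding along the feature axis."""
    return [wi + alpha * fi for wi, fi in zip(w, axis)]


# Toy encodings of chairs with and without a backrest.
backrest_axis = feature_axis([[1.0, 2.0], [3.0, 2.0]], [[0.0, 2.0], [0.0, 2.0]])
print(edit_encoding([0.5, 0.5], backrest_axis, 2.0))  # [2.5, 0.5]
```

A linear classifier's normal vector would serve the same role as the mean-difference axis; in either case, larger alpha values push the decoded asset further toward having the feature.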
[0053] Figure 14 illustrates a further approach. A 2D representation of a real or synthetic chair 1400 is sent to a 2D encoder-decoder neural model 1402 for shape encoding. The 2D encoder model 1402 may be a convolutional network or a similar deep neural network. The input 1400 to the encoder model 1402 may be the generated and (optionally) transformed image in Figure 13, which satisfies the description of the desired asset. A texture encoder 1404 may also be provided to encode the texture of the object, if necessary.
[0054] The 3D decoder 1406 takes the input encoding and generates a 3D object. The 3D decoder 1406 can also be a convolutional network or a similar DNN. The output of the 3D decoder is a reconstructed mesh 1408 representing the 3D asset.
[0055] To train the network, 3D output can be rendered as a 2D image and compared to the input image. Training can be repeated until the input and output closely match. Alternatively, mesh deformation can be used.
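The render-and-compare signal can be sketched for a voxel reconstruction: project the voxels orthographically into a silhouette and score it against the input silhouette with intersection-over-union (IoU). This is an illustration under assumptions. Hard silhouettes are not differentiable, so actual training would need a differentiable renderer or a soft surrogate, and the voxel layout `[y][x][z]` and the function names are hypothetical.

```python
from typing import List, Sequence

Voxels = Sequence[Sequence[Sequence[bool]]]  # occupancy indexed [y][x][z]
Mask = List[List[bool]]


def render_silhouette(voxels: Voxels) -> Mask:
    """Orthographic silhouette: a pixel is on if any voxel along z is filled."""
    return [[any(column) for column in row] for row in voxels]


def silhouette_iou(a: Sequence[Sequence[bool]], b: Sequence[Sequence[bool]]) -> float:
    """Intersection-over-union of two binary masks (1.0 when both are empty)."""
    inter = union = 0
    for row_a, row_b in zip(a, b):
        for pa, pb in zip(row_a, row_b):
            inter += int(bool(pa) and bool(pb))
            union += int(bool(pa) or bool(pb))
    return 1.0 if union == 0 else inter / union


def reconstruction_loss(voxels: Voxels, input_mask: Sequence[Sequence[bool]]) -> float:
    """0.0 when the rendered reconstruction matches the input silhouette exactly."""
    return 1.0 - silhouette_iou(render_silhouette(voxels), input_mask)


print(silhouette_iou([[True, False], [False, False]], [[True, True], [False, False]]))  # 0.5
```

Training would repeat reconstruction and comparison until the loss is small, matching the stopping condition described above.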
[0056] Encoder-decoder models can be adapted to incorporate additional encodings (e.g., texture encoding) that transform 3D objects to meet the specifications in the description.
[0057] Referring to Figure 15 for an alternative approach to generating 3D assets, at block 1500 a 3D GAN model is trained to generate 3D objects. At block 1502, partial encodings are extracted for each part of the asset, such as the armrests, legs, and back of a chair. Moving to block 1504, the partial encodings are transformed based on a shape description 1506 of the desired asset. Moving to block 1508, generation of the 3D asset is conditioned on an appearance description 1510, such as style, size, or a non-shape attribute such as color. The reconstructed mesh 1512 of the 3D asset is output with or without texturing, as needed; that is, the 3D asset model can be rendered with a specified texture. 3D variations can be generated based on specified attributes.
[0058] While the principles have been described with reference to several exemplary embodiments, it should be recognized that these are not intended to be limiting, and that the subject matter claimed herein can be carried out using a variety of alternative arrangements.
Claims
1. A device comprising: at least one computer memory that is not a transitory signal and that comprises instructions executable by at least one processor to: receive a two-dimensional (2D) image; convert the 2D image into a 3D asset; modify the 3D asset using a computer at least in part by changing weights of pixels interpolated along at least one boundary region in the 3D asset in which the modification is identified; and present the modified 3D asset in at least one computer simulation.
2. The device of claim 1, wherein the instructions are executable to associate audio with the 3D asset based at least in part on text.
3. The device of claim 1, wherein the instructions are executable to receive speech indicating at least one location, and the 3D asset is consistent with the location.
4. The device of claim 1, wherein the instructions are executable to receive speech indicating at least a plurality of objects, and the 3D asset is consistent with the plurality of objects.
5. The device of claim 1, wherein the instructions are executable to present on a display a user interface (UI) having a selector for uploading a photograph.
6. The device of claim 1, wherein the instructions are executable to present on a display a user interface (UI) having a prompt to utter a desired asset scene.
7. The device of claim 1, wherein the instructions are executable to identify the 2D image based at least in part on input of a photograph of the 2D image.
8. The device of claim 1, wherein the instructions are executable to modify the 3D asset based at least in part on physical modeling of the effects of the environment on the 3D asset.