system
The system addresses the inefficiencies in movie set production by generating three-dimensional designs from scenario information, enhancing creative freedom and reducing costs through real-time visualization and feedback.
Patent Information
- Authority / Receiving Office
- JP · JP
- Patent Type
- Applications
- Current Assignee / Owner
- SOFTBANK GROUP CORP
- Filing Date
- 2024-12-13
- Publication Date
- 2026-06-25
AI Technical Summary
The production of a movie set is time-consuming and costly, with limited creative freedom due to resource concentration on set design and construction, and difficulties in optimizing design before physical construction, leading to additional costs and insufficient expression.
A system that analyzes scenario information to generate three-dimensional film set designs using natural language processing, allowing for real-time visualization and feedback to improve design accuracy, and manages multiple scenes simultaneously.
This system reduces production costs and increases creative freedom by optimizing set design before construction, enabling efficient and accurate set management through user feedback and virtual reality technology.
Figure 2026104559000001_ABST
Abstract
Description
Technical Field
[0001] The technology of the present disclosure relates to a system.
Background Art
[0002] Patent Document 1 discloses a method for controlling a persona chatbot, which is performed by at least one processor, the method including steps of receiving a user utterance, adding the user utterance to a prompt including an instruction sentence related to an explanation of a chatbot character, encoding the prompt, and inputting the encoded prompt into a language model to generate a chatbot utterance in response to the user utterance.
Prior Art Documents
Patent Documents
[0003]
Patent Document 1
Summary of the Invention
Problems to be Solved by the Invention
[0004] The production of a movie set takes a great deal of time and cost. Especially in large-scale productions, resources tend to be concentrated on set design and construction. Also, there are problems that it is difficult to optimize the design before the construction of a physical set, and the creative freedom is restricted. Additional costs are incurred for set modifications, and in many cases, sufficient expression cannot be achieved due to budget and space constraints. Therefore, means for improving the efficiency and creativity of set design are required.
Means for Solving the Problems
[0005] This invention provides a system that analyzes scenario information input by a user and generates a three-dimensional film set design based on that information, in order to improve the efficiency of set design and construction in film production. The generated design is visualized in a virtual space, and the AI agent improves its accuracy using feedback provided by the user. Furthermore, this system can manage and optimize multiple scenes simultaneously, and extracts important keywords and themes from the scenario information using natural language processing technology. In this way, the design can be optimized before physical set construction, reducing the cost of film production while increasing creative freedom.
[0006] A "user" is an individual or group that operates the system and inputs scenario information for the purpose of film production.
[0007] "Scenario information" refers to data that forms the basis of set design, including the film's storyline and visual concept.
[0008] "Means of analysis" refers to the process of analyzing the scenario information received by the system and extracting the elements that form the basis of the design.
[0009] "Methods for generating three-dimensional film set designs" refers to the process of designing virtual film sets in three dimensions based on scenario information.
[0010] A "virtual space" is a computer-generated environment in which users can visually view a three-dimensional movie set.
[0011] "Feedback" refers to the opinions and requests for revisions that users provide regarding the generated set designs, and this information is used to improve the accuracy of the AI.
[0012] "Natural language processing technology" is a field of technology that enables systems to understand input text and structure the information.
[0013] "Important keywords and themes" are core words or concepts extracted from scenario information that are necessary for design generation.
Brief Description of the Drawings
[0014]
[Figure 1] A conceptual diagram showing an example of the configuration of a data processing system according to the first embodiment.
[Figure 2] A conceptual diagram showing an example of the main functions of a data processing device and a smart device according to the first embodiment.
[Figure 3] A conceptual diagram showing an example of the configuration of a data processing system according to the second embodiment.
[Figure 4] A conceptual diagram showing an example of the main functions of a data processing device and smart glasses according to the second embodiment.
[Figure 5] A conceptual diagram showing an example of the configuration of a data processing system according to the third embodiment.
[Figure 6] A conceptual diagram showing an example of the main functions of a data processing device and a headset-type terminal according to the third embodiment.
[Figure 7] A conceptual diagram showing an example of the configuration of a data processing system according to the fourth embodiment.
[Figure 8] A conceptual diagram showing an example of the main functions of a data processing device and a robot according to the fourth embodiment.
[Figure 9] An emotion map on which multiple emotions are mapped.
[Figure 10] An emotion map on which multiple emotions are mapped.
[Figure 11] A sequence diagram showing the processing flow of the data processing system in Example 1.
[Figure 12] A sequence diagram showing the processing flow of the data processing system in Application Example 1.
[Figure 13] A sequence diagram showing the processing flow of the data processing system in Example 2, which incorporates an emotion engine.
[Figure 14] A sequence diagram showing the processing flow of the data processing system in Application Example 2, which incorporates an emotion engine.
Embodiments for Carrying Out the Invention
[0015] Hereinafter, an example of an embodiment of a system according to the technology of the present disclosure will be described with reference to the accompanying drawings.
[0016] First, the terms used in the following description will be explained.
[0017] In the following embodiments, a processor denoted by a reference numeral (hereinafter simply referred to as "processor") may be a single arithmetic unit or a combination of multiple arithmetic units. The processor may also be a single type of arithmetic unit or a combination of multiple types of arithmetic units. Examples of arithmetic units include a CPU (Central Processing Unit), a GPU (Graphics Processing Unit), a GPGPU (General-Purpose computing on Graphics Processing Units), and an APU (Accelerated Processing Unit).
[0018] In the following embodiments, a RAM (Random Access Memory) denoted by a reference numeral is a memory in which information is temporarily stored and which the processor uses as a work memory.
[0019] In the following embodiments, a storage denoted by a reference numeral is one or more non-volatile storage devices that store various programs, various parameters, and the like. Examples of non-volatile storage devices include flash memory (e.g., an SSD (Solid State Drive)), magnetic disks (e.g., hard disks), and magnetic tapes.
[0020] In the following embodiments, a communication interface (I/F) denoted by a reference numeral is an interface that includes a communication processor, an antenna, and the like. The communication interface manages communication between multiple computers. Examples of communication standards applicable to the communication interface include wireless communication standards such as 5G (5th Generation Mobile Communication System), Wi-Fi (registered trademark), and Bluetooth (registered trademark).
[0021] In the following embodiments, "A and/or B" is synonymous with "at least one of A and B." That is, "A and/or B" means A alone, B alone, or a combination of A and B. Furthermore, in this specification, the same concept as "A and/or B" applies when three or more items are linked by "and/or."
[0022] [First Embodiment]
[0023] Figure 1 shows an example of the configuration of the data processing system 10 according to the first embodiment.
[0024] As shown in Figure 1, the data processing system 10 includes a data processing device 12 and a smart device 14. An example of the data processing device 12 is a server.
[0025] The data processing device 12 comprises a computer 22, a database 24, and a communication interface 26. The computer 22 is an example of a "computer" related to the technology of this disclosure. The computer 22 comprises a processor 28, RAM 30, and storage 32. The processor 28, RAM 30, and storage 32 are connected to a bus 34. The database 24 and the communication interface 26 are also connected to the bus 34. The communication interface 26 is connected to a network 54. An example of the network 54 is a WAN (Wide Area Network) and/or a LAN (Local Area Network).
[0026] The smart device 14 comprises a computer 36, a reception device 38, an output device 40, a camera 42, and a communication interface 44. The computer 36 comprises a processor 46, RAM 48, and storage 50. The processor 46, RAM 48, and storage 50 are connected to a bus 52. The reception device 38, output device 40, and camera 42 are also connected to the bus 52.
[0027] The reception device 38 is equipped with a touch panel 38A and a microphone 38B, etc., and receives user input. The touch panel 38A receives user input by detecting contact with an object (e.g., a pen or finger). The microphone 38B receives user input by detecting the user's voice. The control unit 46A transmits data indicating the user input received by the touch panel 38A and microphone 38B to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the data indicating the user input.
[0028] The output device 40 includes a display 40A and a speaker 40B, and presents data to the user 20 by outputting the data in a form perceptible to the user 20 (e.g., audio and/or text). The display 40A displays visible information such as text and images according to instructions from the processor 46. The speaker 40B outputs audio according to instructions from the processor 46. The camera 42 is a small digital camera equipped with an optical system such as a lens, aperture, and shutter, and an image sensor such as a CMOS (Complementary Metal-Oxide-Semiconductor) image sensor or a CCD (Charge Coupled Device) image sensor.
[0029] Communication interface 44 is connected to network 54. Communication interfaces 44 and 26 are responsible for the exchange of various types of information between processor 46 and processor 28 via network 54.
[0030] Figure 2 shows an example of the main functions of the data processing device 12 and the smart device 14.
[0031] As shown in Figure 2, in the data processing device 12, a specific processing is performed by the processor 28. A specific processing program 56 is stored in the storage 32. The specific processing program 56 is an example of a "program" related to the technology of this disclosure. The processor 28 reads the specific processing program 56 from the storage 32 and executes the read specific processing program 56 on the RAM 30. The specific processing is realized by the processor 28 operating as a specific processing unit 290 according to the specific processing program 56 executed on the RAM 30.
[0032] The storage 32 stores the data generation model 58 and the emotion identification model 59. The data generation model 58 and the emotion identification model 59 are used by the specific processing unit 290.
[0033] In the smart device 14, the processor 46 performs the reception output processing. The storage 50 stores the reception output program 60. The reception output program 60 is used in conjunction with a specific processing program 56 by the data processing system 10. The processor 46 reads the reception output program 60 from the storage 50 and executes the read reception output program 60 on the RAM 48. The reception output processing is realized by the processor 46 operating as a control unit 46A according to the reception output program 60 executed on the RAM 48.
[0034] Next, the specific processing performed by the specific processing unit 290 of the data processing device 12 will be described. In the following description, the data processing device 12 will be referred to as the "server" and the smart device 14 as the "terminal".
[0035] This invention is a system that supports the efficiency of set design in the film production process, and operates in cooperation with three parties: the user, the terminal, and the server. A specific embodiment of this system is shown below.
[0036] Users input scenario information, such as the film's script and visual concept, into the terminal. This information describes the content, atmosphere, and physical characteristics of the film scenes, and represents the basic ideas and policies for the film's production.
[0037] The terminal processes the input scenario information as digital data and sends it to the server. This includes a function to standardize the scenario information and prepare it in an analyzable format.
[0038] The server uses advanced natural language processing techniques to analyze the scenario information received from the terminal. This analysis process extracts important keywords and themes from the scenario, clarifying the elements necessary for designing the film set.
[0039] The server then uses AI agents to automatically generate three-dimensional movie set designs based on the extracted keywords and themes. These designs are rendered in a virtual space and constructed using virtual reality (VR) technology.
[0040] The generated set is streamed to the user via the terminal, allowing the user to visually inspect the set in real time using VR equipment. The user can evaluate the set design and provide feedback through the terminal regarding desired changes and improvements.
[0041] This feedback is sent to the server, which uses it to train the AI. The AI agent incorporates user feedback to improve the accuracy and applicability of subsequent design generation. The server also manages multiple scenes simultaneously and optimizes the set design as needed.
[0042] For example, if a user sets a scene of a medieval village, the server automatically extracts relevant themes such as wooden buildings, cobblestone streets, and crops, and generates a realistic and creative village set based on these. The user can then review the scene and make adjustments, such as the placement of buildings or decorations, and this feedback contributes to the evolution of the AI. In film projects involving multiple scenes at once, users can properly manage these set designs and achieve overall time and cost reductions.
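The medieval-village example can be sketched as a lookup from a scene setting to its associated design themes. This is only an illustration: the disclosure performs this step with natural language processing, and the table `SCENE_THEMES` and the function `extract_themes` are hypothetical names introduced here as a stand-in.

```python
# Hypothetical stand-in for the server's theme extraction. The disclosure uses
# natural language processing for this step; this sketch uses a fixed table.
SCENE_THEMES = {
    "medieval village": ["wooden buildings", "cobblestone streets", "crops"],
}


def extract_themes(scene_description: str) -> list[str]:
    """Return the themes of every known setting mentioned in the description."""
    text = scene_description.lower()
    themes: list[str] = []
    for setting, setting_themes in SCENE_THEMES.items():
        if setting in text:
            themes.extend(setting_themes)
    return themes


if __name__ == "__main__":
    print(extract_themes("A quiet medieval village at dawn"))
```

A description that mentions no known setting yields an empty list, which a real system would presumably handle by falling back to a more general analysis.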
[0043] The following describes the processing flow.
[0044] Step 1:
[0045] Users use a terminal to input movie scripts and visual concepts. The input information includes details of the story, scene settings, and the overall visual atmosphere.
[0046] Step 2:
[0047] The terminal formats the scenario information from the user as digital data and prepares it for analysis. This data is then ready to be sent to the server.
[0048] Step 3:
[0049] The server receives scenario data from the terminals for analysis and uses natural language processing techniques to extract important keywords and themes from the information. This clarifies the data necessary for designing the film set.
[0050] Step 4:
[0051] The server uses extracted keywords and themes to automatically generate 3D movie set designs, utilizing an AI agent. The AI agent then uses existing design data and learning results to create creative set designs.
[0052] Step 5:
[0053] The generated 3D set design is rendered in a virtual space. The server prepares this design as VR content and prepares to send it to the terminal.
[0054] Step 6:
[0055] The terminal provides VR content received from the server to the user. The user uses VR equipment to view and interact with a three-dimensional visual set in real time.
[0056] Step 7:
[0057] Users provide feedback based on the visualized set. For example, they can send fine-tuning suggestions, such as the location of buildings or the color scheme of decorations, to the server via the terminal.
[0058] Step 8:
[0059] The server receives feedback from users and provides that data to the AI agent to advance the learning process. This improves the accuracy of set design in subsequent generation.
[0060] Step 9:
[0061] The server manages multiple scenes simultaneously as needed, optimizing and coordinating sets to improve overall project consistency and efficiency.
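The simultaneous handling of multiple scenes in Step 9 could be sketched as follows. The source does not specify how scenes are coordinated, so everything here is an assumption: `design_scene` is a placeholder for Steps 3 to 5, and `shared_elements` is one crude way to check consistency across scenes.

```python
from concurrent.futures import ThreadPoolExecutor


def design_scene(scene: dict) -> dict:
    """Hypothetical placeholder for Steps 3 to 5 applied to one scene."""
    # A real system would run analysis and 3D generation here; this stub only
    # turns the scene's keywords into a sorted, de-duplicated element list.
    return {"scene_id": scene["scene_id"], "elements": sorted(set(scene["keywords"]))}


def design_all_scenes(scenes: list[dict]) -> list[dict]:
    """Process several scenes concurrently, preserving the input order."""
    with ThreadPoolExecutor() as pool:
        return list(pool.map(design_scene, scenes))


def shared_elements(designs: list[dict]) -> set[str]:
    """Elements present in every design: a crude cross-scene consistency check."""
    if not designs:
        return set()
    common = set(designs[0]["elements"])
    for design in designs[1:]:
        common &= set(design["elements"])
    return common


if __name__ == "__main__":
    scenes = [
        {"scene_id": 1, "keywords": ["cobblestone streets", "market"]},
        {"scene_id": 2, "keywords": ["cobblestone streets", "church"]},
    ]
    print(shared_elements(design_all_scenes(scenes)))
```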
[0062] (Example 1)
[0063] Next, we will describe Example 1. In the following description, the data processing device 12 will be referred to as the "server," and the smart device 14 will be referred to as the "terminal."
[0064] In film production, set design is a time-consuming and resource-intensive process. It's difficult to efficiently manage multiple scenes while simultaneously creating realistic and creative designs in a short timeframe. Furthermore, the process of effectively incorporating user feedback to improve the accuracy of future designs is complex. Therefore, there is a need to easily generate set designs from narrative information, visualize them in a virtual space, and provide users with an intuitive and interactive experience.
[0065] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 1 is realized by the following means.
[0066] In this invention, the server includes means for receiving and analyzing narrative information input from a user, means for generating a three-dimensional visual model based on the analyzed information, and means for coordinating with one or more virtual reality devices to deliver the generated visual model in real time. This streamlines the creation and management of set designs and enables improved accuracy in design generation based on real-time visualization and feedback.
[0067] "Users" refers to individuals or groups who use the system to input narrative information, review the generated visual models, and provide feedback.
[0068] "Narrative information" refers to information used to describe the scenarios and concepts related to scenes in movies and visual content.
[0069] "Analysis" refers to the process of identifying important elements from the input narrative information and extracting specific design elements.
[0070] A "three-dimensional visual model" refers to a three-dimensional digital design generated based on narrative information, and a structure that can be displayed in a virtual space.
[0071] A "virtual space" refers to an artificial space created using digital technology that users can experience visually.
[0072] "Virtual reality devices" refer to devices (e.g., headsets and displays) that users use to visually and experientially perceive a virtual space.
[0073] "Real-time delivery" refers to providing the generated visual model to the user immediately and without delay.
[0074] "Feedback" refers to opinions and suggestions for improvement provided by users after experiencing a visual model, and includes information used to improve the generation process.
[0075] One embodiment of this invention is a system that supports the efficiency of set design in the film production process. In this system, the user, terminal, and server work in cooperation with each other.
[0076] Users input narrative information, such as scenarios and visual concepts, into the terminal. This information concretely represents the structure and atmosphere of the film scenes and is a crucial element in realizing the basic ideas of the story.
[0077] The terminal processes the narrative information entered by the user as digital data. This process includes the function of converting the information into a parseable format. Specifically, the information is converted into a standard format such as JSON and sent to the server.
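The conversion to JSON on the terminal might look like the following minimal sketch. The source states only that the information is converted into "a standard format such as JSON", so the `NarrativeInfo` schema and its field names are assumptions made for illustration.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class NarrativeInfo:
    """Hypothetical schema; the source does not define the JSON fields."""

    title: str
    scenes: list[str] = field(default_factory=list)
    visual_concept: str = ""


def to_json_payload(info: NarrativeInfo) -> str:
    """Serialize the narrative information for transmission to the server."""
    return json.dumps(asdict(info), ensure_ascii=False)


if __name__ == "__main__":
    info = NarrativeInfo("Demo", ["A medieval village at dawn"], "fantasy")
    print(to_json_payload(info))
```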
[0078] The server receives data sent from the terminal and analyzes it using generative AI models and natural language processing software. Specifically, NLP libraries and machine learning frameworks are utilized. Through this analysis, important keywords and themes are extracted, clarifying the elements necessary for film set design.
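As a rough illustration of keyword extraction, the sketch below ranks words by frequency using only the Python standard library. This is a deliberately simple substitute for the generative AI models and NLP libraries the source mentions, not the method of the disclosure; the stop-word list and the function name `extract_keywords` are assumptions.

```python
import re
from collections import Counter

# Small illustrative stop-word list (an assumption, not taken from the source).
STOP_WORDS = {"a", "an", "and", "at", "in", "is", "of", "on", "the", "to", "with"}


def extract_keywords(text: str, top_n: int = 5) -> list[str]:
    """Return the top_n most frequent words that are not stop words."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(word for word in words if word not in STOP_WORDS)
    return [word for word, _ in counts.most_common(top_n)]


if __name__ == "__main__":
    scenario = (
        "The village has a wooden gate. "
        "The wooden village church faces the village square."
    )
    print(extract_keywords(scenario, top_n=2))
```

Frequency counting ignores meaning and context, which is why a production system would more plausibly rely on the NLP tooling the source refers to.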
[0079] The server generates a three-dimensional visual model based on the extracted information. A game engine (e.g., Unity or Unreal Engine) is used to render the design in the virtual space and generate VR content. At this stage, the server works in conjunction with virtual reality equipment to deliver the generated model to the user in real time via the terminal.
[0080] For example, if a user sets a scene for a "medieval village," the server extracts themes such as wooden architecture and cobblestone streets, and generates a realistic and creative village set based on these. The user then uses VR equipment to view the set and provides feedback through prompts such as, "Generate a set design for a fantastical movie scene set in a medieval European village."
[0081] This feedback is sent to the server and used as training data for the generative AI model. This improves the accuracy and effectiveness of subsequent design generation and optimizes the set design.
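One way the feedback could be stored as training data is sketched below. The record layout is hypothetical: the source does not describe the format of the training data, and the `rating` field and its scale are assumptions added for illustration.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class FeedbackRecord:
    """Hypothetical record pairing a generation prompt with user feedback."""

    prompt: str
    design_id: str
    comment: str
    rating: int  # Assumed scale: 1 (poor) to 5 (good).


def to_training_lines(records: list[FeedbackRecord]) -> str:
    """Serialize feedback as JSON Lines, one training example per line."""
    return "\n".join(json.dumps(asdict(r), ensure_ascii=False) for r in records)


if __name__ == "__main__":
    records = [
        FeedbackRecord("medieval village", "design-001", "Move the well", 3),
        FeedbackRecord("medieval village", "design-002", "Good layout", 5),
    ]
    print(to_training_lines(records))
```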
[0082] The flow of the specific processing in Example 1 will be explained using Figure 11.
[0083] Step 1:
[0084] The user inputs the film's script and visual concepts into the terminal. This input includes film scene settings, characters, and key narrative themes. During this process, the input is saved as text data in a system-recognizable format. Specifically, the user can record information using a keyboard or voice input function.
[0085] Step 2:
[0086] The terminal receives input data from the user and formats it as digital data. In this step, a text analysis tool is used to segment the data and convert it to a standard format (e.g., JSON). Data processing involves breaking down the scenario into its constituent elements and shaping it into a form that is easy to analyze. The output is a parseable data structure that is ready to be sent to the server. Specific actions include launching text analysis software and formatting the data.
[0087] Step 3:
[0088] The server receives data sent from the terminal and performs analysis using advanced natural language processing capabilities. Here, a generative AI model and natural language processing libraries are used to extract important keywords and themes from the data. The input is formatted scenario data, and the output is the extracted keywords. Specific operations include model initialization and the application of the analysis algorithm.
[0089] Step 4:
[0090] The server generates a three-dimensional visual model using an AI agent based on extracted keywords. Here, a game engine (e.g., Unity or Unreal Engine) is used to render the design in a virtual space and generate VR content. As part of the data processing, keywords are converted into relevant visual elements and incorporated into the three-dimensional model. The output is the completed visual model. Specific operations include launching 3D modeling tools and rendering the model.
[0091] Step 5:
[0092] The terminal receives a visual model generated from the server and streams it to the user. This step involves real-time data delivery via an interface with the VR device. The input is the visual model sent from the server, and the output is the VR content displayed in real time. Specific operations include implementing streaming technology and synchronizing with the VR device.
[0093] Step 6:
[0094] Users view visual models via VR equipment and provide feedback. This feedback includes design evaluations, areas for improvement, and further requests. Input involves receiving feedback from users verbally or in writing. Output involves saving the feedback information to the terminal and preparing it for transmission to the server. Specific operations include using a feedback input interface.
[0095] Step 7:
[0096] The server receives user feedback and uses it as training data for the generative AI model. Using machine learning algorithms, it analyzes the feedback information and implements a process to improve the accuracy of the AI model. Input includes feedback data, and output includes an improved design generation algorithm. Specific actions include retraining the AI model and tuning its parameters.
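The conversion of keywords into visual elements described in Step 4 could be sketched as a table lookup followed by a simple placement rule. The asset names, the `ASSET_TABLE` mapping, and the placement along one axis are all hypothetical; the source states only that keywords are converted into relevant visual elements and incorporated into the three-dimensional model.

```python
from dataclasses import dataclass

# Hypothetical keyword-to-asset table; the asset names are illustrative only.
ASSET_TABLE = {
    "wooden architecture": "timber_house",
    "cobblestone streets": "cobblestone_road",
}


@dataclass
class PlacedAsset:
    asset: str
    x: float
    z: float


def build_scene(keywords: list[str], spacing: float = 10.0) -> list[PlacedAsset]:
    """Place one asset per recognized keyword along the x axis."""
    placed: list[PlacedAsset] = []
    for keyword in keywords:
        asset = ASSET_TABLE.get(keyword)
        if asset is None:
            continue  # Unknown keywords are skipped in this sketch.
        placed.append(PlacedAsset(asset, x=len(placed) * spacing, z=0.0))
    return placed


if __name__ == "__main__":
    for item in build_scene(["wooden architecture", "cobblestone streets"]):
        print(item)
```

The resulting list is an engine-independent description; importing it into a game engine is outside the scope of this sketch.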
[0097] (Application Example 1)
[0098] Next, we will explain Application Example 1. In the following explanation, the data processing device 12 will be referred to as the "server," and the smart device 14 will be referred to as the "terminal."
[0099] In modern commercial and retail design, real-time spatial adjustment and display optimization are required, but traditional methods are time-consuming, costly, and difficult to implement efficiently. In particular, intuitive design creation and evaluation in virtual environments, along with rapid feedback loops, are essential.
[0100] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 1 is realized by the following means.
[0101] In this invention, the server includes means for receiving and analyzing spatial design information input by a user, means for generating a three-dimensional virtual environment design based on the analyzed information, means for visualizing the generated virtual environment design in a virtual domain and providing it to the user, means for receiving feedback from the user and using it to improve the accuracy of the generation means, and means for extracting important keywords and themes from the spatial design information using natural language processing technology. This enables the efficient design and evaluation of virtual spaces in real time.
[0102] "Spatial design information" refers to information that indicates the design concept, components, and themes of a virtual environment.
[0103] "Means of analysis" refers to the technologies and methods used to process and understand input information.
[0104] "Three-dimensional virtual environment design" refers to the representation of a virtual design in a three-dimensional space, generated on a computer.
[0105] "Means of visualization in a virtual domain" refers to methods for visually displaying generated designs and models using virtual reality technology.
[0106] "Feedback" refers to information collected from users, including opinions and evaluations, that is used to improve the system.
[0107] "Natural language processing technology" is a technology that enables computers to understand and process human language, and is used for various types of analysis and extraction.
[0108] "Key keywords and themes" refer to concepts and ideas that deserve particular attention in spatial design information.
[0109] The system for implementing this invention allows a user to input spatial design information and generate and visualize a virtual environment based on that information. The user inputs information about the spatial design using a smart device. The terminal receives this information, converts it into a standardized data format such as JSON, and sends it to the server.
[0110] The server uses this received data and employs natural language processing techniques to extract important keywords and themes from the information. This is expected to involve natural language processing tools such as the spaCy library and the BERT language model. Subsequently, a generative AI model is used to automatically generate a three-dimensional virtual environment design. This generated design is then rendered into the virtual domain using engines such as Unity or Unreal Engine.
[0111] Users can visually confirm the generated virtual environment through a VR-compatible headset. User feedback is sent back to the server via the terminal and used to improve the accuracy of AI generation. This feedback loop ultimately leads to more efficient and creative virtual space design.
[0112] For example, when a retail store is testing a new store exterior design, they can use a prompt such as, "Generate a proposal for placing the new product display in a location with natural light. Also, create a set design that reflects a simple, Scandinavian-style store." This prompt will quickly generate a virtual environment that meets their requirements. Such a system makes it possible to efficiently consider and quickly implement effective designs.
[0113] The flow of a specific process in Application Example 1 will be explained using Figure 12.
[0114] Step 1:
[0115] Users input spatial design information via the terminal (a smart device). The input information is converted into JSON or XML format and stored on the terminal as standardized data.
[0116] Step 2:
[0117] The terminal sends standardized data received from the user to the server. Network communication protocols are used in this process to ensure the data reaches the server securely.
[0118] Step 3:
[0119] The server analyzes the received data and uses natural language processing techniques to extract important keywords and themes from the information. In this step, tools such as spaCy and BERT are used for analysis, and the extracted keywords are prepared as input for the generative AI model.
[0120] Step 4:
[0121] The server automatically generates a three-dimensional virtual environment design using an AI model based on the extracted keywords. The generated design data is converted into a format that can be read as a scene file by Unity or Unreal Engine.
[0122] Step 5:
[0123] The virtual environment design generated by the server is sent to the terminal, and the user wears a VR-compatible headset to visualize and review the design within the virtual environment. This allows the user to evaluate the design in real time.
[0124] Step 6:
[0125] Users input feedback into the terminal based on the design they have reviewed. This feedback may include specific suggestions, such as which parts of the design they would like to see improved.
[0126] Step 7:
[0127] The terminal sends user feedback to the server, which uses this feedback to update the AI model's learning and improve the accuracy of future design generation. This completes the feedback loop and drives improvements to the entire system.
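The exchange between terminal and server in Steps 1 to 7 can be illustrated with an in-process sketch. The classes `Server` and `Terminal` are hypothetical stand-ins for the data processing device 12 and the smart device 14; network transport, NLP analysis, 3D generation, and model retraining are all replaced by placeholders.

```python
import json


class Server:
    """Hypothetical stand-in for the data processing device 12."""

    def __init__(self) -> None:
        self.feedback_log: list[str] = []

    def generate_design(self, payload: str) -> dict:
        # Steps 3 and 4: parse the payload and derive a design. Splitting on
        # commas is a placeholder for NLP extraction and generative modelling.
        info = json.loads(payload)
        keywords = [k.strip() for k in info["description"].split(",") if k.strip()]
        return {"elements": keywords}

    def receive_feedback(self, feedback: str) -> None:
        # Step 7: store the feedback; model retraining is out of scope here.
        self.feedback_log.append(feedback)


class Terminal:
    """Hypothetical stand-in for the smart device 14."""

    def __init__(self, server: Server) -> None:
        self.server = server

    def submit(self, description: str) -> dict:
        # Steps 1, 2 and 5: standardize the input, send it, receive the design.
        payload = json.dumps({"description": description})
        return self.server.generate_design(payload)

    def send_feedback(self, feedback: str) -> None:
        # Steps 6 and 7: forward the user's feedback to the server.
        self.server.receive_feedback(feedback)


if __name__ == "__main__":
    server = Server()
    terminal = Terminal(server)
    print(terminal.submit("natural light, Scandinavian style"))
    terminal.send_feedback("Move the display closer to the window")
    print(server.feedback_log)
```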
[0128] Furthermore, an emotion engine that estimates the user's emotions may be incorporated. That is, the specific processing unit 290 may use the emotion identification model 59 to estimate the user's emotions and perform the specific processing using the estimated emotions.
[0129] This invention is a system aimed at providing efficient set design in film production and an interactive environment that takes user emotions into consideration. This system has a configuration in which the user, terminal, and server cooperate and are combined with an emotion engine.
[0130] First, the user inputs movie script information and visual concepts from the terminal. This includes story scenes and visual images. The input data is formatted by the terminal and sent to the server.
[0131] The server receives the input scenario information and performs analysis using natural language processing techniques. It extracts important keywords and themes to form an initial design for the movie set. The design generated at this stage is rendered in a virtual space and provided to the user.
[0132] Next, the emotion engine is activated, analyzing the user's facial expressions and voice data to read their emotions. This emotional information is collected in real time along with other data while the user is experiencing the VR content.
[0133] The server reflects the results from the emotion engine and dynamically adjusts the set design and visual elements of the virtual space. This step is performed in real time to reflect changes in the user's emotions. For example, if the user expresses surprise, the lighting in that scene and the placement of specific objects will change.
[0134] Through the terminal, users evaluate the visualized virtual space and provide feedback. This feedback is used to improve the set design and is supplied as training data to the server's AI agent. This feedback mechanism ensures higher accuracy in subsequent generation.
[0135] For example, if a user sets a scene from a horror movie, the server automatically generates dark lighting and an eerie environment. As the user experiences the scene, the emotion engine detects fear or surprise, and parts of the virtual space are modified, with sounds and movements added to increase the sense of unease. In this way, fine-tuning that takes the user's emotions into account makes the set more realistic and dynamic.
[0136] In this way, the present invention enables filmmakers to construct realistic set designs cost-effectively, while also allowing for creative expression that reflects the emotions of the user.
[0137] The following describes the processing flow.
[0138] Step 1:
[0139] Users use their devices to input movie scripts and visual concepts. This includes describing detailed storylines and the atmosphere of each scene.
[0140] Step 2:
[0141] The terminal formats the scenario information entered by the user and prepares the data for transmission to the server.
[0142] Step 3:
[0143] The server receives scenario information sent from the terminal and analyzes the information using natural language processing. This analysis extracts important keywords and themes for each scene.
[0144] Step 4:
[0145] The server supplies the extracted data to an AI agent, which automatically generates a 3D movie set design. The generated design is then converted into a virtual space.
[0146] Step 5:
[0147] The server renders the generated movie set design and sends it to the device as VR content.
[0148] Step 6:
[0149] The terminal provides the received VR content to the user, and the user experiences the set using VR equipment.
[0150] Step 7:
[0151] During the user experience, the emotion engine analyzes the user's facial expressions and voice, collecting emotion data in real time.
[0152] Step 8:
[0153] The server receives the results of the emotion engine's analysis and dynamically adjusts the set design and visual elements of the virtual space according to the user's emotions. For example, if the emotion of surprise is detected, the surrounding lighting is changed.
[0154] Step 9:
[0155] Users provide feedback through their devices based on their experience, sending any points they want corrected or elements they want to add to the server.
[0156] Step 10:
[0157] The server receives feedback and uses it as data for the AI agent to learn from. The feedback is then incorporated into the next design generation.
[0158] Step 11:
[0159] The server manages multiple scenes simultaneously as needed and executes optimization processes, making adjustments according to the user's project requirements while maintaining overall consistency.
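As a non-limiting illustration, the dynamic adjustment of Step 8 above may be sketched as a rule table that maps a detected emotion to a change in the virtual set. The class SceneState, the rule functions, the sound name "creaking_door", and the dimming amount of 0.3 are all hypothetical and are not taken from this disclosure; the sketch only assumes that the emotion engine delivers an emotion label as a string.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class SceneState:
    """Hypothetical subset of the adjustable elements of a virtual set."""
    light_intensity: float = 1.0  # 0.0 (dark) to 1.0 (fully lit)
    sounds: tuple = ()            # names of active sound effects


def _on_surprise(state):
    # Assumed rule: dim the surrounding lighting, clamped at zero.
    return replace(state, light_intensity=max(0.0, state.light_intensity - 0.3))


def _on_fear(state):
    # Assumed rule: add an unsettling sound effect, but only once.
    if "creaking_door" in state.sounds:
        return state
    return replace(state, sounds=state.sounds + ("creaking_door",))


# Hypothetical rule table: detected emotion -> adjustment function.
RULES = {"surprise": _on_surprise, "fear": _on_fear}


def adjust_scene(state, emotion):
    """Return the adjusted scene; unknown emotions leave it unchanged."""
    rule = RULES.get(emotion)
    return rule(state) if rule else state


scene = SceneState(light_intensity=0.5)
scene = adjust_scene(scene, "surprise")  # lighting is dimmed
scene = adjust_scene(scene, "fear")      # a sound effect is added
```

Because each rule returns a new SceneState rather than mutating the old one, the server could keep earlier states and revert an adjustment if the user's feedback in Step 9 rejects it.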
[0160] (Example 2)
[0161] Next, we will describe Example 2. In the following description, the data processing device 12 will be referred to as the "server" and the smart device 14 as the "terminal".
[0162] In film production, there is a need for real-time adjustments to set design based on user emotions, but this is difficult to achieve efficiently with current technology. Furthermore, there is a lack of processes to incorporate user feedback into subsequent designs. As a result, production times are prolonged and costs are increased.
[0163] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 2 is realized by the following means.
[0164] In this invention, the server includes means for receiving and analyzing linguistic or visual information input by a user, means for generating an initial three-dimensional environment design based on the analyzed information, and means for detecting and analyzing the user's emotional state and dynamically adjusting the visual elements within the virtual domain. This makes it possible to automatically and in real time adjust the design of a movie set according to the user's emotions, thereby efficiently reducing costs.
[0165] A "user" is an entity that uses the system to design and experience movie sets.
[0166] "Inputted linguistic or visual information" refers to text data and visual concept-based data of scenarios provided by the user.
[0167] "Means of analysis" refer to the processes and techniques used to understand input data and extract important information.
[0168] "Three-dimensional environmental design" refers to the design of three-dimensional sets and structures created for films and visual content.
[0169] A "virtual domain" refers to a real-time, accessible digital space created within a computer system.
[0170] "Visualization" refers to the process of visually representing digital data so that users can actually see and understand its contents.
[0171] "Emotional state" refers to the results of an analysis of the user's psychological or emotional responses.
[0172] "Dynamic adjustment mechanisms" refer to features that change the environment or elements in real time according to the user's emotions or other circumstances.
[0173] "Rating" refers to the evaluation or opinion provided by a user about their experience using the system.
[0174] "Feedback" refers to the ratings provided by users, which are used to improve and adjust the system.
[0175] This invention provides a system that offers an interactive, emotionally responsive virtual space based on scenarios and designs, for both filmmakers and audiences. The system consists of three components: a user, a terminal, and a server, and each component works in cooperation with the others.
[0176] Users use their devices to input movie script information and visual concepts. This information includes the story flow and visual images. For example, users can use a keyboard to write in key scenes from the story or use a pen tablet to draw concept art.
[0177] The terminal receives information entered by the user and converts it into a standard data format (e.g., JSON or PNG). During this process, the terminal utilizes its capabilities as a modern interactive device to prepare the data for transmission to the server.
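As a non-limiting illustration, the formatting performed by the terminal may be sketched as follows. The function name build_payload and the field names "type", "script", and "concept_art_png_b64" are hypothetical; the disclosure only states that input is converted into a standard format such as JSON or PNG. Base64 encoding is an assumption made here because JSON cannot carry raw image bytes.

```python
import base64
import json

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"  # first eight bytes of every PNG file


def build_payload(script_text, concept_png=None):
    """Package user input as a JSON string ready for transmission.

    The field names are hypothetical and chosen for illustration only.
    """
    payload = {"type": "scenario", "script": script_text.strip()}
    if concept_png is not None:
        if not concept_png.startswith(PNG_SIGNATURE):
            raise ValueError("concept art must be PNG-encoded")
        # JSON cannot carry raw bytes, so the image is base64-encoded.
        payload["concept_art_png_b64"] = base64.b64encode(concept_png).decode("ascii")
    return json.dumps(payload, ensure_ascii=False)


message = build_payload("A foggy forest at night. A wolf howls in the distance.")
```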
[0178] The server receives data formatted by the terminal and analyzes the scenario information using a generative AI model. Specifically, it uses natural language processing techniques to extract important concepts and themes and generates an initial three-dimensional environment design based on them. The text is processed using a library such as spaCy for Python. The generated design is visualized in a virtual domain in real time and provided to the user.
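As a non-limiting illustration, the extraction of important keywords may be sketched with a simple word-frequency count. This is a standard-library stand-in and not the spaCy-based analysis itself; the stop-word list, the function name extract_keywords, and the frequency heuristic are assumptions made for illustration, and an actual implementation would more likely rely on the tokenization and part-of-speech information of an NLP library.

```python
import re
from collections import Counter

# Small illustrative stop-word list; a real NLP library ships a fuller one.
STOP_WORDS = {"a", "an", "the", "and", "or", "of", "in", "on", "at", "to",
              "is", "are", "was", "with", "through", "into", "from", "as"}


def extract_keywords(scenario, top_n=5):
    """Return the top_n most frequent non-stop-words, most frequent first."""
    words = re.findall(r"[a-z]+", scenario.lower())
    counts = Counter(w for w in words if w not in STOP_WORDS and len(w) > 2)
    return [word for word, _ in counts.most_common(top_n)]


text = ("Fog drifts through the forest. A wolf howls in the forest "
        "as the fog thickens around the cabin.")
keywords = extract_keywords(text, top_n=2)  # the two words that occur twice
```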
[0179] Furthermore, the server utilizes an emotion engine to analyze the user's emotional state. This engine processes data acquired using the terminal's camera and microphone, and uses facial recognition software and voice analysis technology to read the user's emotions. For example, OpenCV can be used for the facial analysis, and the environment is dynamically adjusted based on the detected emotions.
[0180] For example, if a user sets a scene from a horror movie, the server automatically generates a dark and eerie environment. While the user is experiencing VR, if the emotion engine detects fear or surprise, the server reflects that emotional information and changes the visual elements in the virtual space in real time. For example, it might emphasize the sound of drafts or deepen the shadows to increase the sense of unease.
[0181] In this way, this system makes it possible to create more realistic and emotionally resonant set designs while keeping costs down during the film production process. An example of a prompt to be input into the generative AI model would be, "Please set up a forest scene for a horror movie. Create an environment that is dimly lit, foggy, and has the sound of a wolf howling in the distance in the background."
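As a non-limiting illustration, a prompt of the kind quoted above may be assembled from the extracted genre, location, and environment details. The function name build_set_prompt and the template wording are hypothetical; the template merely mirrors the phrasing of the example prompt.

```python
def build_set_prompt(genre, location, details):
    """Compose a natural-language prompt for a generative model.

    The template wording is hypothetical and mirrors the example prompt.
    """
    if not details:
        raise ValueError("at least one environment detail is required")
    if len(details) <= 2:
        detail_text = " and ".join(details)
    else:
        detail_text = ", ".join(details[:-1]) + ", and " + details[-1]
    return (f"Please set up a {location} scene for a {genre} movie. "
            f"Create an environment that is {detail_text}.")


prompt = build_set_prompt("horror", "forest", ["dimly lit", "foggy"])
```

Keeping the template in one function means the wording can be revised in a single place when user feedback shows that a different phrasing yields better designs.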
[0182] The flow of the specific processing in Example 2 will be explained using Figure 13.
[0183] Step 1:
[0184] Users input movie script information and visual concepts into the device. Specifically, they can input text data using a keyboard or create visual data using a pen tablet. The input at this time is raw data related to the script and images.
[0185] Step 2:
[0186] The terminal converts the received input data into a standard data format. Text data is converted to JSON format, and visual data is converted to common image formats such as PNG or JPEG. The output is formatted data ready to be sent to the server.
[0187] Step 3:
[0188] The server receives the formatted data sent from the terminal. It analyzes the text using spaCy or another natural language processing library for Python and extracts important concepts and themes. The output consists of the extracted keywords and themes.
[0189] Step 4:
[0190] The server uses a generative AI model to generate an initial three-dimensional environment design based on extracted keywords and themes. This process provides the AI model with pre-configured prompts and places the initial design in a virtual space. This design is visualized to the user via a VR headset or display. The output is a visually displayable virtual environment design.
[0191] Step 5:
[0192] The device collects user emotion data in real time. It records the user's emotional state from their facial expressions and voice using a camera and microphone. The emotion engine analyzes this input data and formats the detected emotions into data for transmission to the server. The output is summarized emotion information.
[0193] Step 6:
[0194] The server receives the emotion information sent from the emotion engine and dynamically adjusts elements within the virtual space. For example, if the user expresses surprise, it might change the lighting or add specific sound effects. These adjustments make the experience in the virtual space more responsive and interactive. The output is the adjusted virtual environment.
[0195] Step 7:
[0196] After experiencing the virtual environment, users provide feedback through their terminal. This feedback is organized and formatted on the terminal and sent to the server. The server analyzes this feedback and saves it as training data to be used for future design generation. The output is the feedback data for analysis.
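As a non-limiting illustration, the saving of feedback as training data in Step 7 may be sketched as one JSON record per line. The record fields, the rating scale of 1 to 5, and the function names append_feedback and load_feedback are hypothetical; an in-memory stream stands in for whatever file or database the server actually uses.

```python
import io
import json


def append_feedback(stream, scene_id, design_keywords, comment, rating):
    """Write one feedback record as a JSON line and return the record."""
    if not 1 <= rating <= 5:  # assumed rating scale
        raise ValueError("rating must be between 1 and 5")
    record = {
        "scene_id": scene_id,
        "design_keywords": list(design_keywords),
        "comment": comment.strip(),
        "rating": rating,
    }
    stream.write(json.dumps(record, ensure_ascii=False) + "\n")
    return record


def load_feedback(stream):
    """Read every stored record back for a later training run."""
    return [json.loads(line) for line in stream if line.strip()]


store = io.StringIO()  # stands in for a file or database table
append_feedback(store, "forest-01", ["fog", "forest"], "Make the fog denser.", 3)
store.seek(0)
records = load_feedback(store)
```

Storing the design keywords next to the comment keeps each record self-describing, so the pairing of a generated design with its evaluation is not lost when records are later batched for training.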
[0197] (Application Example 2)
[0198] Next, we will explain application example 2. In the following explanation, the data processing device 12 will be referred to as a "server" and the smart device 14 as a "terminal".
[0199] Traditional virtual stores have struggled to dynamically adjust the environment based on user emotions, limiting their ability to enhance immersion and user experience. In particular, there is a growing need to provide an optimal visual and auditory environment that responds to user emotions.
[0200] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 2 is realized by the following means.
[0201] In this invention, the server includes means for receiving and analyzing information input from the user, means for generating a three-dimensional virtual environment based on the analyzed information, means for analyzing the user's facial expressions and voice data to detect emotions, and means for dynamically adjusting the visual and auditory elements of the virtual environment based on the detected emotion information. This makes it possible to provide an optimal store environment that is in line with the user's emotions.
[0202] A "user" is an individual or group that uses this system to input information or experience the virtual environment.
[0203] "Receiving" refers to the process of acquiring information and data provided by the user.
[0204] "Analysis" refers to the process of processing received information to understand its meaning and theme.
[0205] A "three-dimensional virtual environment" refers to a three-dimensional digital space created using computer technology.
[0206] "Visualization" is the process of visually displaying digital data, enabling users to observe or experience it.
[0207] "Facial expression data" refers to information that captures the user's facial movements and reactions.
[0208] "Voice data" refers to data that captures the user's voice or sounds and uses them as information.
[0209] "Detecting emotions" involves analyzing facial expressions and voice data to identify the user's emotional state.
[0210] "Visual elements" refer to elements such as appearance, color, and layout within a virtual environment.
[0211] "Auditory elements" refer to elements such as sounds, music, and sound effects within a virtual environment.
[0212] "Dynamic adjustment" refers to the process of instantly changing elements of a virtual environment in response to real-time changing conditions.
[0213] To realize this invention, the server receives input information from the user and uses natural language processing technology to analyze it. Based on the analyzed information, the server generates a three-dimensional virtual environment, visualizes it, and provides it to the user. The user's terminal also collects facial expression data using facial recognition technology and audio data using voice analysis technology. This data is processed on a cloud service to detect the user's emotions. The server has the function to dynamically adjust the visual and auditory elements within the virtual environment based on the detected emotion information. This allows the user to enjoy an optimal environment tailored to their individual emotions through their experience in the virtual store. For example, if the user indicates a relaxed emotion, the server will warm the lighting in the virtual store and play soothing music. The software supporting this process includes Python, Unity, Google® Cloud Vision API, and an NLP model.
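As a non-limiting illustration, the detection result from facial expression data and voice data may be combined and mapped to a store ambience as sketched below. The weighted blending, the weight of 0.6, the preset table AMBIENCE, and all function names are assumptions made for illustration; the sketch further assumes that the upstream analysis yields a score between 0 and 1 for each emotion label. A lower colour temperature in kelvin corresponds to warmer light.

```python
# Hypothetical presets: emotion -> (light colour temperature in kelvin, music).
AMBIENCE = {
    "relaxed": (2700, "soothing"),
    "excited": (4000, "upbeat"),
}
DEFAULT_AMBIENCE = (3500, "neutral")


def fuse_emotions(face_scores, voice_scores, face_weight=0.6):
    """Blend per-emotion scores from face and voice; return the top label."""
    labels = set(face_scores) | set(voice_scores)
    if not labels:
        return None
    blended = {
        label: face_weight * face_scores.get(label, 0.0)
        + (1.0 - face_weight) * voice_scores.get(label, 0.0)
        for label in labels
    }
    # sorted() makes the result deterministic when two scores tie.
    return max(sorted(blended), key=blended.get)


def choose_ambience(emotion):
    """Return (colour temperature, music) for the emotion, or the default."""
    return AMBIENCE.get(emotion, DEFAULT_AMBIENCE)


emotion = fuse_emotions({"relaxed": 0.7, "excited": 0.2},
                        {"relaxed": 0.5, "excited": 0.4})
lighting_kelvin, music = choose_ambience(emotion)
```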
[0214] As a concrete example, consider the experience of a user visiting a virtual store using smart glasses. When the user focuses on a new product displayed on a table and shows excitement, it is possible to interact with the user by displaying detailed information about that product and related promotions in their field of view, and providing audio guidance. An example of a prompt would be, "How should information be presented in the virtual environment when the user is in an excited state?"
[0215] The flow of a specific process in Application Example 2 will be explained using Figure 14.
[0216] Step 1:
[0217] Users input movie script information and visual concepts through their terminals. This input data is formatted and sent to the server. The input consists of text-based script information and image data, which are formatted on the terminal. The output is the formatted data sent to the server.
[0218] Step 2:
[0219] The server performs natural language processing using the received scenario information to extract important keywords and themes. This process uses an NLP model to analyze the text data and generate a keyword list based on the analysis. The input is the scenario information, and the output is the extracted keywords and themes. The analyzed information forms the basis for the initial design of the three-dimensional virtual environment.
[0220] Step 3:
[0221] The server generates a three-dimensional virtual environment based on the extracted information and renders its design within the virtual space. Visual elements are constructed using Unity and presented to the user. The input consists of extracted keywords and themes, and the output is a three-dimensional model within the virtual space.
[0222] Step 4:
[0223] While a user experiences a virtual environment using a smart device, the device acquires the user's facial expression and voice data in real time via its camera and microphone. The input is the collected facial expression and voice data, and the output is sensor data for analysis.
[0224] Step 5:
[0225] The server analyzes the acquired facial expression and voice data to detect the user's emotional state. It uses the Google Cloud Vision API to analyze changes in facial expressions and uses voice analysis software to analyze the tone of voice. The input is data from sensors, and the output is the result of the emotion analysis.
[0226] Step 6:
[0227] The server dynamically adjusts the visual and auditory elements of the virtual environment based on detected emotion information. Lighting, music, and sound effects are modified through Unity to create an environment that matches the user's emotions. The input is the emotion analysis result, and the output is the adjusted virtual space.
[0228] Step 7:
[0229] Users experience the adjusted virtual environment and send their feedback to the server via the terminal. This feedback data is stored on the server and used as training data for the generative AI model. The input is user feedback, and the output is data for model improvement.
[0230] The specific processing unit 290 transmits the result of the specific processing to the smart device 14. In the smart device 14, the control unit 46A causes the output device 40 to output the result of the specific processing. The microphone 38B acquires audio indicating user input for the result of the specific processing. The control unit 46A transmits the audio data indicating user input acquired by the microphone 38B to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the audio data.
[0231] Data generation model 58 is a type of so-called generative AI (Artificial Intelligence). Examples of the data generation model 58 include generative AI such as ChatGPT (registered trademark) (Internet search <URL: https://openai.com/blog/chatgpt>) and Gemini (registered trademark) (Internet search <URL: https://gemini.google.com/?hl=ja>). The data generation model 58 is obtained by performing deep learning on a neural network. Prompts containing instructions, together with inference data such as audio data representing speech, text data representing text, and image data representing images, are input to the data generation model 58. The data generation model 58 performs inference on the input inference data according to the instructions indicated by the prompts, and outputs the inference results in data formats such as audio data and text data. Here, inference refers to, for example, analysis, classification, prediction, and/or summarization.
[0232] In the above embodiment, an example was given in which specific processing is performed by the data processing device 12, but the technology of this disclosure is not limited thereto, and the specific processing may also be performed by the smart device 14.
[0233] [Second Embodiment]
[0234] Figure 3 shows an example of the configuration of the data processing system 210 according to the second embodiment.
[0235] As shown in Figure 3, the data processing system 210 includes a data processing device 12 and smart glasses 214. An example of the data processing device 12 is a server.
[0236] The data processing device 12 comprises a computer 22, a database 24, and a communication interface 26. The computer 22 is an example of a "computer" related to the technology of this disclosure. The computer 22 comprises a processor 28, RAM 30, and storage 32. The processor 28, RAM 30, and storage 32 are connected to a bus 34. The database 24 and the communication interface 26 are also connected to the bus 34. The communication interface 26 is connected to a network 54. An example of the network 54 is a WAN (Wide Area Network) and / or a LAN (Local Area Network).
[0237] The smart glasses 214 include a computer 36, a microphone 238, a speaker 240, a camera 42, and a communication interface 44. The computer 36 includes a processor 46, RAM 48, and storage 50. The processor 46, RAM 48, and storage 50 are connected to a bus 52. The microphone 238, speaker 240, and camera 42 are also connected to the bus 52.
[0238] The microphone 238 receives voice signals from the user 20 and receives instructions from the user 20. The microphone 238 captures the voice signals from the user 20, converts the captured voice into audio data, and outputs it to the processor 46. The speaker 240 outputs audio according to the instructions from the processor 46.
[0239] Camera 42 is a small digital camera equipped with an optical system including a lens, aperture, and shutter, and an image sensor such as a CMOS (Complementary Metal-Oxide-Semiconductor) image sensor or a CCD (Charge Coupled Device) image sensor, and captures images of the area around the user 20 (for example, an imaging range defined by a field of view equivalent to the width of a typical healthy person's field of vision).
[0240] Communication interface 44 is connected to network 54. Communication interfaces 44 and 26 are responsible for the exchange of various information between processor 46 and processor 28 via network 54. The exchange of various information between processor 46 and processor 28 using communication interfaces 44 and 26 is performed in a secure manner.
[0241] Figure 4 shows an example of the main functions of the data processing device 12 and the smart glasses 214. As shown in Figure 4, the data processing device 12 performs specific processing using the processor 28. The storage 32 stores the specific processing program 56.
[0242] The specific processing program 56 is an example of a "program" relating to the technology of this disclosure. The processor 28 reads the specific processing program 56 from the storage 32 and executes the read specific processing program 56 on the RAM 30. The specific processing is realized by the processor 28 operating as a specific processing unit 290 in accordance with the specific processing program 56 executed on the RAM 30.
[0243] The storage 32 stores the data generation model 58 and the emotion identification model 59. The data generation model 58 and the emotion identification model 59 are used by the specific processing unit 290.
[0244] In the smart glasses 214, the processor 46 performs the reception output processing. The storage 50 stores the reception output program 60. The processor 46 reads the reception output program 60 from the storage 50 and executes the read reception output program 60 on the RAM 48. The reception output processing is realized by the processor 46 operating as a control unit 46A according to the reception output program 60 executed on the RAM 48.
[0245] Next, the specific processing performed by the specific processing unit 290 of the data processing device 12 will be described. In the following description, the data processing device 12 will be referred to as the "server" and the smart glasses 214 will be referred to as the "terminal".
[0246] This invention is a system that supports the efficiency of set design in the film production process, and operates in cooperation with three parties: the user, the terminal, and the server. A specific embodiment of this system is shown below.
[0247] Users input information into their devices that will serve as the film's script and visual concept. This information describes the content, atmosphere, and physical characteristics of the film scenes, and represents the basic ideas and policies for the film's production.
[0248] The terminal processes the input scenario information as digital data and sends it to the server. This includes a function to standardize the scenario information and prepare it in an analyzable format.
[0249] The server uses advanced natural language processing techniques to analyze the scenario information received from the terminal. This analysis process extracts important keywords and themes from the scenario, clarifying the elements necessary for designing the film set.
[0250] The server then uses AI agents to automatically generate three-dimensional movie set designs based on the extracted keywords and themes. These designs are rendered in a virtual space and constructed using virtual reality (VR) technology.
[0251] The generated set is streamed to the user via a device, allowing the user to visually inspect the set in real time using VR equipment. The user can evaluate the set design and provide feedback through the device regarding desired changes and improvements.
[0252] This feedback is sent to the server, which uses it to train the AI. The AI agent incorporates user feedback to improve the accuracy and applicability of subsequent design generation. The server also manages multiple scenes simultaneously and optimizes the set design as needed.
[0253] For example, if a user sets a scene of a medieval village, the server automatically extracts relevant themes such as wooden buildings, cobblestone streets, and crops, and generates a realistic and creative village set based on these. The user can then review the scene and make adjustments, such as the placement of buildings or decorations, and this feedback contributes to the evolution of the AI. In film projects involving multiple scenes at once, users can properly manage these set designs and achieve overall time and cost reductions.
[0254] The following describes the processing flow.
[0255] Step 1:
[0256] Users use a terminal to input movie scripts and visual concepts. The input information includes details of the story, scene settings, and the overall visual atmosphere.
[0257] Step 2:
[0258] The terminal formats the scenario information from the user as digital data and prepares it for analysis. This data is then ready to be sent to the server.
[0259] Step 3:
[0260] The server receives scenario data from the terminals for analysis and uses natural language processing techniques to extract important keywords and themes from the information. This clarifies the data necessary for designing the film set.
[0261] Step 4:
[0262] The server uses extracted keywords and themes to automatically generate 3D movie set designs, utilizing an AI agent. The AI agent then uses existing design data and learning results to create creative set designs.
[0263] Step 5:
[0264] The generated 3D set design is rendered in a virtual space. The server prepares this design as VR content for transmission to the terminal.
[0265] Step 6:
[0266] The terminal provides VR content received from the server to the user. The user uses VR equipment to view and interact with a three-dimensional visual set in real time.
[0267] Step 7:
[0268] Users provide feedback based on the visualized set. For example, they can send fine-tuning suggestions, such as the location of buildings or the color scheme of decorations, to the server via their device.
[0269] Step 8:
[0270] The server receives feedback from users and provides that data to the AI agent to advance the learning process. This improves the accuracy of set design in subsequent generation.
[0271] Step 9:
[0272] The server manages multiple scenes simultaneously as needed, optimizing and coordinating sets to improve overall project consistency and efficiency.
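As a non-limiting illustration, the simultaneous management of multiple scenes in Step 9 may be sketched as a project-level container that holds every scene together with a shared style guide and reports scenes that deviate from it. The classes SceneDesign and ProjectSets, the style keys such as "era", and the notion of a shared style guide are hypothetical; the disclosure does not specify how consistency is evaluated.

```python
from dataclasses import dataclass, field


@dataclass
class SceneDesign:
    name: str
    keywords: list
    style: dict = field(default_factory=dict)  # e.g. {"era": "medieval"}


class ProjectSets:
    """Holds every scene of one project and a shared style guide."""

    def __init__(self, shared_style):
        self.shared_style = dict(shared_style)
        self.scenes = {}

    def add_scene(self, scene):
        self.scenes[scene.name] = scene

    def inconsistencies(self):
        """List (scene, key, found, expected) for style values that differ."""
        issues = []
        for scene in self.scenes.values():
            for key, expected in self.shared_style.items():
                found = scene.style.get(key)
                if found is not None and found != expected:
                    issues.append((scene.name, key, found, expected))
        return issues

    def harmonize(self):
        """Overwrite conflicting values so all scenes follow the shared style."""
        for scene in self.scenes.values():
            scene.style.update(self.shared_style)


project = ProjectSets({"era": "medieval"})
project.add_scene(SceneDesign("village", ["cobblestone"], {"era": "medieval"}))
project.add_scene(SceneDesign("market", ["stalls"], {"era": "modern"}))
issues = project.inconsistencies()  # reports the "market" scene
```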
[0273] (Example 1)
[0274] Next, we will describe Example 1. In the following description, the data processing device 12 will be referred to as the "server," and the smart glasses 214 will be referred to as the "terminal."
[0275] In film production, set design is a time-consuming and resource-intensive process. It is difficult to manage multiple scenes efficiently while also creating realistic and creative designs in a short timeframe. Furthermore, the process of effectively incorporating user feedback to improve the accuracy of future designs is complex. Therefore, there is a need to easily generate set designs from story information, visualize them in a virtual space, and provide users with an intuitive and interactive experience.
[0276] The specific processing by the specific processing unit 290 of the data processing device 12 in Example 1 is realized by the following means.
[0277] In this invention, the server includes means for receiving and analyzing the story information input by the user, means for generating a three-dimensional visual model based on the analyzed information, and means for cooperating with single or multiple virtual reality devices and delivering the generated visual model to the user in real time. As a result, the creation and management of the set design are made more efficient, and it becomes possible to improve the accuracy of design generation based on real-time visualization and feedback.
[0278] The "user" refers to an individual or group who inputs story information using the system, checks the generated visual model, and provides feedback.
[0279] The "story information" refers to information for describing scenarios and concepts related to scenes of movies and visual contents.
[0280] "Analysis" refers to a process of identifying important elements from the input story information and extracting specific design elements.
[0281] The "three-dimensional visual model" is a three-dimensional digital design generated based on the story information, and refers to a structure that can be displayed in a virtual space.
[0282] The "virtual space" refers to an artificial space generated by digital technology that can be visually experienced by the user.
[0283] The "virtual reality device" refers to a device (e.g., headset or display) used by the user to visually and experientially recognize the virtual space.
[0284] "Delivery in real time" means providing the generated visual model to the user immediately without delay.
[0285] "Feedback" refers to the opinions and improvement points provided by users after experiencing the visual model, and includes information used for improving the generation process.
[0286] As an embodiment of this invention, there is a system that supports the efficiency improvement of set design in the movie production process. In this system, the user, terminal, and server operate in cooperation with each other.
[0287] The user inputs a scenario or visual concept that is story information into the terminal. This information specifically represents the composition and atmosphere of movie scenes and is an important element for embodying the basic ideas of the story.
[0288] The terminal processes the story information input by the user as digital data. This processing includes the function of converting the information into an analyzable format. Specifically, the information is converted into a standard format such as JSON and sent to the server.
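As a non-limiting illustration, the conversion of story information into an analyzable JSON structure may be sketched by splitting the text into scenes. The sketch assumes, purely for illustration, that each scene begins with a screenplay-style heading such as "INT. TAVERN - NIGHT"; the disclosure does not prescribe any input layout, and the function name segment_scenario and the output field names are hypothetical.

```python
import json
import re

# Assumption: scenes start with a screenplay-style heading such as
# "INT. TAVERN - NIGHT" or "EXT. VILLAGE SQUARE - DAY".
HEADING = re.compile(r"^(INT|EXT)\.\s+(.+?)\s+-\s+(\w+)\s*$", re.MULTILINE)


def segment_scenario(text):
    """Split a scenario into scenes and return them as a JSON string."""
    matches = list(HEADING.finditer(text))
    scenes = []
    for i, match in enumerate(matches):
        # A scene's body runs until the next heading or the end of the text.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        scenes.append({
            "index": i + 1,
            "setting": "interior" if match.group(1) == "INT" else "exterior",
            "location": match.group(2).title(),
            "time": match.group(3).lower(),
            "description": " ".join(text[match.end():end].split()),
        })
    return json.dumps({"scenes": scenes}, indent=2)


sample = ("EXT. VILLAGE SQUARE - DAY\nMerchants set up stalls.\n\n"
          "INT. TAVERN - NIGHT\nA fire crackles.\n")
formatted = segment_scenario(sample)
```

Text that appears before the first heading is ignored by this sketch; an implementation that must not lose such text would need to emit it as a separate element.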
[0289] The server receives the data sent from the terminal and performs analysis using a generative AI model and natural language processing software. Specifically, NLP libraries and machine learning frameworks are utilized. Through the analysis, important keywords and themes are extracted, and the elements required for movie set design are clarified.
[0290] Based on the extracted information, the server generates a three-dimensional visual model. Game engines (e.g., Unity or Unreal Engine) are utilized to perform the rendering of the design in the virtual space and the generation of VR content. At this stage, the server cooperates with virtual reality devices and distributes the generated model to the user in real time via the terminal.
[0291] For example, if a user sets a scene for a "medieval village," the server extracts themes such as wooden architecture and cobblestone streets, and generates a realistic and creative village set based on these. The user then uses VR equipment to view the set and provides feedback through prompts such as, "Generate a set design for a fantastical movie scene set in a medieval European village."
[0292] This feedback is sent to the server and used as training data for the generative AI model. This improves the accuracy and effectiveness of subsequent design generation and optimizes the set design.
[0293] The flow of the specific processing in Example 1 will be explained using Figure 11.
[0294] Step 1:
[0295] The user inputs the film's script and visual concepts into the terminal. This input includes film scene settings, characters, and key narrative themes. During this process, the input is saved as text data in a system-recognizable format. Specifically, the user can record information using a keyboard or voice input function.
[0296] Step 2:
[0297] The terminal receives input data from the user and formats it as digital data. In this step, a text analysis tool is used to segment the data and convert it to a standard format (e.g., JSON). Data processing involves breaking down the scenario into its constituent elements and shaping it into a form that is easy to analyze. The output is a parseable data structure that is ready to be sent to the server. Specific actions include launching text analysis software and formatting the data.
[0298] Step 3:
[0299] The server receives data sent from the terminal and performs analysis using advanced natural language processing capabilities. Here, a generative AI model and natural language processing libraries are used to extract important keywords and themes from the data. The input is formatted scenario data, and the output is the extracted keywords. Specific operations include model initialization and the application of the analysis algorithm.
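As a rough illustration of the keyword extraction in Step 3, the following sketch ranks words by frequency using only the Python standard library. This is a simple stand-in, not the generative AI model or NLP library referred to above; the stop-word list and the `extract_keywords` function are assumptions made for the example.

```python
import re
from collections import Counter

# Small illustrative stop-word list; an actual NLP library would supply its own.
STOP_WORDS = {"a", "an", "the", "of", "in", "and", "with", "at", "is", "are", "to", "on"}


def extract_keywords(text: str, top_n: int = 5) -> list[str]:
    """Return the top_n most frequent non-stop-words, most frequent first."""
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter(t for t in tokens if t not in STOP_WORDS)
    return [word for word, _ in counts.most_common(top_n)]


scenario = (
    "The village square is paved with cobblestone. "
    "Wooden houses surround the village square, "
    "and a wooden well stands in the village."
)
keywords = extract_keywords(scenario, top_n=3)
```

Frequency counting ignores context and synonyms, so it only approximates the theme extraction described in this step.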
[0300] Step 4:
[0301] The server generates a three-dimensional visual model using an AI agent based on extracted keywords. Here, a game engine (e.g., Unity or Unreal Engine) is used to render the design in a virtual space and generate VR content. As part of the data processing, keywords are converted into relevant visual elements and incorporated into the three-dimensional model. The output is the completed visual model. Specific operations include launching 3D modeling tools and rendering the model.
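The conversion of keywords into visual elements in Step 4 might be sketched as a lookup table. The asset names, categories, and the `KEYWORD_TO_ELEMENTS` table below are hypothetical; in the described system this mapping would be produced by the AI agent rather than fixed by hand.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SceneElement:
    asset: str      # hypothetical asset name in a 3D asset library
    category: str   # e.g. "building", "ground", "prop"


# Hypothetical mapping from extracted keywords to visual elements.
KEYWORD_TO_ELEMENTS = {
    "wooden": [SceneElement("timber_house", "building")],
    "cobblestone": [SceneElement("cobblestone_road", "ground")],
    "village": [
        SceneElement("market_stall", "prop"),
        SceneElement("stone_well", "prop"),
    ],
}


def build_scene(keywords: list[str]) -> tuple[list[SceneElement], list[str]]:
    """Collect elements for known keywords; report keywords with no mapping."""
    elements: list[SceneElement] = []
    unmatched: list[str] = []
    for keyword in keywords:
        found = KEYWORD_TO_ELEMENTS.get(keyword)
        if found is None:
            unmatched.append(keyword)
            continue
        for element in found:
            if element not in elements:  # avoid placing the same asset twice
                elements.append(element)
    return elements, unmatched


elements, unmatched = build_scene(["village", "wooden", "dragon"])
```

Returning the unmatched keywords separately makes it visible which parts of the scenario were not reflected in the model.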
[0302] Step 5:
[0303] The terminal receives a visual model generated from the server and streams it to the user. This step involves real-time data delivery via an interface with the VR device. The input is the visual model sent from the server, and the output is the VR content displayed in real time. Specific operations include implementing streaming technology and synchronizing with the VR device.
[0304] Step 6:
[0305] The user checks the visual model via a VR device and provides feedback. The feedback content includes evaluations of the design, improvement points, further requests, etc. As input, verbal or written feedback from the user is received. As output, the feedback information is saved in the terminal and is ready to be sent to the server. The specific operations include the use of a feedback input interface.
[0306] Step 7:
[0307] The server receives the feedback from the user and uses it as training data for the generative AI model. Using a machine learning algorithm, a process is implemented to analyze the feedback information and improve the accuracy of the AI model. The input includes feedback data, and the output includes an improved design generation algorithm. The specific operations include re-training the AI model and adjusting parameters.
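One way the feedback of Steps 6 and 7 could be stored before re-training is as one JSON record per line. The record fields, in particular the numeric `rating`, are assumptions for illustration; the disclosure only states that feedback includes evaluations, improvement points, and further requests.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class FeedbackRecord:
    scene_id: str
    prompt: str    # prompt that produced the reviewed design
    comment: str   # user's verbal or written feedback, as text
    rating: int    # hypothetical score from 1 (poor) to 5 (good)


def to_training_lines(records: list[FeedbackRecord]) -> str:
    """Serialize feedback records as JSON Lines, one record per line."""
    for record in records:
        if not 1 <= record.rating <= 5:
            raise ValueError(f"rating out of range: {record.rating}")
    return "\n".join(json.dumps(asdict(r), ensure_ascii=False) for r in records)


records = [
    FeedbackRecord("scene-001", "medieval village", "Make the streets narrower.", 3),
    FeedbackRecord("scene-001", "medieval village", "Lighting looks right.", 5),
]
training_text = to_training_lines(records)
```

How such records would actually be used to adjust model parameters is not specified here and is left outside the sketch.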
[0308] (Application Example 1)
[0309] Next, Application Example 1 will be described. In the following description, the data processing device 12 is referred to as the "server", and the smart glasses 214 are referred to as the "terminal".
[0310] In modern commercial designs and store designs, real-time space adjustment and display optimization are required. However, the conventional methods are time-consuming and costly, and it is difficult to implement them efficiently, which is an issue. In particular, intuitive design creation and evaluation in a virtual environment, and a rapid feedback loop are needed.
[0311] The specific processing by the specific processing unit 290 of the data processing device 12 in Application Example 1 is realized by the following means.
[0312] In this invention, the server includes means for receiving and analyzing spatial design information input by a user, means for generating a three-dimensional virtual environment design based on the analyzed information, means for visualizing the generated virtual environment design in a virtual domain and providing it to the user, means for receiving feedback from the user and using it to improve the accuracy of the generation means, and means for extracting important keywords and themes from the spatial design information using natural language processing technology. This enables the efficient design and evaluation of virtual spaces in real time.
[0313] "Spatial design information" refers to information that indicates the design concept, components, and themes of a virtual environment.
[0314] "Means of analysis" refers to the technologies and methods used to process and understand input information.
[0315] "Three-dimensional virtual environment design" refers to the representation of a virtual design in a three-dimensional space, generated on a computer.
[0316] "Means of visualization in a virtual domain" refers to methods for visually displaying generated designs and models using virtual reality technology.
[0317] "Feedback" refers to information collected from users, including opinions and evaluations, that is used to improve the system.
[0318] "Natural language processing technology" is a technology that enables computers to understand and process human language, and is used for various types of analysis and extraction.
[0319] "Important keywords and themes" refer to concepts and ideas that deserve particular attention in spatial design information.
[0320] The system for implementing this invention allows a user to input spatial design information and generate and visualize a virtual environment based on that information. The user inputs information about the spatial design using a smart device. The terminal receives this information, converts it into a standardized data format such as JSON, and sends it to the server.
[0321] The server uses this received data and employs natural language processing techniques to extract important keywords and themes from the information. This is expected to involve the use of natural language processing tools such as spaCy and BERT. Subsequently, a generative AI model is used to automatically generate a three-dimensional virtual environment design. This generated design is then rendered into the virtual domain using engines such as Unity or Unreal Engine.
[0322] Users can visually confirm the generated virtual environment through a VR-compatible headset. User feedback is sent back to the server via the device and used to improve the accuracy of AI generation. This feedback loop ultimately leads to more efficient and creative virtual space design.
[0323] For example, when a retail store is testing a new store exterior design, they can use a prompt such as, "Generate a proposal for placing the new product display in a location with natural light. Also, create a set design that reflects a simple, Scandinavian-style store." This prompt will quickly generate a virtual environment that meets their requirements. Such a system makes it possible to efficiently consider and quickly implement effective designs.
[0324] The flow of a specific process in Application Example 1 will be explained using Figure 12.
[0325] Step 1:
[0326] Users input spatial design information via smart devices. The input information is converted into JSON or XML format and stored on the device as standardized data.
[0327] Step 2:
[0328] The terminal sends standardized data received from the user to the server. Network communication protocols are used in this process to ensure the data reaches the server securely.
[0329] Step 3:
[0330] The server analyzes the received data and uses natural language processing techniques to extract important keywords and themes from the information. In this step, tools such as spaCy and BERT are used for analysis, and the extracted keywords are prepared as input for the generative AI model.
[0331] Step 4:
[0332] The server automatically generates a three-dimensional virtual environment design using an AI model based on the extracted keywords. The generated design data is converted into a format that can be read as a scene file by Unity or Unreal Engine.
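The format conversion in Step 4 is not specified further. As a sketch under that caveat, the generated design could first be written as an engine-neutral JSON scene description that an importer on the engine side reads. The schema below (`scene`, `units`, `objects`) is hypothetical and is not a native Unity or Unreal Engine file format.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class PlacedObject:
    asset: str                             # hypothetical asset identifier
    position: tuple[float, float, float]   # x, y, z
    rotation_y_deg: float = 0.0


def export_scene(name: str, objects: list[PlacedObject]) -> str:
    """Write an engine-neutral scene description (hypothetical schema) as JSON."""
    document = {
        "scene": name,
        "units": "meters",
        "objects": [asdict(obj) for obj in objects],
    }
    return json.dumps(document, indent=2)


scene_json = export_scene(
    "scandinavian_store",
    [
        PlacedObject("display_table", (1.0, 0.0, 2.5)),
        PlacedObject("window_large", (0.0, 1.2, 4.0), rotation_y_deg=90.0),
    ],
)
```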
[0333] Step 5:
[0334] The virtual environment design generated by the server is sent to the terminal, and the user wears a VR-compatible headset to visualize and review the design within the virtual environment. This allows the user to evaluate the design in real time.
[0335] Step 6:
[0336] Users input feedback into their devices based on the design they have reviewed. This feedback may include specific suggestions, such as which parts of the design they would like to see improved.
[0337] Step 7:
[0338] The device sends user feedback to the server, which uses this feedback to update the AI model's learning and improve the accuracy of future design generation. This completes the feedback loop and drives improvements to the entire system.
[0339] Furthermore, an emotion engine that estimates the user's emotions may be incorporated. That is, the specific processing unit 290 may use the emotion identification model 59 to estimate the user's emotions and perform the specific processing using the user's emotions.
[0340] This invention is a system aimed at providing efficient set design in film production and an interactive environment that takes user emotions into consideration. This system has a configuration in which the user, terminal, and server cooperate and are combined with an emotion engine.
[0341] First, the user inputs movie script information and visual concepts from their device. This includes story scenes and visual images. The input data is formatted by the device and sent to the server.
[0342] The server receives the input scenario information and performs analysis using natural language processing techniques. It extracts important keywords and themes to form an initial design for the movie set. The design generated at this stage is rendered in a virtual space and provided to the user.
[0343] Next, the emotion engine is activated, analyzing the user's facial expressions and voice data to read their emotions. This emotional information is collected in real time along with other data while the user is experiencing the VR content.
[0344] The server reflects the results from the emotion engine and dynamically adjusts the set design and visual elements of the virtual space. This step is performed in real time to reflect changes in the user's emotions. For example, if the user expresses surprise, the lighting in that scene and the placement of specific objects will change.
[0345] Through the terminal, users evaluate the visualized virtual space and provide feedback. This feedback is used to improve the set design and is supplied as training data to the server's AI agent. This feedback mechanism ensures higher accuracy in subsequent generation.
[0346] For example, if a user sets a scene from a horror movie, the server automatically generates dark lighting and an eerie environment. As the user experiences the scene, the emotion engine detects fear or surprise, and parts of the virtual space are modified, with sounds and movements added to increase the sense of unease. In this way, fine-tuning that takes the user's emotions into account makes the set more realistic and dynamic.
[0347] In this way, the present invention enables filmmakers to construct realistic set designs cost-effectively, while also allowing for creative expression that reflects the emotions of the user.
[0348] The following describes the processing flow.
[0349] Step 1:
[0350] Users use their devices to input movie scripts and visual concepts. This includes describing detailed storylines and the atmosphere of each scene.
[0351] Step 2:
[0352] The terminal formats the scenario information entered by the user and prepares the data for transmission to the server.
[0353] Step 3:
[0354] The server receives scenario information sent from the terminal and analyzes the information using natural language processing. This analysis extracts important keywords and themes for each scene.
[0355] Step 4:
[0356] The server uses the extracted data for an AI agent to automatically generate a 3D movie set design. The generated design is then converted into a virtual space.
[0357] Step 5:
[0358] The server renders the generated movie set design and sends it to the device as VR content.
[0359] Step 6:
[0360] The terminal provides the received VR content to the user, and the user experiences the set using VR equipment.
[0361] Step 7:
[0362] During the user experience, the emotion engine analyzes the user's facial expressions and voice, collecting emotion data in real time.
[0363] Step 8:
[0364] The server receives the results of the emotion engine's analysis and dynamically adjusts the set design and visual elements of the virtual space according to the user's emotions. For example, if the emotion of surprise is detected, the surrounding lighting is changed.
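The dynamic adjustment in Step 8 can be sketched as a small rule function. The emotion labels, the amount of the lighting change, and the sound effect name are assumptions; the disclosure only states that lighting changes on surprise and that sounds may be added when fear is detected.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class SceneState:
    light_intensity: float                 # 0.0 (dark) to 1.0 (bright)
    sound_effects: tuple[str, ...] = ()


def adjust_for_emotion(state: SceneState, emotion: str) -> SceneState:
    """Return a new scene state adjusted for the detected emotion."""
    if emotion == "surprise":
        # Assumed rule: dim the surrounding lighting by a fixed step.
        dimmed = max(0.0, state.light_intensity - 0.2)
        return replace(state, light_intensity=dimmed)
    if emotion == "fear":
        # Assumed rule: add an unsettling sound effect.
        return replace(state, sound_effects=state.sound_effects + ("draft_wind",))
    return state  # unknown or neutral emotion: leave the scene unchanged


initial = SceneState(light_intensity=0.5)
after_surprise = adjust_for_emotion(initial, "surprise")
after_fear = adjust_for_emotion(after_surprise, "fear")
```

Returning a new state instead of mutating the old one keeps each adjustment reversible, which may be convenient when emotions change quickly.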
[0365] Step 9:
[0366] Users provide feedback through their devices based on their experience, sending any points they want corrected or elements they want to add to the server.
[0367] Step 10:
[0368] The server receives feedback and uses it as data for the AI agent to learn from. The feedback is then incorporated into the next design generation.
[0369] Step 11:
[0370] The server manages multiple scenes simultaneously as needed and executes optimization processes, making adjustments according to the user's project requirements while maintaining overall consistency.
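Step 11 does not define what "overall consistency" means. As one possible reading, the sketch below tracks several scenes and reports those whose style differs from the project-wide style, along with assets reused across scenes. The `Scene` and `SceneManager` types and the style tags are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Scene:
    scene_id: str
    style: str                                  # hypothetical style tag
    assets: list[str] = field(default_factory=list)


class SceneManager:
    """Holds several scenes and checks them against a project-wide style."""

    def __init__(self, project_style: str) -> None:
        self.project_style = project_style
        self.scenes: dict[str, Scene] = {}

    def add(self, scene: Scene) -> None:
        self.scenes[scene.scene_id] = scene

    def inconsistent_scenes(self) -> list[str]:
        """IDs of scenes whose style differs from the project style."""
        return sorted(
            s.scene_id for s in self.scenes.values() if s.style != self.project_style
        )

    def shared_assets(self) -> list[str]:
        """Assets used in more than one scene (candidates for reuse)."""
        counts = Counter(a for s in self.scenes.values() for a in set(s.assets))
        return sorted(a for a, n in counts.items() if n > 1)


manager = SceneManager("medieval")
manager.add(Scene("scene-001", "medieval", ["timber_house", "stone_well"]))
manager.add(Scene("scene-002", "medieval", ["timber_house", "market_stall"]))
manager.add(Scene("scene-003", "modern", ["street_lamp"]))
```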
[0371] (Example 2)
[0372] Next, we will describe Example 2. In the following description, the data processing device 12 will be referred to as the "server" and the smart glasses 214 will be referred to as the "terminal".
[0373] In film production, there is a need for real-time adjustments to set design based on user emotions, but this is difficult to achieve efficiently with current technology. Furthermore, there is a lack of processes to incorporate user feedback into subsequent designs. As a result, production times are prolonged and costs are increased.
[0374] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 2 is realized by the following means.
[0375] In this invention, the server includes means for receiving and analyzing linguistic or visual information input by a user, means for generating an initial three-dimensional environment design based on the analyzed information, and means for detecting and analyzing the user's emotional state and dynamically adjusting the visual elements within the virtual domain. This makes it possible to automatically and in real time adjust the design of a movie set according to the user's emotions, thereby efficiently reducing costs.
[0376] A "user" is an entity that uses the system to design and experience movie sets.
[0377] "Inputted linguistic or visual information" refers to text data and visual concept-based data of scenarios provided by the user.
[0378] "Means of analysis" refer to the processes and techniques used to understand input data and extract important information.
[0379] "Three-dimensional environmental design" refers to the design of three-dimensional sets and structures created for films and visual content.
[0380] A "virtual domain" refers to a real-time, accessible digital space created within a computer system.
[0381] "Visualization" refers to the process of visually representing digital data so that users can actually see and understand its contents.
[0382] "Emotional state" refers to the results of an analysis of the user's psychological or emotional responses.
[0383] "Dynamic adjustment mechanisms" refer to features that change the environment or elements in real time according to the user's emotions or other circumstances.
[0384] "Rating" refers to the evaluation or opinion provided by a user about their experience using the system.
[0385] "Feedback" refers to the ratings provided by users, which are used to improve and adjust the system.
[0386] This invention provides a system that offers an interactive, emotionally responsive virtual space based on scenarios and designs, for both filmmakers and audiences. The system consists of three components: a user, a terminal, and a server, and each component works in cooperation with the others.
[0387] Users use their devices to input movie script information and visual concepts. This information includes the story flow and visual images. For example, users can use a keyboard to write in key scenes from the story or use a pen tablet to draw concept art.
[0388] The terminal receives information entered by the user and converts it into a standard data format (e.g., JSON or PNG). During this process, the terminal utilizes its capabilities as a modern interactive device to prepare the data for transmission to the server.
[0389] The server receives data formatted by the terminal and analyzes the scenario information using a generative AI model. Specifically, it uses natural language processing techniques to extract important concepts and themes and generates an initial three-dimensional environment design based on them. Information processing is performed using libraries such as the Python library spaCy. This generated design is visualized in a virtual domain in real time and provided to the user.
[0390] Furthermore, the server utilizes an emotion engine to analyze the user's emotional state. This engine processes data acquired using the device's camera and microphone, and uses facial recognition software and voice analysis technology to read the user's emotions. For example, OpenCV can be used for the facial analysis, and the environment is dynamically adjusted based on the detected emotions.
[0391] For example, if a user sets a scene from a horror movie, the server automatically generates a dark and eerie environment. While the user is experiencing VR, if the emotion engine detects fear or surprise, the server reflects that emotional information and changes the visual elements in the virtual space in real time. For example, it might emphasize shadows or the sound of drafts to increase the sense of unease.
[0392] In this way, this system makes it possible to create more realistic and emotionally resonant set designs while keeping costs down during the film production process. An example of a prompt to be input into the generative AI model would be, "Please set up a forest scene for a horror movie. Create an environment that is dimly lit, foggy, and has the sound of a wolf howling in the distance in the background."
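A prompt of the kind quoted above could be assembled from structured scene attributes. The following is only a sketch: the `build_prompt` function and its parameters are hypothetical, and the wording of the template is modeled loosely on the example prompt rather than taken from any defined interface.

```python
def build_prompt(genre: str, location: str, attributes: list[str]) -> str:
    """Assemble a scene-generation prompt from structured attributes."""
    if not attributes:
        raise ValueError("at least one attribute is required")
    described = ", ".join(attributes)
    return (
        f"Please set up a {location} scene for a {genre} movie. "
        f"Create an environment that is {described}."
    )


prompt = build_prompt(
    genre="horror",
    location="forest",
    attributes=["dimly lit", "foggy", "filled with distant wolf howls"],
)
```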
[0393] The flow of the specific processing in Example 2 will be explained using Figure 13.
[0394] Step 1:
[0395] Users input movie script information and visual concepts into the device. Specifically, they can input text data using a keyboard or create visual data using a pen tablet. The input at this time is raw data related to the script and images.
[0396] Step 2:
[0397] The terminal converts the received input data into a standard data format. Text data is converted to JSON format, and visual data is converted to common image formats such as PNG or JPEG. The output is formatted data ready to be sent to the server.
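Step 2 can be sketched as packaging the text and the image into one message. Carrying the image as Base64 inside the JSON message is an assumption made for the example (the disclosure does not say how the two formats are transmitted together), and the message field names are hypothetical. The PNG and JPEG signatures used for format detection are the standard leading bytes of those file formats.

```python
import base64
import json

PNG_MAGIC = b"\x89PNG\r\n\x1a\n"   # standard 8-byte PNG signature
JPEG_MAGIC = b"\xff\xd8\xff"       # leading bytes of a JPEG file


def detect_image_format(data: bytes) -> str:
    """Identify PNG or JPEG data from its leading bytes."""
    if data.startswith(PNG_MAGIC):
        return "png"
    if data.startswith(JPEG_MAGIC):
        return "jpeg"
    raise ValueError("unsupported image format")


def build_message(script_text: str, image_bytes: bytes) -> str:
    """Package script text and concept art into one JSON message."""
    return json.dumps({
        "script": script_text,
        "image": {
            "format": detect_image_format(image_bytes),
            "data_base64": base64.b64encode(image_bytes).decode("ascii"),
        },
    })


# Placeholder bytes that only carry a PNG signature, not a real image.
fake_png = PNG_MAGIC + b"placeholder"
message = build_message("A foggy forest at night.", fake_png)
decoded = json.loads(message)
```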
[0398] Step 3:
[0399] The server receives formatted data sent from the terminal. Using natural language processing techniques, it extracts important concepts and themes from the text. The data is analyzed using spaCy or other natural language processing libraries. At this point, the output consists of the extracted important keywords and themes.
[0400] Step 4:
[0401] The server uses a generative AI model to generate an initial three-dimensional environment design based on extracted keywords and themes. This process provides the AI model with pre-configured prompts and places the initial design in a virtual space. This design is visualized to the user via a VR headset or display. The output is a visually displayable virtual environment design.
[0402] Step 5:
[0403] The device collects user emotion data in real time. It records the user's emotional state from their facial expressions and voice using a camera and microphone. The emotion engine analyzes this input data and formats the detected emotions into data for transmission to the server. The output is summarized emotion information.
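The "summarized emotion information" output in Step 5 is not defined in detail. One plausible form, sketched below, aggregates per-frame observations into a dominant emotion and normalized scores. The observation format (label plus confidence) and the summary fields are assumptions.

```python
from collections import defaultdict


def summarize_emotions(observations: list[tuple[str, float]]) -> dict:
    """Aggregate (label, confidence) observations into a compact summary."""
    totals: dict[str, float] = defaultdict(float)
    for label, confidence in observations:
        totals[label] += confidence
    if not totals:
        return {"dominant": None, "scores": {}}
    grand_total = sum(totals.values())
    # Normalize so the scores describe each emotion's share of the evidence.
    scores = {label: round(v / grand_total, 3) for label, v in totals.items()}
    dominant = max(totals, key=totals.get)
    return {"dominant": dominant, "scores": scores}


observations = [
    ("fear", 0.8),
    ("surprise", 0.6),
    ("fear", 0.7),
    ("neutral", 0.2),
]
summary = summarize_emotions(observations)
```

Sending only such a summary, rather than raw camera and microphone data, would also reduce the amount of data the terminal transmits to the server.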
[0404] Step 6:
[0405] The server receives emotional information sent from the emotion engine and dynamically adjusts elements within the virtual space. For example, if the user expresses surprise, it might change the lighting or add specific sound effects. These adjustments enable the virtual space to provide a more real-time and interactive experience. The output is the adjusted virtual environment.
[0406] Step 7:
[0407] After experiencing the virtual environment, users provide feedback through their terminal. This feedback is organized and formatted on the terminal and sent to the server. The server analyzes this feedback and saves it as training data to be used for future design generation. The output is the feedback data for analysis.
[0408] (Application Example 2)
[0409] Next, we will explain Application Example 2. In the following explanation, the data processing device 12 will be referred to as the "server," and the smart glasses 214 will be referred to as the "terminal."
[0410] Traditional virtual stores have struggled to dynamically adjust the environment based on user emotions, limiting their ability to enhance immersion and user experience. In particular, there is a growing need to provide an optimal visual and auditory environment that responds to user emotions.
[0411] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 2 is realized by the following means.
[0412] In this invention, the server includes means for receiving and analyzing information input from the user, means for generating a three-dimensional virtual environment based on the analyzed information, means for analyzing the user's facial expressions and voice data to detect emotions, and means for dynamically adjusting the visual and auditory elements of the virtual environment based on the detected emotion information. This makes it possible to provide an optimal store environment that is in line with the user's emotions.
[0413] A "user" is an individual or group that uses this system to input information or experience the virtual environment.
[0414] "Receiving" refers to the process of acquiring information and data provided by the user.
[0415] "Analysis" refers to the process of processing received information to understand its meaning and theme.
[0416] A "three-dimensional virtual environment" refers to a three-dimensional digital space created using computer technology.
[0417] "Visualization" is the process of visually displaying digital data, enabling users to observe or experience it.
[0418] "Facial expression data" refers to information that captures the user's facial movements and reactions.
[0419] "Voice data" refers to data that captures the user's voice or sounds and uses them as information.
[0420] "Detecting emotions" involves analyzing facial expressions and voice data to identify the user's emotional state.
[0421] "Visual elements" refer to elements such as appearance, color, and layout within a virtual environment.
[0422] "Auditory elements" refer to elements such as sounds, music, and sound effects within a virtual environment.
[0423] "Dynamic adjustment" refers to the process of instantly changing elements of a virtual environment in response to real-time changing conditions.
[0424] To realize this invention, the server receives input information from the user and uses natural language processing technology to analyze it. Based on the analyzed information, the server generates a three-dimensional virtual environment, visualizes it, and provides it to the user. The user's terminal also collects facial expression data using facial recognition technology and audio data using voice analysis technology. This data is processed on a cloud service to detect the user's emotions. The server has the function to dynamically adjust the visual and auditory elements within the virtual environment based on the detected emotion information. This allows the user to enjoy an optimal environment tailored to their individual emotions through their experience in the virtual store. For example, if the user indicates a relaxed emotion, the server will warm the lighting in the virtual store and play soothing music. The software supporting this process includes Python, Unity, the Google Cloud Vision API, and an NLP model.
[0425] As a concrete example, consider the experience of a user visiting a virtual store using smart glasses. When the user focuses on a new product displayed on a table and shows excitement, it is possible to interact with the user by displaying detailed information about that product and related promotions in their field of view, and providing audio guidance. An example of a prompt would be, "How should information be presented in the virtual environment when the user is in an excited state?"
[0426] The flow of a specific process in Application Example 2 will be explained using Figure 14.
[0427] Step 1:
[0428] Users input movie script information and visual concepts through their terminals. This input data is formatted and sent to the server. The input consists of text-based script information and image data, which are formatted on the terminal. The output is the formatted data sent to the server.
[0429] Step 2:
[0430] The server performs natural language processing using the received scenario information to extract important keywords and themes. This process uses an NLP model to analyze the text data and generate a keyword list based on the analysis. The input is the scenario information, and the output is the extracted keywords and themes. The analyzed information forms the basis for the initial design of the three-dimensional virtual environment.
[0431] Step 3:
[0432] The server generates a three-dimensional virtual environment based on the extracted information and renders its design within the virtual space. Visual elements are constructed using Unity and presented to the user. The input consists of extracted keywords and themes, and the output is a three-dimensional model within the virtual space.
[0433] Step 4:
[0434] While a user experiences a virtual environment using a smart device, the device acquires the user's facial expression and voice data in real time via its camera and microphone. The input is the collected facial expression and voice data, and the output is sensor data for analysis.
[0435] Step 5:
[0436] The server analyzes the acquired facial and voice data to detect the user's emotional state. It uses the Google Cloud Vision API to analyze changes in facial expressions, and voice analysis software to analyze voice tone. The input is data from sensors, and the output is the result of the emotion analysis.
[0437] Step 6:
[0438] The server dynamically adjusts the visual and auditory elements of the virtual environment based on detected emotion information. Lighting, music, and sound effects are modified through Unity to create an environment that matches the user's emotions. The input is the emotion analysis result, and the output is the adjusted virtual space.
[0439] Step 7:
[0440] Users experience the adjusted virtual environment and send their feedback to the server via their device. This feedback data is stored on the server and used as training data for the generative AI model. The input is user feedback, and the output is data for model improvement.
[0441] The specific processing unit 290 transmits the result of the specific processing to the smart glasses 214. In the smart glasses 214, the control unit 46A causes the speaker 240 to output the result of the specific processing. The microphone 238 acquires audio indicating user input for the result of the specific processing. The control unit 46A transmits the audio data indicating user input acquired by the microphone 238 to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the audio data.
[0442] The data generation model 58 is a so-called generative AI (Artificial Intelligence). Examples of the data generation model 58 include generative AI such as ChatGPT (Internet search <URL: https://openai.com/blog/chatgpt>) and Gemini (Internet search <URL: https://gemini.google.com/?hl=ja>). The data generation model 58 is obtained by performing deep learning on a neural network. Prompts containing instructions, together with inference data such as audio data representing speech, text data representing text, and image data representing images, are input to the data generation model 58. The data generation model 58 performs inference on the input inference data according to the instructions indicated by the prompts, and outputs the inference results in data formats such as audio data and text data. Here, inference refers to, for example, analysis, classification, prediction, and/or summarization.
[0443] In the above embodiment, an example was given in which specific processing is performed by the data processing device 12, but the technology of this disclosure is not limited thereto, and the specific processing may also be performed by the smart glasses 214.
[0444] [Third Embodiment]
[0445] Figure 5 shows an example of the configuration of the data processing system 310 according to the third embodiment.
[0446] As shown in Figure 5, the data processing system 310 includes a data processing device 12 and a headset terminal 314. An example of the data processing device 12 is a server.
[0447] The data processing device 12 comprises a computer 22, a database 24, and a communication interface 26. The computer 22 is an example of a "computer" related to the technology of this disclosure. The computer 22 comprises a processor 28, RAM 30, and storage 32. The processor 28, RAM 30, and storage 32 are connected to a bus 34. The database 24 and the communication interface 26 are also connected to the bus 34. The communication interface 26 is connected to a network 54. An example of the network 54 is a WAN (Wide Area Network) and/or a LAN (Local Area Network).
[0448] The headset terminal 314 includes a computer 36, a microphone 238, a speaker 240, a camera 42, a communication interface 44, and a display 343. The computer 36 includes a processor 46, RAM 48, and storage 50. The processor 46, RAM 48, and storage 50 are connected to a bus 52. The microphone 238, speaker 240, camera 42, and display 343 are also connected to the bus 52.
[0449] The microphone 238 receives instructions and other input from the user 20 by picking up the user's voice. The microphone 238 captures the voice of the user 20, converts the captured voice into audio data, and outputs it to the processor 46. The speaker 240 outputs audio according to instructions from the processor 46.
[0450] Camera 42 is a small digital camera equipped with an optical system including a lens, aperture, and shutter, and an image sensor such as a CMOS (Complementary Metal-Oxide-Semiconductor) image sensor or a CCD (Charge Coupled Device) image sensor, and captures images of the area around the user 20 (for example, an imaging range defined by a field of view equivalent to the width of a typical healthy person's field of vision).
[0451] Communication interface 44 is connected to network 54. Communication interfaces 44 and 26 are responsible for the exchange of various information between processor 46 and processor 28 via network 54. The exchange of various information between processor 46 and processor 28 using communication interfaces 44 and 26 is performed in a secure manner.
[0452] Figure 6 shows an example of the main functions of the data processing device 12 and the headset terminal 314. As shown in Figure 6, the data processing device 12 performs specific processing using the processor 28. The storage 32 stores the specific processing program 56.
[0453] The specific processing program 56 is an example of a "program" relating to the technology of this disclosure. The processor 28 reads the specific processing program 56 from the storage 32 and executes the read specific processing program 56 on the RAM 30. The specific processing is realized by the processor 28 operating as a specific processing unit 290 in accordance with the specific processing program 56 executed on the RAM 30.
[0454] The storage 32 stores the data generation model 58 and the emotion identification model 59. The data generation model 58 and the emotion identification model 59 are used by the specific processing unit 290.
[0455] In the headset terminal 314, the processor 46 performs the reception output processing. The storage 50 stores the reception output program 60. The processor 46 reads the reception output program 60 from the storage 50 and executes the read reception output program 60 on the RAM 48. The reception output processing is realized by the processor 46 operating as a control unit 46A according to the reception output program 60 executed on the RAM 48.
[0456] Next, the specific processing performed by the specific processing unit 290 of the data processing device 12 will be described. In the following description, the data processing device 12 will be referred to as the "server" and the headset terminal 314 will be referred to as the "terminal".
[0457] This invention is a system that improves the efficiency of set design in the film production process, and operates through cooperation among three parties: the user, the terminal, and the server. A specific embodiment of this system is shown below.
[0458] Users input information that serves as the film's script and visual concept into the terminal. This information describes the content, atmosphere, and physical characteristics of the film scenes, and represents the basic ideas and policies for the film's production.
[0459] The terminal processes the input scenario information as digital data and sends it to the server. This includes a function to standardize the scenario information and prepare it in an analyzable format.
[0460] The server uses advanced natural language processing techniques to analyze the scenario information received from the terminal. This analysis process extracts important keywords and themes from the scenario, clarifying the elements necessary for designing the film set.
[0461] The server then uses AI agents to automatically generate three-dimensional movie set designs based on the extracted keywords and themes. These designs are rendered in a virtual space and constructed using virtual reality (VR) technology.
[0462] The generated set is streamed to the user via the terminal, allowing the user to visually inspect the set in real time using VR equipment. The user can evaluate the set design and provide feedback through the terminal regarding desired changes and improvements.
[0463] This feedback is sent to the server, which uses it to train the AI. The AI agent incorporates user feedback to improve the accuracy and applicability of subsequent design generation. The server also manages multiple scenes simultaneously and optimizes the set design as needed.
[0464] For example, if a user specifies a medieval village scene, the server automatically extracts relevant themes such as wooden buildings, cobblestone streets, and crops, and generates a realistic and creative village set based on these. The user can then review the set and make adjustments, such as to the placement of buildings or decorations, and this feedback contributes to the evolution of the AI. In film projects involving multiple scenes at once, users can properly manage these set designs and achieve overall time and cost reductions.
[0465] The following describes the processing flow.
[0466] Step 1:
[0467] Users use a terminal to input movie scripts and visual concepts. The input information includes details of the story, scene settings, and the overall visual atmosphere.
[0468] Step 2:
[0469] The terminal formats the scenario information from the user as digital data and prepares it for analysis. This data is then ready to be sent to the server.
[0470] Step 3:
[0471] The server receives the scenario data from the terminal and uses natural language processing techniques to extract important keywords and themes from it. This clarifies the data necessary for designing the film set.
[0472] Step 4:
[0473] The server uses the extracted keywords and themes to automatically generate 3D movie set designs with an AI agent. The AI agent draws on existing design data and learning results to produce creative set designs.
[0474] Step 5:
[0475] The generated 3D set design is rendered in a virtual space. The server prepares this design as VR content and gets ready to send it to the terminal.
[0476] Step 6:
[0477] The terminal provides VR content received from the server to the user. The user uses VR equipment to view and interact with a three-dimensional visual set in real time.
[0478] Step 7:
[0479] Users provide feedback based on the visualized set. For example, they can send fine-tuning suggestions, such as the location of buildings or the color scheme of decorations, to the server via the terminal.
[0480] Step 8:
[0481] The server receives feedback from users and provides that data to the AI agent to advance the learning process. This improves the accuracy of set design in subsequent generation.
[0482] Step 9:
[0483] The server manages multiple scenes simultaneously as needed, optimizing and coordinating sets to improve overall project consistency and efficiency.
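The multi-scene management of Step 9 can be illustrated with a minimal sketch of a registry that holds several set designs at once and reports which visual elements are shared between scenes, so that they can be kept consistent. The `SceneRegistry` class, its method names, and the element names are hypothetical and are not part of the disclosure.

```python
from collections import defaultdict


class SceneRegistry:
    """Hypothetical registry holding the set design of every scene in a project."""

    def __init__(self):
        # scene_id -> set of visual element names used in that scene
        self._scenes = {}

    def register(self, scene_id, elements):
        self._scenes[scene_id] = set(elements)

    def shared_elements(self):
        """Return elements used by two or more scenes, mapped to sorted scene ids."""
        usage = defaultdict(list)
        for scene_id, elements in self._scenes.items():
            for element in elements:
                usage[element].append(scene_id)
        return {e: sorted(ids) for e, ids in usage.items() if len(ids) > 1}


registry = SceneRegistry()
registry.register("village_day", ["wooden_house", "cobblestone_street", "crop_field"])
registry.register("village_night", ["wooden_house", "cobblestone_street", "torch"])
shared = registry.shared_elements()
```

Elements reported by `shared_elements` are candidates for a single shared asset, which is one possible way to keep scenes consistent across a project.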
[0484] (Example 1)
[0485] Next, Example 1 will be described. In the following description, the data processing device 12 will be referred to as the "server," and the headset terminal 314 will be referred to as the "terminal."
[0486] In film production, set design is a time-consuming and resource-intensive process. It is difficult to efficiently manage multiple scenes while simultaneously creating realistic and creative designs in a short timeframe. Furthermore, the process of effectively incorporating user feedback to improve the accuracy of future designs is complex. Therefore, there is a need to easily generate set designs from narrative information, visualize them in a virtual space, and provide users with an intuitive and interactive experience.
[0487] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 1 is realized by the following means.
[0488] In this invention, the server includes means for receiving and analyzing narrative information input from a user, means for generating a three-dimensional visual model based on the analyzed information, and means for coordinating with one or more virtual reality devices to deliver the generated visual model in real time. This streamlines the creation and management of set designs and enables improved accuracy in design generation based on real-time visualization and feedback.
[0489] "Users" refers to individuals or groups who use the system to input narrative information, review the generated visual models, and provide feedback.
[0490] "Narrative information" refers to information used to describe the scenarios and concepts related to scenes in movies and visual content.
[0491] "Analysis" refers to the process of identifying important elements from the input narrative information and extracting specific design elements.
[0492] A "three-dimensional visual model" refers to a three-dimensional digital design generated based on narrative information, that is, a structure that can be displayed in a virtual space.
[0493] A "virtual space" refers to an artificial space created using digital technology that users can experience visually.
[0494] "Virtual reality devices" refer to devices (e.g., headsets and displays) that users use to visually and experientially perceive a virtual space.
[0495] "Real-time delivery" refers to providing the generated visual model to the user immediately and without delay.
[0496] "Feedback" refers to opinions and suggestions for improvement provided by users after experiencing a visual model, and includes information used to improve the generation process.
[0497] One embodiment of this invention is a system that improves the efficiency of set design in the film production process. In this system, the user, terminal, and server work in cooperation with each other.
[0498] Users input narrative information, such as scenarios and visual concepts, into the terminal. This information concretely represents the structure and atmosphere of the film scenes and is a crucial element in realizing the basic ideas of the story.
[0499] The terminal processes the narrative information entered by the user as digital data. This process includes the function of converting the information into a parseable format. Specifically, the information is converted into a standard format such as JSON and sent to the server.
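As an illustration of the conversion into a standard format such as JSON, the following minimal sketch packages narrative information into a JSON string on the terminal side. The function name `build_payload` and the field names are assumptions made for this example; the disclosure does not specify a schema.

```python
import json


def build_payload(scene_title, description, visual_concept):
    """Package narrative information as a JSON string (field names are assumed)."""
    record = {
        "scene_title": scene_title.strip(),
        "description": description.strip(),
        "visual_concept": visual_concept.strip(),
    }
    # ensure_ascii=False keeps non-ASCII script text readable in the payload
    return json.dumps(record, ensure_ascii=False)


payload = build_payload(
    "Medieval village",
    "  Villagers gather in the square at dawn. ",
    "wooden buildings, cobblestone streets",
)
```

The resulting string can be parsed back with `json.loads` on the server side, which is what makes the format parseable in the sense described above.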
[0500] The server receives data sent from the terminal and analyzes it using generative AI models and natural language processing software. Specifically, NLP libraries and machine learning frameworks are utilized. Through this analysis, important keywords and themes are extracted, clarifying the elements necessary for film set design.
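The keyword extraction can be sketched in simplified form as follows. This example deliberately replaces the generative AI model and NLP libraries with a plain word-frequency count from the Python standard library, so it only illustrates the input and output of the analysis, not the actual technique. The stop-word list and the function name `extract_keywords` are assumptions.

```python
import re
from collections import Counter

# Small illustrative stop-word list; a real NLP library would supply its own.
STOP_WORDS = {"a", "an", "and", "the", "of", "in", "on", "at", "with", "to", "is", "are"}


def extract_keywords(text, top_n=2):
    """Return the top_n most frequent non-stop-words, most frequent first."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(w for w in words if w not in STOP_WORDS)
    return [word for word, _ in counts.most_common(top_n)]


keywords = extract_keywords(
    "A village of wooden houses. The village streets are cobblestone, "
    "and wooden carts line the village square."
)
```

In this sample text, "village" occurs three times and "wooden" twice, so these two words are returned as the keywords.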
[0501] The server generates a three-dimensional visual model based on the extracted information. A game engine (e.g., Unity or Unreal Engine) is used to render the design in the virtual space and generate VR content. At this stage, the server works in conjunction with virtual reality equipment to deliver the generated model to the user in real time via the device.
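One possible way to turn extracted themes into the contents of a three-dimensional model is a lookup from themes to placeable elements, sketched below. The table `THEME_ELEMENTS`, the asset names, and the scene dictionary layout are hypothetical; rendering in a game engine is outside the scope of this sketch.

```python
# Hypothetical lookup from extracted themes to placeable set elements.
THEME_ELEMENTS = {
    "medieval village": ["wooden_house", "cobblestone_street"],
    "horror forest": ["dead_tree", "fog_volume"],
}


def build_scene(themes):
    """Return a scene description listing one object per known element."""
    objects = []
    for theme in themes:
        for element in THEME_ELEMENTS.get(theme, []):  # unknown themes are skipped
            objects.append({"asset": element, "position": [0.0, 0.0, 0.0]})
    return {"objects": objects}


scene = build_scene(["medieval village", "space station"])
```

Here the unknown theme "space station" contributes no objects, while "medieval village" contributes two placeholder objects at the origin.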
[0502] For example, if a user sets a scene for a "medieval village," the server extracts themes such as wooden architecture and cobblestone streets, and generates a realistic and creative village set based on these. The user then uses VR equipment to view the set and provides feedback through prompts such as, "Generate a set design for a fantastical movie scene set in a medieval European village."
[0503] This feedback is sent to the server and used as training data for the generative AI model. This improves the accuracy and effectiveness of subsequent design generation and optimizes the set design.
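How feedback might be stored as training data can be sketched as follows. The record fields, the 1 to 5 rating scale, and the JSON Lines representation are assumptions for illustration; the disclosure does not define the training data format.

```python
import json


def to_training_line(scene_id, prompt, comment, rating):
    """Serialize one feedback record as a single JSON Lines entry."""
    if not 1 <= rating <= 5:  # assumed rating scale
        raise ValueError("rating must be between 1 and 5")
    record = {
        "scene_id": scene_id,
        "prompt": prompt,    # text that produced the design
        "comment": comment,  # change requested by the user
        "rating": rating,
    }
    return json.dumps(record)


line = to_training_line(
    "village_day", "medieval village", "move the well to the square", 4
)
```

Each line pairs the text that produced a design with the user's reaction to it, which is the kind of pairing a later retraining step could consume.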
[0504] The flow of the specific processing in Example 1 will be explained using Figure 11.
[0505] Step 1:
[0506] The user inputs the film's script and visual concepts into the terminal. This input includes film scene settings, characters, and key narrative themes. During this process, the input is saved as text data in a system-recognizable format. Specifically, the user can record information using a keyboard or voice input function.
[0507] Step 2:
[0508] The terminal receives input data from the user and formats it as digital data. In this step, a text analysis tool is used to segment the data and convert it to a standard format (e.g., JSON). Data processing involves breaking down the scenario into its constituent elements and shaping it into a form that is easy to analyze. The output is a parseable data structure that is ready to be sent to the server. Specific actions include launching text analysis software and formatting the data.
[0509] Step 3:
[0510] The server receives data sent from the terminal and performs analysis using advanced natural language processing capabilities. Here, a generative AI model and natural language processing libraries are used to extract important keywords and themes from the data. The input is formatted scenario data, and the output is the extracted keywords. Specific operations include model initialization and the application of the analysis algorithm.
[0511] Step 4:
[0512] The server generates a three-dimensional visual model using an AI agent based on extracted keywords. Here, a game engine (e.g., Unity or Unreal Engine) is used to render the design in a virtual space and generate VR content. As part of the data processing, keywords are converted into relevant visual elements and incorporated into the three-dimensional model. The output is the completed visual model. Specific operations include launching 3D modeling tools and rendering the model.
[0513] Step 5:
[0514] The terminal receives the visual model generated by the server and streams it to the user. This step involves real-time data delivery via an interface with the VR device. The input is the visual model sent from the server, and the output is the VR content displayed in real time. Specific operations include implementing streaming technology and synchronizing with the VR device.
[0515] Step 6:
[0516] Users view visual models via VR equipment and provide feedback. This feedback includes design evaluations, areas for improvement, and further requests. The input is feedback received from users verbally or in writing. The output is the feedback information, saved on the terminal and prepared for transmission to the server. Specific operations include using a feedback input interface.
[0517] Step 7:
[0518] The server receives user feedback and uses it as training data for the generative AI model. Using machine learning algorithms, it analyzes the feedback information and implements a process to improve the accuracy of the AI model. Input includes feedback data, and output includes an improved design generation algorithm. Specific actions include retraining the AI model and tuning its parameters.
[0519] (Application Example 1)
[0520] Next, Application Example 1 will be described. In the following description, the data processing device 12 will be referred to as the "server," and the headset terminal 314 will be referred to as the "terminal."
[0521] In modern commercial and retail design, real-time spatial adjustment and display optimization are required, but traditional methods are time-consuming, costly, and difficult to implement efficiently. In particular, intuitive design creation and evaluation in virtual environments, along with rapid feedback loops, are essential.
[0522] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 1 is realized by the following means.
[0523] In this invention, the server includes means for receiving and analyzing spatial design information input by a user, means for generating a three-dimensional virtual environment design based on the analyzed information, means for visualizing the generated virtual environment design in a virtual domain and providing it to the user, means for receiving feedback from the user and using it to improve the accuracy of the generation means, and means for extracting important keywords and themes from the spatial design information using natural language processing technology. This enables the efficient design and evaluation of virtual spaces in real time.
[0524] "Spatial design information" refers to information that indicates the design concept, components, and themes of a virtual environment.
[0525] "Means of analysis" refers to the technologies and methods used to process and understand input information.
[0526] "Three-dimensional virtual environment design" refers to the representation of a virtual design in a three-dimensional space, generated on a computer.
[0527] "Means of visualization in a virtual domain" refers to methods for visually displaying generated designs and models using virtual reality technology.
[0528] "Feedback" refers to information collected from users, including opinions and evaluations, that is used to improve the system.
[0529] "Natural language processing technology" is a technology that enables computers to understand and process human language, and is used for various types of analysis and extraction.
[0530] "Important keywords and themes" refer to concepts and ideas that deserve particular attention in spatial design information.
[0531] The system for implementing this invention allows a user to input spatial design information and generate and visualize a virtual environment based on that information. The user inputs information about the spatial design using a smart device. The terminal receives this information, converts it into a standardized data format such as JSON, and sends it to the server.
[0532] The server uses natural language processing techniques to extract important keywords and themes from the received data. This is expected to involve natural language processing tools such as the spaCy library and the BERT language model. Subsequently, a generative AI model is used to automatically generate a three-dimensional virtual environment design. This generated design is then rendered into the virtual domain using engines such as Unity or Unreal Engine.
[0533] Users can visually confirm the generated virtual environment through a VR-compatible headset. User feedback is sent back to the server via the terminal and used to improve the accuracy of AI generation. This feedback loop ultimately leads to more efficient and creative virtual space design.
[0534] For example, when a retail store is testing a new store exterior design, they can use a prompt such as, "Generate a proposal for placing the new product display in a location with natural light. Also, create a set design that reflects a simple, Scandinavian-style store." This prompt will quickly generate a virtual environment that meets their requirements. Such a system makes it possible to efficiently consider and quickly implement effective designs.
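A prompt of the kind quoted above could be assembled from extracted keywords along the following lines. The function `build_prompt`, its parameters, and the prompt wording are assumptions for illustration and do not represent a prescribed prompt format.

```python
def build_prompt(space_type, style, requirements):
    """Compose a text prompt from extracted keywords (wording is illustrative)."""
    lines = [f"Create a set design for a {style} {space_type}."]
    for requirement in requirements:
        lines.append(f"- {requirement}")
    return "\n".join(lines)


prompt = build_prompt(
    "store",
    "simple, Scandinavian-style",
    ["Place the new product display in a location with natural light."],
)
```

Keeping the style and the individual requirements as separate inputs makes it straightforward to regenerate the prompt after the user changes one requirement.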
[0535] The flow of a specific process in Application Example 1 will be explained using Figure 12.
[0536] Step 1:
[0537] Users input spatial design information via smart devices. The input information is converted into JSON or XML format and stored on the terminal as standardized data.
[0538] Step 2:
[0539] The terminal sends standardized data received from the user to the server. Network communication protocols are used in this process to ensure the data reaches the server securely.
[0540] Step 3:
[0541] The server analyzes the received data and uses natural language processing techniques to extract important keywords and themes from the information. In this step, tools such as spaCy and BERT are used for analysis, and the extracted keywords are prepared as input for the generative AI model.
[0542] Step 4:
[0543] The server automatically generates a three-dimensional virtual environment design using an AI model based on the extracted keywords. The generated design data is converted into a format that can be read as a scene file by Unity or Unreal Engine.
[0544] Step 5:
[0545] The virtual environment design generated by the server is sent to the terminal, and the user wears a VR-compatible headset to visualize and review the design within the virtual environment. This allows the user to evaluate the design in real time.
[0546] Step 6:
[0547] Users input feedback into the terminal based on the design they have reviewed. This feedback may include specific suggestions, such as which parts of the design they would like to see improved.
[0548] Step 7:
[0549] The terminal sends user feedback to the server, which uses this feedback to update the AI model's learning and improve the accuracy of future design generation. This completes the feedback loop and drives improvements to the entire system.
[0550] Furthermore, an emotion engine that estimates the user's emotions may be incorporated. That is, the specific processing unit 290 may use the emotion identification model 59 to estimate the user's emotions and perform the specific processing using the user's emotions.
[0551] This invention is a system aimed at providing efficient set design in film production and an interactive environment that takes user emotions into consideration. This system has a configuration in which the user, terminal, and server cooperate and are combined with an emotion engine.
[0552] First, the user inputs movie script information and visual concepts from the terminal. This includes story scenes and visual images. The input data is formatted by the terminal and sent to the server.
[0553] The server receives the input scenario information and performs analysis using natural language processing techniques. It extracts important keywords and themes to form an initial design for the movie set. The design generated at this stage is rendered in a virtual space and provided to the user.
[0554] Next, the emotion engine is activated, analyzing the user's facial expressions and voice data to read their emotions. This emotional information is collected in real time along with other data while the user is experiencing the VR content.
[0555] The server reflects the results from the emotion engine and dynamically adjusts the set design and visual elements of the virtual space. This step is performed in real time to reflect changes in the user's emotions. For example, if the user expresses surprise, the lighting in that scene and the placement of specific objects will change.
[0556] Through the terminal, users evaluate the visualized virtual space and provide feedback. This feedback is used to improve the set design and is supplied as training data to the server's AI agent. This feedback mechanism ensures higher accuracy in subsequent generation.
[0557] For example, if a user sets a scene from a horror movie, the server automatically generates dark lighting and an eerie environment. As the user experiences the scene, the emotion engine detects fear or surprise, and parts of the virtual space are modified, with sounds and movements added to increase the sense of unease. In this way, fine-tuning that takes the user's emotions into account makes the set more realistic and dynamic.
[0558] In this way, the present invention enables filmmakers to construct realistic set designs cost-effectively, while also allowing for creative expression that reflects the emotions of the user.
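The emotion-driven adjustment described above can be sketched as a table of parameter changes per detected emotion. The emotion labels, the parameter names `light_intensity` and `ambient_volume`, and the delta values are assumptions chosen for illustration.

```python
# Hypothetical adjustment table; emotion labels and parameters are assumptions.
ADJUSTMENTS = {
    "surprise": {"light_intensity": -0.25},
    "fear": {"light_intensity": -0.25, "ambient_volume": 0.5},
}


def adjust_scene(state, emotion):
    """Return a new scene state with the emotion's deltas applied, clamped to [0, 1]."""
    new_state = dict(state)  # leave the caller's state untouched
    for key, delta in ADJUSTMENTS.get(emotion, {}).items():
        new_state[key] = min(1.0, max(0.0, new_state.get(key, 0.0) + delta))
    return new_state


state = {"light_intensity": 0.5, "ambient_volume": 0.75}
after = adjust_scene(state, "fear")
```

With these sample values, detecting fear dims the lighting to 0.25 and raises the ambient volume, which is clamped at 1.0. An unknown emotion leaves the state unchanged.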
[0559] The following describes the processing flow.
[0560] Step 1:
[0561] Users use the terminal to input movie scripts and visual concepts. This includes describing detailed storylines and the atmosphere of each scene.
[0562] Step 2:
[0563] The terminal formats the scenario information entered by the user and prepares the data for transmission to the server.
[0564] Step 3:
[0565] The server receives scenario information sent from the terminal and analyzes the information using natural language processing. This analysis extracts important keywords and themes for each scene.
[0566] Step 4:
[0567] The server supplies the extracted data to an AI agent, which automatically generates a 3D movie set design. The generated design is then placed in a virtual space.
[0568] Step 5:
[0569] The server renders the generated movie set design and sends it to the terminal as VR content.
[0570] Step 6:
[0571] The terminal provides the received VR content to the user, and the user experiences the set using VR equipment.
[0572] Step 7:
[0573] During the user experience, the emotion engine analyzes the user's facial expressions and voice, collecting emotion data in real time.
[0574] Step 8:
[0575] The server receives the results of the emotion engine's analysis and dynamically adjusts the set design and visual elements of the virtual space according to the user's emotions. For example, if the emotion of surprise is detected, the surrounding lighting is changed.
[0576] Step 9:
[0577] Users provide feedback through the terminal based on their experience, sending any points they want corrected or elements they want to add to the server.
[0578] Step 10:
[0579] The server receives feedback and uses it as data for the AI agent to learn from. The feedback is then incorporated into the next design generation.
[0580] Step 11:
[0581] The server manages multiple scenes simultaneously as needed and executes optimization processes, making adjustments according to the user's project requirements while maintaining overall consistency.
[0582] (Example 2)
[0583] Next, Example 2 will be described. In the following description, the data processing device 12 will be referred to as the "server," and the headset terminal 314 will be referred to as the "terminal."
[0584] In film production, there is a need for real-time adjustments to set design based on user emotions, but this is difficult to achieve efficiently with current technology. Furthermore, there is a lack of processes to incorporate user feedback into subsequent designs. As a result, production times are prolonged and costs are increased.
[0585] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 2 is realized by the following means.
[0586] In this invention, the server includes means for receiving and analyzing linguistic or visual information input by a user, means for generating an initial three-dimensional environment design based on the analyzed information, and means for detecting and analyzing the user's emotional state and dynamically adjusting the visual elements within the virtual domain. This makes it possible to automatically and in real time adjust the design of a movie set according to the user's emotions, thereby efficiently reducing costs.
[0587] A "user" is an entity that uses the system to design and experience movie sets.
[0588] "Inputted linguistic or visual information" refers to text data and visual concept-based data of scenarios provided by the user.
[0589] "Means of analysis" refer to the processes and techniques used to understand input data and extract important information.
[0590] "Three-dimensional environmental design" refers to the design of three-dimensional sets and structures created for films and visual content.
[0591] A "virtual domain" refers to a real-time, accessible digital space created within a computer system.
[0592] "Visualization" refers to the process of visually representing digital data so that users can actually see and understand its contents.
[0593] "Emotional state" refers to the results of an analysis of the user's psychological or emotional responses.
[0594] "Dynamic adjustment mechanisms" refer to features that change the environment or elements in real time according to the user's emotions or other circumstances.
[0595] "Rating" refers to the evaluation or opinion provided by a user about their experience using the system.
[0596] "Feedback" refers to the ratings provided by users, which are used to improve and adjust the system.
[0597] This invention provides a system that offers an interactive, emotionally responsive virtual space based on scenarios and designs, for both filmmakers and audiences. The system consists of three components: a user, a terminal, and a server, and each component works in cooperation with the others.
[0598] Users use the terminal to input movie script information and visual concepts. This information includes the story flow and visual images. For example, users can use a keyboard to write in key scenes from the story or use a pen tablet to draw concept art.
[0599] The terminal receives information entered by the user and converts it into a standard data format (e.g., JSON or PNG). During this process, the terminal prepares the data for transmission to the server.
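The handling of text and image input can be sketched as below. Text is wrapped in JSON, while image bytes are recognized by their file signatures; the PNG and JPEG signatures are standard, but the function `prepare_input` and the `{"text": ...}` wrapper are assumptions for this example.

```python
import json

PNG_MAGIC = b"\x89PNG\r\n\x1a\n"  # 8-byte PNG file signature
JPEG_MAGIC = b"\xff\xd8\xff"      # JPEG start-of-image marker and next marker prefix


def prepare_input(data):
    """Return (format_name, bytes_to_send) for text or image input."""
    if isinstance(data, str):
        body = json.dumps({"text": data}).encode("utf-8")
        return "json", body
    if data.startswith(PNG_MAGIC):
        return "png", data
    if data.startswith(JPEG_MAGIC):
        return "jpeg", data
    raise ValueError("unsupported input format")


kind, body = prepare_input("A foggy forest at night.")
```

Checking the signature rather than a file extension lets the terminal reject input that is neither text nor a recognized image before anything is sent.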
[0600] The server receives data formatted by the terminal and analyzes the scenario information using a generative AI model. Specifically, it uses natural language processing techniques to extract important concepts and themes and generates an initial three-dimensional environment design based on them. Information processing is performed using libraries such as the Python library spaCy. This generated design is visualized in a virtual domain in real time and provided to the user.
[0601] Furthermore, the server utilizes an emotion engine to analyze the user's emotional state. This engine processes data acquired using the terminal's camera and microphone, and uses facial recognition software and voice analysis technology to read the user's emotions. For example, a library such as OpenCV can support the facial analysis, and the environment is dynamically adjusted based on the detected emotions.
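Because the emotion engine draws on both facial and voice data, the two estimates have to be combined in some way. The sketch below uses a simple weighted average of per-emotion scores. This fusion rule, the score dictionaries, and the function `fuse_emotions` are assumptions for illustration; the disclosure does not state how the two sources are combined.

```python
def fuse_emotions(face_scores, voice_scores, face_weight=0.5):
    """Weighted average of two score dictionaries; returns (label, score) of the maximum."""
    labels = set(face_scores) | set(voice_scores)
    fused = {
        label: face_weight * face_scores.get(label, 0.0)
        + (1.0 - face_weight) * voice_scores.get(label, 0.0)
        for label in labels
    }
    best = max(fused, key=fused.get)
    return best, fused[best]


label, score = fuse_emotions(
    {"fear": 0.5, "neutral": 0.5},
    {"fear": 1.0, "surprise": 0.0},
)
```

In this sample, the facial analysis is undecided between fear and neutral, and the voice analysis tips the fused result to fear with a score of 0.75.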
[0602] For example, if a user sets a scene from a horror movie, the server automatically generates a dark and eerie environment. While the user is experiencing VR, if the emotion engine detects fear or surprise, the server reflects that emotional information and changes the visual and auditory elements in the virtual space in real time. For example, it might emphasize the sound of drafts or deepen the shadows to increase the sense of unease.
[0603] In this way, this system makes it possible to create more realistic and emotionally resonant set designs while keeping costs down during the film production process. An example of a prompt to be input into the generative AI model would be, "Please set up a forest scene for a horror movie. Create an environment that is dimly lit, foggy, and has the sound of a wolf howling in the distance in the background."
[0604] The flow of the specific processing in Example 2 will be explained using Figure 13.
[0605] Step 1:
[0606] Users input movie script information and visual concepts into the terminal. Specifically, they can input text data using a keyboard or create visual data using a pen tablet. The input at this time is raw data related to the script and images.
[0607] Step 2:
[0608] The terminal converts the received input data into a standard data format. Text data is converted to JSON format, and visual data is converted to common image formats such as PNG or JPEG. The output is formatted data ready to be sent to the server.
[0609] Step 3:
[0610] The server receives the formatted data sent from the terminal. Using natural language processing techniques, for example the Python library spaCy or another natural language processing library, it extracts important concepts and themes from the text. At this point, the output consists of the extracted important keywords and themes.
[0611] Step 4:
[0612] The server uses a generative AI model to generate an initial three-dimensional environment design based on extracted keywords and themes. This process provides the AI model with pre-configured prompts and places the initial design in a virtual space. This design is visualized to the user via a VR headset or display. The output is a visually displayable virtual environment design.
[0613] Step 5:
[0614] The terminal collects user emotion data in real time. It records the user's facial expressions and voice using a camera and microphone. The emotion engine analyzes this input data and formats the detected emotions into data for transmission to the server. The output is summarized emotion information.
[0615] Step 6:
[0616] The server receives emotional information sent from the emotion engine and dynamically adjusts elements within the virtual space. For example, if the user expresses surprise, it might change the lighting or add specific sound effects. These adjustments enable the virtual space to provide a more responsive and interactive experience. The output is the adjusted virtual environment.
[0617] Step 7:
[0618] After experiencing the virtual environment, users provide feedback through their terminal. This feedback is organized and formatted on the terminal and sent to the server. The server analyzes this feedback and saves it as training data to be used for future design generation. The output is the feedback data for analysis.
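The "summarized emotion information" produced in Step 5 can be illustrated with a sliding-window majority vote over per-frame emotion labels, which avoids reacting to a single noisy frame. The class `EmotionSummarizer`, the window size, and the label names are assumptions for this sketch.

```python
from collections import Counter, deque


class EmotionSummarizer:
    """Keep the last `window` per-frame labels and report the most common one."""

    def __init__(self, window=5):
        self._recent = deque(maxlen=window)  # older labels are dropped automatically

    def add(self, label):
        self._recent.append(label)

    def summary(self):
        if not self._recent:
            return None
        return Counter(self._recent).most_common(1)[0][0]


summarizer = EmotionSummarizer(window=3)
for frame_label in ["neutral", "fear", "fear", "surprise"]:
    summarizer.add(frame_label)
dominant = summarizer.summary()
```

With a window of three, the oldest label "neutral" has been dropped and the remaining labels are "fear", "fear", and "surprise", so the summary is "fear".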
[0619] (Application Example 2)
[0620] Next, we will explain application example 2. In the following explanation, the data processing device 12 will be referred to as the "server," and the headset-type terminal 314 will be referred to as the "terminal."
[0621] Traditional virtual stores have struggled to dynamically adjust the environment based on user emotions, limiting their ability to enhance immersion and user experience. In particular, there is a growing need to provide an optimal visual and auditory environment that responds to user emotions.
[0622] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 2 is realized by the following means.
[0623] In this invention, the server includes means for receiving and analyzing information input from the user, means for generating a three-dimensional virtual environment based on the analyzed information, means for analyzing the user's facial expressions and voice data to detect emotions, and means for dynamically adjusting the visual and auditory elements of the virtual environment based on the detected emotion information. This makes it possible to provide an optimal store environment that is in line with the user's emotions.
[0624] A "user" is an individual or group that uses this system to input information or experience the virtual environment.
[0625] "Receiving" refers to the process of acquiring information and data provided by the user.
[0626] "Analysis" refers to the process of processing received information to understand its meaning and theme.
[0627] A "three-dimensional virtual environment" refers to a three-dimensional digital space created using computer technology.
[0628] "Visualization" is the process of visually displaying digital data, enabling users to observe or experience it.
[0629] "Facial expression data" refers to information that captures the user's facial movements and reactions.
[0630] "Voice data" refers to data that captures the user's voice or sounds and uses them as information.
[0631] "Detecting emotions" involves analyzing facial expressions and voice data to identify the user's emotional state.
[0632] "Visual elements" refer to elements such as appearance, color, and layout within a virtual environment.
[0633] "Auditory elements" refer to elements such as sounds, music, and sound effects within a virtual environment.
[0634] "Dynamic adjustment" refers to the process of instantly changing elements of a virtual environment in response to real-time changing conditions.
[0635] To realize this invention, the server receives input information from the user and uses natural language processing technology to analyze it. Based on the analyzed information, the server generates a three-dimensional virtual environment, visualizes it, and provides it to the user. The user's terminal also collects facial expression data using facial recognition technology and audio data using voice analysis technology. This data is processed on a cloud service to detect the user's emotions. The server has the function to dynamically adjust the visual and auditory elements within the virtual environment based on the detected emotion information. This allows the user to enjoy an optimal environment tailored to their individual emotions through their experience in the virtual store. For example, if the user indicates a relaxed emotion, the server will warm the lighting in the virtual store and play soothing music. The software supporting this process includes Python, Unity, the Google Cloud Vision API, and an NLP model.
[0636] As a concrete example, consider the experience of a user visiting a virtual store using smart glasses. When the user focuses on a new product displayed on a table and shows excitement, it is possible to interact with the user by displaying detailed information about that product and related promotions in their field of view, and providing audio guidance. An example of a prompt would be, "How should information be presented in the virtual environment when the user is in an excited state?"
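As a non-limiting illustration, the correspondence between a detected emotion and the adjustment of the virtual store may be sketched in Python as a simple rule table. The class name Adjustment, the table RULES, the function adjust_for, and the action identifiers are assumptions introduced for illustration. Only the "relaxed" and "excited" rows follow the examples given above; an actual system might instead obtain the adjustment from the generative AI model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Adjustment:
    """Changes to apply to the virtual store; defaults mean no change."""
    lighting: str = "unchanged"
    music: str = "unchanged"
    actions: tuple[str, ...] = ()


# Hypothetical rule table. Keys are lower-case emotion labels.
RULES = {
    # Relaxed user: warm the lighting and play soothing music.
    "relaxed": Adjustment(lighting="warm", music="soothing"),
    # Excited user: show product details and promotions, give audio guidance.
    "excited": Adjustment(
        actions=(
            "show_product_details",
            "show_related_promotions",
            "play_audio_guidance",
        )
    ),
}


def adjust_for(emotion: str) -> Adjustment:
    """Look up the adjustment for an emotion; unknown emotions change nothing."""
    return RULES.get(emotion.lower(), Adjustment())
```

A table of this kind keeps the mapping easy to inspect and extend, at the cost of handling only the emotions that have been listed in advance.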
[0637] The flow of a specific process in Application Example 2 will be explained using Figure 14.
[0638] Step 1:
[0639] Users input movie script information and visual concepts through their terminals. This input data is formatted and sent to the server. The input consists of text-based script information and image data, which are formatted on the terminal. The output is the formatted data sent to the server.
[0640] Step 2:
[0641] The server performs natural language processing using the received scenario information to extract important keywords and themes. This process uses an NLP model to analyze the text data and generate a keyword list based on the analysis. The input is the scenario information, and the output is the extracted keywords and themes. The analyzed information forms the basis for the initial design of the three-dimensional virtual environment.
[0642] Step 3:
[0643] The server generates a three-dimensional virtual environment based on the extracted information and renders its design within the virtual space. Visual elements are constructed using Unity and presented to the user. The input consists of extracted keywords and themes, and the output is a three-dimensional model within the virtual space.
[0644] Step 4:
[0645] While a user experiences a virtual environment using a smart device, the device acquires the user's facial expression and voice data in real time via its camera and microphone. The input is the collected facial expression and voice data, and the output is sensor data for analysis.
[0646] Step 5:
[0647] The server analyzes the acquired facial and voice data to detect the user's emotional state. It uses the Google Cloud Vision API to analyze changes in facial expressions, and voice analysis software to analyze voice tone. The input is data from sensors, and the output is the result of the emotion analysis.
[0648] Step 6:
[0649] The server dynamically adjusts the visual and auditory elements of the virtual environment based on detected emotion information. Lighting, music, and sound effects are modified through Unity to create an environment that matches the user's emotions. The input is the emotion analysis result, and the output is the adjusted virtual space.
[0650] Step 7:
[0651] Users experience the adjusted virtual environment and send their feedback to the server via their device. This feedback data is stored on the server and used as training data for the generative AI model. The input is user feedback, and the output is data for model improvement.
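The disclosure does not specify how the facial expression analysis and the voice analysis are combined into a single emotional state. As one possible sketch, and purely as an assumption, the two sets of per-emotion scores could be merged by a weighted average. The function name fuse_emotions, the parameter face_weight, and the score dictionaries are hypothetical.

```python
def fuse_emotions(
    face: dict[str, float],
    voice: dict[str, float],
    face_weight: float = 0.5,
) -> tuple[str, float]:
    """Combine per-emotion scores from face and voice by weighted average.

    Assumed method, not taken from the disclosure. A label missing from
    one source is scored 0.0 for that source. Returns the top label and
    its combined score; ties are broken alphabetically.
    """
    if not 0.0 <= face_weight <= 1.0:
        raise ValueError("face_weight must be between 0 and 1")
    labels = set(face) | set(voice)
    if not labels:
        raise ValueError("no emotion scores supplied")
    combined = {
        label: face_weight * face.get(label, 0.0)
        + (1.0 - face_weight) * voice.get(label, 0.0)
        for label in labels
    }
    best = max(sorted(combined), key=combined.get)
    return best, combined[best]
```

With equal weights, face scores of {"surprise": 0.8, "neutral": 0.2} and voice scores of {"surprise": 0.4, "neutral": 0.6} yield "surprise" as the detected emotion.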
[0652] The specific processing unit 290 transmits the result of the specific processing to the headset terminal 314. In the headset terminal 314, the control unit 46A causes the speaker 240 and display 343 to output the result of the specific processing. The microphone 238 acquires audio indicating user input for the result of the specific processing. The control unit 46A transmits the audio data indicating user input acquired by the microphone 238 to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the audio data.
[0653] Data generation model 58 is a type of so-called generative AI (Artificial Intelligence). Examples of data generation model 58 include generative AI such as ChatGPT (Internet search <URL: https://openai.com/blog/chatgpt>) and Gemini (Internet search <URL: https://gemini.google.com/?hl=ja>). The data generation model 58 is obtained by performing deep learning on a neural network. The data generation model 58 is input with prompts containing instructions, and with inference data such as audio data representing speech, text data representing text, and image data representing images. The data generation model 58 infers from the input inference data according to the instructions indicated by the prompts, and outputs the inference results in data formats such as audio data and text data. Here, inference refers to, for example, analysis, classification, prediction, and / or summarization.
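As a non-limiting illustration of the relationship between a prompt and inference data, the following Python sketch pairs an instruction with one or more typed inputs and passes them to a model. All names here (InferenceData, InferenceRequest, run_inference, echo_model) are assumptions introduced for illustration. The function echo_model is a stand-in that merely reports what it received; it is not a generative AI and does not represent the interface of any particular service.

```python
from dataclasses import dataclass
from typing import Callable, Literal, Union

Modality = Literal["audio", "text", "image"]


@dataclass(frozen=True)
class InferenceData:
    """One piece of inference data: speech, text, or an image."""
    modality: Modality
    content: Union[bytes, str]


@dataclass(frozen=True)
class InferenceRequest:
    """A prompt containing the instruction, plus the data to infer from."""
    prompt: str
    data: tuple[InferenceData, ...]


def run_inference(
    model: Callable[[InferenceRequest], str],
    request: InferenceRequest,
) -> str:
    """Pass the request to the model after checking that an instruction exists."""
    if not request.prompt.strip():
        raise ValueError("prompt must contain an instruction")
    return model(request)


def echo_model(request: InferenceRequest) -> str:
    """Stand-in model: reports the instruction and the input modalities."""
    kinds = ", ".join(item.modality for item in request.data)
    return f"instruction={request.prompt!r}; inputs=[{kinds}]"
```

Because run_inference accepts any callable, the stand-in could be replaced by a wrapper around an actual generative AI without changing the calling code.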
[0654] In the above embodiment, an example was given in which specific processing is performed by the data processing device 12, but the technology of this disclosure is not limited thereto, and specific processing may also be performed by the headset terminal 314.
[0655] [Fourth Embodiment]
[0656] Figure 7 shows an example of the configuration of the data processing system 410 according to the fourth embodiment.
[0657] As shown in Figure 7, the data processing system 410 includes a data processing device 12 and a robot 414. An example of the data processing device 12 is a server.
[0658] The data processing device 12 comprises a computer 22, a database 24, and a communication interface 26. The computer 22 is an example of a "computer" related to the technology of this disclosure. The computer 22 comprises a processor 28, RAM 30, and storage 32. The processor 28, RAM 30, and storage 32 are connected to a bus 34. The database 24 and the communication interface 26 are also connected to the bus 34. The communication interface 26 is connected to a network 54. An example of the network 54 is a WAN (Wide Area Network) and / or a LAN (Local Area Network).
[0659] The robot 414 includes a computer 36, a microphone 238, a speaker 240, a camera 42, a communication interface 44, and a controlled object 443. The computer 36 includes a processor 46, RAM 48, and storage 50. The processor 46, RAM 48, and storage 50 are connected to a bus 52. The microphone 238, speaker 240, camera 42, and controlled object 443 are also connected to the bus 52.
[0660] The microphone 238 receives instructions from the user 20 by receiving the voice uttered by the user 20. The microphone 238 captures this voice, converts it into audio data, and outputs the audio data to the processor 46. The speaker 240 outputs audio according to the instructions from the processor 46.
[0661] Camera 42 is a small digital camera equipped with an optical system including a lens, aperture, and shutter, and an image sensor such as a CMOS (Complementary Metal-Oxide-Semiconductor) image sensor or a CCD (Charge Coupled Device) image sensor, and captures images of the area around the user 20 (for example, an imaging range defined by a field of view equivalent to the width of a typical healthy person's field of vision).
[0662] Communication interface 44 is connected to network 54. Communication interfaces 44 and 26 are responsible for the exchange of various information between processor 46 and processor 28 via network 54. The exchange of various information between processor 46 and processor 28 using communication interfaces 44 and 26 is performed in a secure manner.
[0663] The controlled object 443 includes a display device, LEDs in the eyes, and motors that drive the arms, hands, and feet. The posture and gestures of the robot 414 are controlled by controlling the motors of the arms, hands, and feet. Some of the robot 414's emotions can be expressed by controlling these motors. Furthermore, the robot 414's facial expressions can also be expressed by controlling the illumination state of the LEDs in its eyes.
[0664] Figure 8 shows an example of the main functions of the data processing device 12 and the robot 414. As shown in Figure 8, the data processing device 12 performs specific processing using the processor 28. The storage 32 stores the specific processing program 56.
[0665] The specific processing program 56 is an example of a "program" relating to the technology of this disclosure. The processor 28 reads the specific processing program 56 from the storage 32 and executes the read specific processing program 56 on the RAM 30. The specific processing is realized by the processor 28 operating as a specific processing unit 290 in accordance with the specific processing program 56 executed on the RAM 30.
[0666] The storage 32 stores the data generation model 58 and the emotion identification model 59. The data generation model 58 and the emotion identification model 59 are used by the specific processing unit 290.
[0667] In robot 414, the processor 46 performs the reception output processing. The storage 50 stores the reception output program 60. The processor 46 reads the reception output program 60 from the storage 50 and executes the read reception output program 60 on the RAM 48. The reception output processing is realized by the processor 46 operating as a control unit 46A according to the reception output program 60 executed on the RAM 48.
[0668] Next, the specific processing performed by the specific processing unit 290 of the data processing device 12 will be described. In the following description, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".
[0669] This invention is a system that supports the efficiency of set design in the film production process, and operates in cooperation with three parties: the user, the terminal, and the server. A specific embodiment of this system is shown below.
[0670] Users input information into their devices that will serve as the film's script and visual concept. This information describes the content, atmosphere, and physical characteristics of the film scenes, and represents the basic ideas and policies for the film's production.
[0671] The terminal processes the input scenario information as digital data and sends it to the server. This includes a function to standardize the scenario information and prepare it in an analyzable format.
[0672] The server uses advanced natural language processing techniques to analyze the scenario information received from the terminal. This analysis process extracts important keywords and themes from the scenario, clarifying the elements necessary for designing the film set.
[0673] The server then uses AI agents to automatically generate three-dimensional movie set designs based on the extracted keywords and themes. These designs are rendered in a virtual space and constructed using virtual reality (VR) technology.
[0674] The generated set is streamed to the user via a device, allowing the user to visually inspect the set in real time using VR equipment. The user can evaluate the set design and provide feedback through the device regarding desired changes and improvements.
[0675] This feedback is sent to the server, which uses it to train the AI. The AI agent incorporates user feedback to improve the accuracy and applicability of subsequent design generation. The server also manages multiple scenes simultaneously and optimizes the set design as needed.
[0676] For example, if a user sets a scene of a medieval village, the server automatically extracts relevant themes such as wooden buildings, cobblestone streets, and crops, and generates a realistic and creative village set based on these. The user can then review the scene and make adjustments, such as the placement of buildings or decorations, and this feedback contributes to the evolution of the AI. In film projects involving multiple scenes at once, users can properly manage these set designs and achieve overall time and cost reductions.
[0677] The following describes the processing flow.
[0678] Step 1:
[0679] Users use a terminal to input movie scripts and visual concepts. The input information includes details of the story, scene settings, and the overall visual atmosphere.
[0680] Step 2:
[0681] The terminal formats the scenario information from the user as digital data and prepares it for analysis. This data is then ready to be sent to the server.
[0682] Step 3:
[0683] The server receives scenario data from the terminals for analysis and uses natural language processing techniques to extract important keywords and themes from the information. This clarifies the data necessary for designing the film set.
[0684] Step 4:
[0685] The server uses an AI agent to automatically generate 3D movie set designs from the extracted keywords and themes. The AI agent draws on existing design data and learning results to create creative set designs.
[0686] Step 5:
[0687] The generated 3D set design is rendered in a virtual space. The server prepares this design as VR content and gets ready to send it to the device.
[0688] Step 6:
[0689] The terminal provides VR content received from the server to the user. The user uses VR equipment to view and interact with a three-dimensional visual set in real time.
[0690] Step 7:
[0691] Users provide feedback based on the visualized set. For example, they can send fine-tuning suggestions, such as the location of buildings or the color scheme of decorations, to the server via their device.
[0692] Step 8:
[0693] The server receives feedback from users and provides that data to the AI agent to advance the learning process. This improves the accuracy of set design in subsequent generation.
[0694] Step 9:
[0695] The server manages multiple scenes simultaneously as needed, optimizing and coordinating sets to improve overall project consistency and efficiency.
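As a non-limiting illustration of how the server might keep track of several scenes at once and record feedback against each of them, the following Python sketch stores one design record per scene. The names SceneDesign, SceneManager, add_scene, apply_feedback, and shared_themes are assumptions introduced for illustration. Checking which themes are common to all scenes is only one simple notion of consistency chosen for this sketch.

```python
from dataclasses import dataclass, field


@dataclass
class SceneDesign:
    """State of one scene's set design (hypothetical structure)."""
    scene_id: str
    themes: list[str]
    revision: int = 0
    notes: list[str] = field(default_factory=list)


class SceneManager:
    """Holds the designs of several scenes at the same time."""

    def __init__(self) -> None:
        self._scenes: dict[str, SceneDesign] = {}

    def add_scene(self, scene_id: str, themes: list[str]) -> SceneDesign:
        if scene_id in self._scenes:
            raise ValueError(f"scene already exists: {scene_id}")
        design = SceneDesign(scene_id, list(themes))
        self._scenes[scene_id] = design
        return design

    def apply_feedback(self, scene_id: str, note: str) -> SceneDesign:
        """Record a feedback note and bump the scene's revision counter."""
        design = self._scenes[scene_id]
        design.notes.append(note)
        design.revision += 1
        return design

    def shared_themes(self) -> set[str]:
        """Themes present in every scene, as a simple consistency check."""
        theme_sets = [set(d.themes) for d in self._scenes.values()]
        return set.intersection(*theme_sets) if theme_sets else set()
```

For two scenes with themes ["wooden buildings", "cobblestone streets"] and ["wooden buildings", "crops"], shared_themes() returns the single common theme "wooden buildings".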
[0696] (Example 1)
[0697] Next, we will describe Example 1. In the following description, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".
[0698] In film production, set design is a time-consuming and resource-intensive process. It's difficult to efficiently manage multiple scenes while simultaneously creating realistic and creative designs in a short timeframe. Furthermore, the process of effectively incorporating user feedback to improve the accuracy of future designs is complex. Therefore, there is a need to easily generate set designs from narrative information, visualize them in a virtual space, and provide users with an intuitive and interactive experience.
[0699] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 1 is realized by the following means.
[0700] In this invention, the server includes means for receiving and analyzing narrative information input from a user, means for generating a three-dimensional visual model based on the analyzed information, and means for coordinating with one or more virtual reality devices to deliver the generated visual model in real time. This streamlines the creation and management of set designs and enables improved accuracy in design generation based on real-time visualization and feedback.
[0701] "Users" refers to individuals or groups who use the system to input narrative information, review the generated visual models, and provide feedback.
[0702] "Narrative information" refers to information used to describe the scenarios and concepts related to scenes in movies and visual content.
[0703] "Analysis" refers to the process of identifying important elements from the input narrative information and extracting specific design elements.
[0704] A "three-dimensional visual model" refers to a three-dimensional digital design generated based on narrative information, and a structure that can be displayed in a virtual space.
[0705] A "virtual space" refers to an artificial space created using digital technology that users can experience visually.
[0706] "Virtual reality devices" refer to devices (e.g., headsets and displays) that users use to visually and experientially perceive a virtual space.
[0707] "Real-time delivery" refers to providing the generated visual model to the user immediately and without delay.
[0708] "Feedback" refers to opinions and suggestions for improvement provided by users after experiencing a visual model, and includes information used to improve the generation process.
[0709] One embodiment of this invention is a system that supports the efficiency of set design in the film production process. In this system, the user, terminal, and server work in cooperation with each other.
[0710] Users input story information, such as scenarios and visual concepts, into their devices. This information concretely represents the structure and atmosphere of the film scenes and is a crucial element in realizing the basic ideas of the story.
[0711] The terminal processes the narrative information entered by the user as digital data. This process includes the function of converting the information into a parseable format. Specifically, the information is converted into a standard format such as JSON and sent to the server.
[0712] The server receives data sent from the terminal and analyzes it using generative AI models and natural language processing software. Specifically, NLP libraries and machine learning frameworks are utilized. Through this analysis, important keywords and themes are extracted, clarifying the elements necessary for film set design.
[0713] The server generates a three-dimensional visual model based on the extracted information. A game engine (e.g., Unity or Unreal Engine) is used to render the design in the virtual space and generate VR content. At this stage, the server works in conjunction with virtual reality equipment to deliver the generated model to the user in real time via the device.
[0714] For example, if a user sets a scene for a "medieval village," the server extracts themes such as wooden architecture and cobblestone streets, and generates a realistic and creative village set based on these. The user then uses VR equipment to view the set and provides feedback through prompts such as, "Generate a set design for a fantastical movie scene set in a medieval European village."
[0715] This feedback is sent to the server and used as training data for the generative AI model. This improves the accuracy and effectiveness of subsequent design generation and optimizes the set design.
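As a non-limiting illustration of how extracted themes might be turned into elements of a three-dimensional model, the following Python sketch maps the themes of the medieval village example to placeholder assets and emits a plain scene description. The catalogue ASSET_CATALOGUE, the asset identifiers, the function build_scene_description, and the output layout are all assumptions introduced for illustration; the output is a generic dictionary, not a scene file format of Unity or Unreal Engine. Objects are simply placed in a row along the x axis as a placeholder layout.

```python
import json

# Hypothetical catalogue mapping a theme to asset identifiers.
ASSET_CATALOGUE = {
    "wooden architecture": ["timber_house", "wooden_well"],
    "cobblestone streets": ["cobblestone_road"],
}

SPACING = 5.0  # assumed distance between placed objects, in scene units


def build_scene_description(scene_name: str, themes: list[str]) -> dict:
    """Convert themes into a list of placed objects.

    Themes without a catalogue entry are skipped.
    """
    objects = []
    for theme in themes:
        for asset in ASSET_CATALOGUE.get(theme, []):
            objects.append(
                {
                    "asset": asset,
                    "theme": theme,
                    # Placeholder layout: next free slot along the x axis.
                    "position": [len(objects) * SPACING, 0.0, 0.0],
                }
            )
    return {"scene": scene_name, "objects": objects}


if __name__ == "__main__":
    description = build_scene_description(
        "medieval village", ["wooden architecture", "cobblestone streets"]
    )
    print(json.dumps(description, indent=2))
```

In an actual system the catalogue lookup would presumably be replaced by the AI agent, and the placement by a layout step that takes the scenario into account.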
[0716] The flow of the specific processing in Example 1 will be explained using Figure 11.
[0717] Step 1:
[0718] The user inputs the film's script and visual concepts into the terminal. This input includes film scene settings, characters, and key narrative themes. During this process, the input is saved as text data in a system-recognizable format. Specifically, the user can record information using a keyboard or voice input function.
[0719] Step 2:
[0720] The terminal receives input data from the user and formats it as digital data. In this step, a text analysis tool is used to segment the data and convert it to a standard format (e.g., JSON). Data processing involves breaking down the scenario into its constituent elements and shaping it into a form that is easy to analyze. The output is a parseable data structure that is ready to be sent to the server. Specific actions include launching text analysis software and formatting the data.
[0721] Step 3:
[0722] The server receives data sent from the terminal and performs analysis using advanced natural language processing capabilities. Here, a generative AI model and natural language processing libraries are used to extract important keywords and themes from the data. The input is formatted scenario data, and the output is the extracted keywords. Specific operations include model initialization and the application of the analysis algorithm.
[0723] Step 4:
[0724] The server generates a three-dimensional visual model using an AI agent based on extracted keywords. Here, a game engine (e.g., Unity or Unreal Engine) is used to render the design in a virtual space and generate VR content. As part of the data processing, keywords are converted into relevant visual elements and incorporated into the three-dimensional model. The output is the completed visual model. Specific operations include launching 3D modeling tools and rendering the model.
[0725] Step 5:
[0726] The terminal receives the visual model generated by the server and streams it to the user. This step involves real-time data delivery via an interface with the VR device. The input is the visual model sent from the server, and the output is the VR content displayed in real time. Specific operations include implementing streaming technology and synchronizing with the VR device.
[0727] Step 6:
[0728] Users view visual models via VR equipment and provide feedback. This feedback includes design evaluations, areas for improvement, and further requests. Input involves receiving feedback from users verbally or in writing. Output involves saving the feedback information to the device and preparing it for transmission to the server. Specific operations include using a feedback input interface.
[0729] Step 7:
[0730] The server receives user feedback and uses it as training data for the generative AI model. Using machine learning algorithms, it analyzes the feedback information and implements a process to improve the accuracy of the AI model. Input includes feedback data, and output includes an improved design generation algorithm. Specific actions include retraining the AI model and tuning its parameters.
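As a non-limiting illustration of the step in which the terminal segments the scenario and converts it to a standard format such as JSON, the following Python sketch splits a scenario into scenes and serializes the result. The convention that scenes are separated by blank lines, with the first line of each block acting as a heading, is an assumption made for this sketch, as are the function names segment_scenario and to_json and the field names.

```python
import json
import re


def segment_scenario(text: str) -> list[dict]:
    """Split a scenario into scenes at blank lines (assumed convention).

    The first line of each block is treated as the scene heading and the
    remaining lines are joined into a single description string.
    """
    blocks = [b.strip() for b in re.split(r"\n\s*\n", text) if b.strip()]
    scenes = []
    for index, block in enumerate(blocks, start=1):
        heading, _, body = block.partition("\n")
        scenes.append(
            {
                "scene": index,
                "heading": heading.strip(),
                "description": " ".join(body.split()),
            }
        )
    return scenes


def to_json(text: str) -> str:
    """Serialize the segmented scenario for transmission to the server."""
    return json.dumps({"scenes": segment_scenario(text)}, ensure_ascii=False, indent=2)
```

Each scene becomes one JSON object, so the server can analyze scenes individually or together.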
[0731] (Application Example 1)
[0732] Next, we will explain Application Example 1. In the following explanation, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".
[0733] In modern commercial and retail design, real-time spatial adjustment and display optimization are required, but traditional methods are time-consuming, costly, and difficult to implement efficiently. In particular, intuitive design creation and evaluation in virtual environments, along with rapid feedback loops, are essential.
[0734] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 1 is realized by the following means.
[0735] In this invention, the server includes means for receiving and analyzing spatial design information input by a user, means for generating a three-dimensional virtual environment design based on the analyzed information, means for visualizing the generated virtual environment design in a virtual domain and providing it to the user, means for receiving feedback from the user and using it to improve the accuracy of the generation means, and means for extracting important keywords and themes from the spatial design information using natural language processing technology. This enables the efficient design and evaluation of virtual spaces in real time.
[0736] "Spatial design information" refers to information that indicates the design concept, components, and themes of a virtual environment.
[0737] "Means of analysis" refers to the technologies and methods used to process and understand input information.
[0738] "Three-dimensional virtual environment design" refers to the representation of a virtual design in a three-dimensional space, generated on a computer.
[0739] "Means of visualization in a virtual domain" refers to methods for visually displaying generated designs and models using virtual reality technology.
[0740] "Feedback" refers to information collected from users, including opinions and evaluations, that is used to improve the system.
[0741] "Natural language processing technology" is a technology that enables computers to understand and process human language, and is used for various types of analysis and extraction.
[0742] "Important keywords and themes" refer to concepts and ideas that deserve particular attention in spatial design information.
[0743] The system for implementing this invention allows a user to input spatial design information and generate and visualize a virtual environment based on that information. The user inputs information about the spatial design using a smart device. The terminal receives this information, converts it into a standardized data format such as JSON, and sends it to the server.
[0744] The server uses natural language processing techniques to extract important keywords and themes from the received data. Tools such as the spaCy library and the BERT language model are expected to be used for this purpose. Subsequently, a generative AI model is used to automatically generate a three-dimensional virtual environment design. This generated design is then rendered into the virtual domain using engines such as Unity or Unreal Engine.
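As a non-limiting illustration of keyword extraction from spatial design information, the following Python sketch counts word frequencies after removing common function words. This is a deliberately simple stand-in that uses only the standard library; it is not spaCy or BERT and does not reproduce their behavior. The stop word list STOPWORDS, the function name extract_keywords, and the parameter top_n are assumptions introduced for illustration.

```python
import re
from collections import Counter

# Small, assumed stop word list; a real system would use a fuller one.
STOPWORDS = frozenset(
    {"a", "an", "and", "the", "of", "in", "on", "with", "to", "for",
     "is", "are", "that", "by", "at"}
)


def extract_keywords(text: str, top_n: int = 5) -> list[str]:
    """Return the most frequent non-stop-words, most frequent first.

    Words with equal counts are ordered alphabetically so that the
    result is deterministic.
    """
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(w for w in words if w not in STOPWORDS and len(w) > 2)
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [word for word, _ in ranked[:top_n]]
```

The returned list could then be supplied as input to the generative AI model that produces the virtual environment design.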
[0745] Users can visually confirm the generated virtual environment through a VR-compatible headset. User feedback is sent back to the server via the device and used to improve the accuracy of AI generation. This feedback loop ultimately leads to more efficient and creative virtual space design.
[0746] For example, when a retail store is testing a new store exterior design, they can use a prompt such as, "Generate a proposal for placing the new product display in a location with natural light. Also, create a set design that reflects a simple, Scandinavian-style store." This prompt will quickly generate a virtual environment that meets their requirements. Such a system makes it possible to efficiently consider and quickly implement effective designs.
[0747] The flow of a specific process in Application Example 1 will be explained using Figure 12.
[0748] Step 1:
[0749] Users input spatial design information via smart devices. The input information is converted into JSON or XML format and stored on the device as standardized data.
[0750] Step 2:
[0751] The terminal sends standardized data received from the user to the server. Network communication protocols are used in this process to ensure the data reaches the server securely.
[0752] Step 3:
[0753] The server analyzes the received data and uses natural language processing techniques to extract important keywords and themes from the information. In this step, tools such as spaCy and BERT are used for analysis, and the extracted keywords are prepared as input for the generative AI model.
[0754] Step 4:
[0755] The server automatically generates a three-dimensional virtual environment design using an AI model based on the extracted keywords. The generated design data is converted into a format that can be read as a scene file by Unity or Unreal Engine.
[0756] Step 5:
[0757] The virtual environment design generated by the server is sent to the terminal, and the user wears a VR-compatible headset to visualize and review the design within the virtual environment. This allows the user to evaluate the design in real time.
[0758] Step 6:
[0759] Users input feedback into their devices based on the design they have reviewed. This feedback may include specific suggestions, such as which parts of the design they would like to see improved.
[0760] Step 7:
[0761] The device sends user feedback to the server, which uses this feedback to update the AI model's learning and improve the accuracy of future design generation. This completes the feedback loop and drives improvements to the entire system.
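As a non-limiting illustration of how user feedback might be stored for later use in updating the AI model, the following Python sketch converts feedback records into one JSON object per line. The record fields, the 1 to 5 rating scale, and the names FeedbackRecord and to_training_lines are assumptions introduced for illustration. How the stored records are actually used for training is not specified here.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class FeedbackRecord:
    """One piece of user feedback on a generated design (hypothetical fields)."""
    design_id: str
    prompt: str
    comment: str
    rating: int  # assumed scale: 1 (poor) to 5 (excellent)


def to_training_lines(records: list[FeedbackRecord]) -> list[str]:
    """Serialize feedback records as JSON strings, one per record.

    Records with a rating outside the assumed 1 to 5 scale are rejected.
    """
    lines = []
    for record in records:
        if not 1 <= record.rating <= 5:
            raise ValueError(f"rating out of range: {record.rating}")
        lines.append(json.dumps(asdict(record), ensure_ascii=False, sort_keys=True))
    return lines
```

Writing the returned lines to a file, one per line, gives a simple append-only log that can grow as the feedback loop repeats.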
[0762] Furthermore, an emotion engine that estimates the user's emotions may be incorporated. That is, the specific processing unit 290 may use the emotion identification model 59 to estimate the user's emotions and perform specific processing using the user's emotions.
[0763] This invention is a system aimed at providing efficient set design in film production and an interactive environment that takes user emotions into consideration. This system has a configuration in which the user, terminal, and server cooperate and are combined with an emotion engine.
[0764] First, the user inputs movie script information and visual concepts from their device. This includes story scenes and visual images. The input data is formatted by the device and sent to the server.
[0765] The server receives the input scenario information and performs analysis using natural language processing techniques. It extracts important keywords and themes to form an initial design for the movie set. The design generated at this stage is rendered in a virtual space and provided to the user.
[0766] Next, the emotion engine is activated, analyzing the user's facial expressions and voice data to read their emotions. This emotional information is collected in real time along with other data while the user is experiencing the VR content.
[0767] The server reflects the results from the emotion engine and dynamically adjusts the set design and visual elements of the virtual space. This step is performed in real time to reflect changes in the user's emotions. For example, if the user expresses surprise, the lighting in that scene and the placement of specific objects will change.
[0768] Through the terminal, users evaluate the visualized virtual space and provide feedback. This feedback is used to improve the set design and is supplied as training data to the server's AI agent. This feedback mechanism ensures higher accuracy in subsequent generation.
[0769] For example, if a user sets a scene from a horror movie, the server automatically generates dark lighting and an eerie environment. As the user experiences the scene, the emotion engine detects fear or surprise, and parts of the virtual space are modified, with sounds and movements added to increase the sense of unease. In this way, fine-tuning that takes the user's emotions into account makes the set more realistic and dynamic.
[0770] In this way, the present invention enables filmmakers to construct realistic set designs cost-effectively, while also allowing for creative expression that reflects the emotions of the user.
[0771] The following describes the processing flow.
[0772] Step 1:
[0773] Users use their devices to input movie scripts and visual concepts. This includes describing detailed storylines and the atmosphere of each scene.
[0774] Step 2:
[0775] The terminal formats the scenario information entered by the user and prepares the data for transmission to the server.
[0776] Step 3:
[0777] The server receives scenario information sent from the terminal and analyzes the information using natural language processing. This analysis extracts important keywords and themes for each scene.
[0778] Step 4:
[0779] The server passes the extracted data to an AI agent, which automatically generates a 3D movie set design. The generated design is then converted into a virtual space.
[0780] Step 5:
[0781] The server renders the generated movie set design and sends it to the device as VR content.
[0782] Step 6:
[0783] The terminal provides the received VR content to the user, and the user experiences the set using VR equipment.
[0784] Step 7:
[0785] During the user experience, the emotion engine analyzes the user's facial expressions and voice, collecting emotion data in real time.
[0786] Step 8:
[0787] The server receives the results of the emotion engine's analysis and dynamically adjusts the set design and visual elements of the virtual space according to the user's emotions. For example, if the emotion of surprise is detected, the surrounding lighting is changed.
[0788] Step 9:
[0789] Users provide feedback through their devices based on their experience, sending any points they want corrected or elements they want to add to the server.
[0790] Step 10:
[0791] The server receives feedback and uses it as data for the AI agent to learn from. The feedback is then incorporated into the next design generation.
[0792] Step 11:
[0793] The server manages multiple scenes simultaneously as needed and executes optimization processes, making adjustments according to the user's project requirements while maintaining overall consistency.
[0794] (Example 2)
[0795] Next, we will describe Example 2. In the following description, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".
[0796] In film production, there is a need for real-time adjustments to set design based on user emotions, but this is difficult to achieve efficiently with current technology. Furthermore, there is a lack of processes to incorporate user feedback into subsequent designs. As a result, production times are prolonged and costs are increased.
[0797] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 2 is realized by the following means.
[0798] In this invention, the server includes means for receiving and analyzing linguistic or visual information input by a user, means for generating an initial three-dimensional environment design based on the analyzed information, and means for detecting and analyzing the user's emotional state and dynamically adjusting the visual elements within the virtual domain. This makes it possible to automatically and in real time adjust the design of a movie set according to the user's emotions, thereby efficiently reducing costs.
[0799] A "user" is an entity that uses the system to design and experience movie sets.
[0800] "Inputted linguistic or visual information" refers to text data and visual concept-based data of scenarios provided by the user.
[0801] "Means of analysis" refer to the processes and techniques used to understand input data and extract important information.
[0802] "Three-dimensional environmental design" refers to the design of three-dimensional sets and structures created for films and visual content.
[0803] A "virtual domain" refers to a real-time, accessible digital space created within a computer system.
[0804] "Visualization" refers to the process of visually representing digital data so that users can actually see and understand its contents.
[0805] "Emotional state" refers to the results of an analysis of the user's psychological or emotional responses.
[0806] "Dynamic adjustment mechanisms" refer to features that change the environment or elements in real time according to the user's emotions or other circumstances.
[0807] "Rating" refers to the evaluation or opinion provided by a user about their experience using the system.
[0808] "Feedback" refers to the ratings provided by users, which are used to improve and adjust the system.
[0809] This invention provides a system that offers an interactive, emotionally responsive virtual space based on scenarios and designs, for both filmmakers and audiences. The system consists of three components: a user, a terminal, and a server, and each component works in cooperation with the others.
[0810] Users use their devices to input movie script information and visual concepts. This information includes the story flow and visual images. For example, users can use a keyboard to write in key scenes from the story or use a pen tablet to draw concept art.
[0811] The terminal receives information entered by the user and converts it into a standard data format (e.g., JSON or PNG). During this process, the terminal utilizes its capabilities as a modern interactive device to prepare the data for transmission to the server.
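As an illustrative, non-limiting sketch of the formatting performed by the terminal, the script text and a concept-art image can be packaged into a single JSON message. The field names (`script`, `concept_art`), the base64 encoding of the image, and the function names are assumptions made for this example; the disclosure specifies only that text is converted to JSON and images to formats such as PNG.

```python
import base64
import json


def build_payload(script_text: str, image_bytes: bytes, image_format: str = "png") -> str:
    """Package script text and concept art into one JSON document (hypothetical schema)."""
    if image_format not in {"png", "jpeg"}:
        raise ValueError(f"unsupported image format: {image_format}")
    payload = {
        "script": script_text,
        "concept_art": {
            "format": image_format,
            # Binary image data is base64-encoded so it can travel inside JSON.
            "data": base64.b64encode(image_bytes).decode("ascii"),
        },
    }
    return json.dumps(payload, ensure_ascii=False)


def parse_payload(message: str) -> tuple[str, bytes]:
    """Server-side inverse of build_payload: recover the script and the image bytes."""
    payload = json.loads(message)
    image = base64.b64decode(payload["concept_art"]["data"])
    return payload["script"], image
```

A round trip through `build_payload` and `parse_payload` returns the original script text and image bytes unchanged, which is the property the server relies on when it receives the formatted data.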
[0812] The server receives the data formatted by the terminal and analyzes the scenario information using a generative AI model. Specifically, it uses natural language processing techniques to extract important concepts and themes and generates an initial three-dimensional environment design based on them. This language processing may be performed using a library such as spaCy for Python. The generated design is visualized in a virtual domain in real time and provided to the user.
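The keyword extraction described above can be sketched, under simplifying assumptions, with a frequency count over the scene text. This example deliberately uses only the Python standard library as a stand-in for an NLP library such as spaCy; the stop-word list, the length filter, and the function name `extract_keywords` are hypothetical and chosen only for illustration.

```python
import re
from collections import Counter

# Small illustrative stop-word list; a practical system would use a fuller one.
STOP_WORDS = {
    "a", "an", "the", "and", "or", "of", "in", "on", "at", "to",
    "is", "are", "was", "were", "it", "its", "with", "through", "as",
}


def extract_keywords(scene_text: str, top_n: int = 5) -> list[str]:
    """Return the most frequent non-stop-words in the scene description."""
    words = re.findall(r"[a-z]+", scene_text.lower())
    counts = Counter(w for w in words if w not in STOP_WORDS and len(w) > 2)
    # most_common orders by count; words with equal counts keep first-seen order.
    return [word for word, _ in counts.most_common(top_n)]
```

For a scene description that mentions "forest" twice and "dark" and "fog" once each, the function ranks "forest" first. A real deployment would more likely rely on part-of-speech tags or noun phrases from the NLP library than on raw frequency.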
[0813] Furthermore, the server utilizes an emotion engine to analyze the user's emotional state. This engine processes data acquired using the device's camera and microphone, and uses facial recognition software and voice analysis technology to read the user's emotions. For example, OpenCV can be used for facial analysis, and the environment is dynamically adjusted based on the detected emotions.
[0814] For example, if a user sets a scene from a horror movie, the server automatically generates a dark and eerie environment. While the user is experiencing VR, if the emotion engine detects fear or surprise, the server reflects that emotional information and changes the visual elements in the virtual space in real time. For example, it might emphasize the sound of drafts or deepen the shadows to increase the sense of unease.
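The emotion-driven adjustment in this horror-scene example can be sketched as a small rule table that maps a detected emotion to a change in lighting and an added sound effect. The `SceneState` fields, the rule values, and the effect names are all hypothetical; the disclosure does not specify how the adjustment is parameterized.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class SceneState:
    light_level: float  # 0.0 = dark, 1.0 = fully lit
    sound_effects: tuple[str, ...] = ()


# Hypothetical rule table: emotion -> (change in light level, sound effect to add).
ADJUSTMENTS = {
    "fear": (-0.1, "distant_draft"),
    "surprise": (-0.2, "sudden_creak"),
}


def adjust_scene(state: SceneState, emotion: str) -> SceneState:
    """Return a new scene state adjusted for the detected emotion."""
    rule = ADJUSTMENTS.get(emotion)
    if rule is None:
        return state  # Unknown emotions leave the scene unchanged.
    delta, effect = rule
    level = min(1.0, max(0.0, state.light_level + delta))  # Clamp to [0, 1].
    effects = state.sound_effects
    if effect not in effects:
        effects = effects + (effect,)
    return replace(state, light_level=level, sound_effects=effects)
```

Keeping the state immutable and returning a new object makes it straightforward to apply adjustments once per emotion update and to roll back to an earlier state if the user's feedback is negative.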
[0815] In this way, this system makes it possible to create more realistic and emotionally resonant set designs while keeping costs down during the film production process. An example of a prompt to be input into the generative AI model would be, "Please set up a forest scene for a horror movie. Create an environment that is dimly lit, foggy, and has the sound of a wolf howling in the distance in the background."
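A prompt of the kind quoted above could be assembled from extracted scene attributes with a simple template. The following is a minimal sketch; the function names and the template wording are assumptions modeled on the example prompt, not a format defined by the disclosure.

```python
def join_phrases(phrases: list[str]) -> str:
    """Join phrases into an English list ("a and b", or "a, b, and c")."""
    if len(phrases) <= 2:
        return " and ".join(phrases)
    return ", ".join(phrases[:-1]) + ", and " + phrases[-1]


def build_set_prompt(genre: str, location: str, attributes: list[str]) -> str:
    """Fill a hypothetical prompt template with genre, location, and scene attributes."""
    if not attributes:
        raise ValueError("at least one attribute is required")
    return (
        f"Please set up a {location} scene for a {genre} movie. "
        f"Create an environment that is {join_phrases(attributes)}."
    )
```

Calling `build_set_prompt("horror", "forest", ["dimly lit", "foggy", "filled with the distant sound of a wolf howling"])` yields a prompt close to the one quoted in the text.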
[0816] The flow of the specific processing in Example 2 will be explained using Figure 13.
[0817] Step 1:
[0818] Users input movie script information and visual concepts into the device. Specifically, they can input text data using a keyboard or create visual data using a pen tablet. The input at this time is raw data related to the script and images.
[0819] Step 2:
[0820] The terminal converts the received input data into a standard data format. Text data is converted to JSON format, and visual data is converted to common image formats such as PNG or JPEG. The output is formatted data ready to be sent to the server.
[0821] Step 3:
[0822] The server receives the formatted data sent from the terminal. Using natural language processing techniques, it extracts important concepts and themes from the text. The analysis is performed using spaCy for Python or another natural language processing library. The output at this point consists of the extracted important keywords and themes.
[0823] Step 4:
[0824] The server uses a generative AI model to generate an initial three-dimensional environment design based on the extracted keywords and themes. In this process, pre-configured prompts are given to the AI model, and the resulting initial design is placed in a virtual space. The design is presented to the user via a VR headset or display. The output is a virtual environment design that can be displayed visually.
[0825] Step 5:
[0826] The device collects user emotion data in real time. It records the user's emotional state from their facial expressions and voice using a camera and microphone. The emotion engine analyzes this input data and formats the detected emotions into data for transmission to the server. The output is summarized emotion information.
[0827] Step 6:
[0828] The server receives emotional information sent from the emotion engine and dynamically adjusts elements within the virtual space. For example, if the user expresses surprise, it might change the lighting or add specific sound effects. These adjustments enable the virtual space to provide a more real-time and interactive experience. The output is the adjusted virtual environment.
[0829] Step 7:
[0830] After experiencing the virtual environment, users provide feedback through their terminal. This feedback is organized and formatted on the terminal and sent to the server. The server analyzes this feedback and saves it as training data to be used for future design generation. The output is the feedback data for analysis.
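Step 7 above stores user feedback as training data. One possible representation, shown here as a non-limiting sketch, is one JSON object per line (JSON Lines), which can be appended to a file as feedback arrives. The record fields, the 1 to 5 rating scale, and the function names are assumptions for illustration only.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class FeedbackRecord:
    scene_id: str
    prompt: str   # Prompt that produced the design being rated.
    rating: int   # Hypothetical 1-5 scale.
    comment: str


def to_training_line(record: FeedbackRecord) -> str:
    """Serialize one feedback record as a single JSON line."""
    if not 1 <= record.rating <= 5:
        raise ValueError("rating must be between 1 and 5")
    return json.dumps(asdict(record), ensure_ascii=False)


def from_training_line(line: str) -> FeedbackRecord:
    """Parse a JSON line back into a feedback record."""
    return FeedbackRecord(**json.loads(line))
```

Pairing each rating with the prompt that produced the design keeps the record usable later as a supervised example, whatever training procedure the server applies.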
[0831] (Application Example 2)
[0832] Next, we will explain Application Example 2. In the following explanation, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".
[0833] Traditional virtual stores have struggled to dynamically adjust the environment based on user emotions, limiting their ability to enhance immersion and user experience. In particular, there is a growing need to provide an optimal visual and auditory environment that responds to user emotions.
[0834] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 2 is realized by the following means.
[0835] In this invention, the server includes means for receiving and analyzing information input from the user, means for generating a three-dimensional virtual environment based on the analyzed information, means for analyzing the user's facial expressions and voice data to detect emotions, and means for dynamically adjusting the visual and auditory elements of the virtual environment based on the detected emotion information. This makes it possible to provide an optimal store environment that is in line with the user's emotions.
[0836] A "user" is an individual or group that uses this system to input information or experience the virtual environment.
[0837] "Receiving" refers to the process of acquiring information and data provided by the user.
[0838] "Analysis" refers to the process of processing received information to understand its meaning and theme.
[0839] A "three-dimensional virtual environment" refers to a three-dimensional digital space created using computer technology.
[0840] "Visualization" is the process of visually displaying digital data, enabling users to observe or experience it.
[0841] "Facial expression data" refers to information that captures the user's facial movements and reactions.
[0842] "Voice data" refers to data that captures the user's voice or sounds and uses them as information.
[0843] "Detecting emotions" involves analyzing facial expressions and voice data to identify the user's emotional state.
[0844] "Visual elements" refer to elements such as appearance, color, and layout within a virtual environment.
[0845] "Auditory elements" refer to elements such as sounds, music, and sound effects within a virtual environment.
[0846] "Dynamic adjustment" refers to the process of instantly changing elements of a virtual environment in response to real-time changing conditions.
[0847] To realize this invention, the server receives input information from the user and uses natural language processing technology to analyze it. Based on the analyzed information, the server generates a three-dimensional virtual environment, visualizes it, and provides it to the user. The user's terminal also collects facial expression data using facial recognition technology and audio data using voice analysis technology. This data is processed on a cloud service to detect the user's emotions. The server has the function to dynamically adjust the visual and auditory elements within the virtual environment based on the detected emotion information. This allows the user to enjoy an optimal environment tailored to their individual emotions through their experience in the virtual store. For example, if the user indicates a relaxed emotion, the server will warm the lighting in the virtual store and play soothing music. The software supporting this process includes Python, Unity, the Google Cloud Vision API, and an NLP model.
[0848] As a concrete example, consider the experience of a user visiting a virtual store using smart glasses. When the user focuses on a new product displayed on a table and shows excitement, it is possible to interact with the user by displaying detailed information about that product and related promotions in their field of view, and providing audio guidance. An example of a prompt would be, "How should information be presented in the virtual environment when the user is in an excited state?"
[0849] The flow of a specific process in Application Example 2 will be explained using Figure 14.
[0850] Step 1:
[0851] Users input movie script information and visual concepts through their terminals. This input data is formatted and sent to the server. The input consists of text-based script information and image data, which are formatted on the terminal. The output is the formatted data sent to the server.
[0852] Step 2:
[0853] The server performs natural language processing using the received scenario information to extract important keywords and themes. This process uses an NLP model to analyze the text data and generate a keyword list based on the analysis. The input is the scenario information, and the output is the extracted keywords and themes. The analyzed information forms the basis for the initial design of the three-dimensional virtual environment.
[0854] Step 3:
[0855] The server generates a three-dimensional virtual environment based on the extracted information and renders its design within the virtual space. Visual elements are constructed using Unity and presented to the user. The input consists of extracted keywords and themes, and the output is a three-dimensional model within the virtual space.
[0856] Step 4:
[0857] While a user experiences a virtual environment using a smart device, the device acquires the user's facial expression and voice data in real time via its camera and microphone. The input is the collected facial expression and voice data, and the output is sensor data for analysis.
[0858] Step 5:
[0859] The server analyzes the acquired facial expression and voice data to detect the user's emotional state. It uses the Google Cloud Vision API to analyze changes in facial expression, and voice analysis software to analyze changes in tone of voice. The input is the sensor data, and the output is the result of the emotion analysis.
[0860] Step 6:
[0861] The server dynamically adjusts the visual and auditory elements of the virtual environment based on detected emotion information. Lighting, music, and sound effects are modified through Unity to create an environment that matches the user's emotions. The input is the emotion analysis result, and the output is the adjusted virtual space.
[0862] Step 7:
[0863] Users experience the adjusted virtual environment and send their feedback to the server via their device. This feedback data is stored on the server and used as training data for the generative AI model. The input is user feedback, and the output is data for model improvement.
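Step 5 above derives a single emotional state from two sources, facial expression and voice. The disclosure does not state how the two results are combined; one simple possibility, sketched here purely as an assumption, is a weighted average of per-emotion scores followed by selection of the highest fused score. The score dictionaries, the weight, and the function name are hypothetical and do not represent the output format of any particular API.

```python
def fuse_emotions(
    face_scores: dict[str, float],
    voice_scores: dict[str, float],
    face_weight: float = 0.5,
) -> tuple[str, dict[str, float]]:
    """Combine per-emotion scores from two modalities and pick the strongest emotion."""
    if not 0.0 <= face_weight <= 1.0:
        raise ValueError("face_weight must be between 0 and 1")
    labels = sorted(set(face_scores) | set(voice_scores))
    if not labels:
        raise ValueError("no emotion scores supplied")
    fused = {
        label: face_weight * face_scores.get(label, 0.0)
        + (1.0 - face_weight) * voice_scores.get(label, 0.0)
        for label in labels
    }
    best = max(fused, key=fused.get)
    return best, fused
```

The fused label can then drive the adjustment in Step 6, for example warming the lighting when the result is "relaxed".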
[0864] The specific processing unit 290 transmits the result of the specific processing to the robot 414. In the robot 414, the control unit 46A causes the speaker 240 and the controlled object 443 to output the result of the specific processing. The microphone 238 acquires audio indicating user input for the result of the specific processing. The control unit 46A transmits the audio data indicating user input acquired by the microphone 238 to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the audio data.
[0865] Data generation model 58 is a type of so-called generative AI (Artificial Intelligence). Examples of data generation model 58 include generative AI such as ChatGPT (Internet search <URL: https://openai.com/blog/chatgpt>) and Gemini (Internet search <URL: https://gemini.google.com/?hl=ja>). The data generation model 58 is obtained by performing deep learning on a neural network. The data generation model 58 receives prompts containing instructions, together with inference data such as audio data representing speech, text data representing text, and image data representing images. The data generation model 58 performs inference on the input inference data according to the instructions indicated by the prompts, and outputs the inference results in data formats such as audio data and text data. Here, inference refers to, for example, analysis, classification, prediction, and/or summarization.
[0866] In the above embodiment, an example was given in which specific processing is performed by the data processing device 12, but the technology of this disclosure is not limited thereto, and the specific processing may also be performed by the robot 414.
[0867] Furthermore, the emotion identification model 59, acting as an emotion engine, may determine the user's emotion according to a specific mapping, namely an emotion map (see Figure 9). Similarly, the emotion identification model 59 may also determine the robot's emotion, and the specific processing unit 290 may perform the specific processing using the robot's emotion.
[0868] Figure 9 shows an emotion map 400 in which multiple emotions are mapped. In the emotion map 400, emotions are arranged in concentric circles radiating from the center. The closer to the center of the concentric circles, the more primitive the emotions are located. Further out of the concentric circles, emotions representing states and actions arising from mental states are located. Emotion is a concept that includes feelings and mental states. On the left side of the concentric circles, emotions that are generally generated from reactions occurring in the brain are located. On the right side of the concentric circles, emotions that are generally induced by situational judgment are located. Above and below the concentric circles, emotions that are generally generated from reactions occurring in the brain and induced by situational judgment are located. In addition, the emotion of "pleasure" is located on the upper side of the concentric circles, and the emotion of "displeasure" is located on the lower side. Thus, in the emotion map 400, multiple emotions are mapped based on the structure in which emotions arise, and emotions that are likely to occur simultaneously are mapped close together.
[0869] These emotions are distributed at the 3 o'clock position on the Emotion Map 400, and usually fluctuate between feelings of security and anxiety. In the right half of the Emotion Map 400, situational awareness takes precedence over internal feelings, resulting in a calm impression.
[0870] The inside of the Emotion Map 400 represents inner thoughts, while the outside represents actions. Therefore, the further toward the outside of the Emotion Map 400 an emotion lies, the more visible it becomes (that is, the more it is expressed in actions).
[0871] Here, human emotions are based on various balances, such as posture and blood sugar levels. When these balances deviate from the ideal, it results in discomfort, and when they approach the ideal, it results in pleasure. Similarly, in robots, cars, motorcycles, etc., emotions can be created based on various balances, such as posture and battery level. When these balances deviate from the ideal, it results in discomfort, and when they approach the ideal, it results in pleasure. The emotion map can be generated, for example, based on Dr. Mitsuyoshi's emotion map (Research on a system for analyzing brain physiological signals of speech emotion recognition and emotion, Tokushima University, doctoral dissertation: https://ci.nii.ac.jp/naid/500000375379). The left half of the emotion map contains emotions belonging to a region called "response," where sensation is dominant. The right half of the emotion map contains emotions belonging to a region called "situation," where situational awareness is dominant.
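The relationship between balances and pleasure or discomfort can be illustrated with a simple score that is positive when readings are near their ideal values and negative when they deviate. The scoring formula, the tolerance values, and the quantity names (`battery`, `tilt`) are assumptions made only for this sketch; the disclosure does not define a specific formula.

```python
def valence(
    readings: dict[str, float],
    ideals: dict[str, float],
    tolerances: dict[str, float],
) -> float:
    """Return a score in [-1, 1]: positive near the ideal balance, negative far from it."""
    deviations = []
    for name, ideal in ideals.items():
        # Deviation measured in units of the tolerance for that quantity, capped at 2.
        normalized = abs(readings[name] - ideal) / tolerances[name]
        deviations.append(min(normalized, 2.0))
    mean_deviation = sum(deviations) / len(deviations)
    return 1.0 - mean_deviation
```

With every reading exactly at its ideal the score is 1.0 (pleasure); as the readings drift beyond their tolerances the score falls below zero (discomfort).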
[0872] The emotion map defines two emotions that promote learning. One is the emotion around the middle of the negative "repentance" and "reflection" on the situation side. In other words, it is when the robot experiences negative emotions such as "I never want to feel this way again" or "I don't want to be scolded again." The other is the emotion around the positive "desire" on the reaction side. In other words, it is when the robot has positive feelings such as "I want more" or "I want to know more."
[0873] The emotion identification model 59 inputs user input into a pre-trained neural network, obtains emotion values representing each emotion shown in the emotion map 400, and determines the user's emotion. This neural network is pre-trained based on multiple training data sets, which are combinations of user input and emotion values representing each emotion shown in the emotion map 400. Furthermore, this neural network is trained so that emotions located close together have similar values, as shown in the emotion map 900 in Figure 10. Figure 10 shows an example where multiple emotions such as "reassured," "calm," and "confident" have similar emotion values.
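One way to obtain training targets in which emotions located close together receive similar values is to assign each emotion a value that decays with its distance on the map from the labeled emotion. The sketch below uses a Gaussian kernel for this purpose. The kernel, the `sigma` parameter, and the two-dimensional coordinates are hypothetical; the actual layout is the one defined by the emotion map, and the disclosure does not state how the training data are constructed.

```python
import math

# Hypothetical 2-D positions; the real layout is given by the emotion map.
POSITIONS = {
    "reassured": (1.0, 0.2),
    "calm": (1.1, 0.0),
    "confident": (1.0, -0.2),
    "fear": (-1.0, -0.8),
}


def soft_targets(true_emotion: str, sigma: float = 0.5) -> dict[str, float]:
    """Emotion values that decay with map distance from the labeled emotion."""
    tx, ty = POSITIONS[true_emotion]
    return {
        name: math.exp(-((x - tx) ** 2 + (y - ty) ** 2) / (2 * sigma ** 2))
        for name, (x, y) in POSITIONS.items()
    }
```

With these placeholder coordinates, labeling an input as "calm" gives "reassured" and "confident" values close to that of "calm", while a distant emotion such as "fear" receives a value near zero.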
[0874] The above description primarily focuses on the functions of the data processing device 12 in relation to this disclosure. However, the system related to this disclosure is not necessarily implemented on a server. The system related to this disclosure may be implemented as a general information processing system. This disclosure may be implemented, for example, as a software program that runs on a personal computer or as an application that runs on a smartphone. The method related to this disclosure may be provided to users in SaaS (Software as a Service) format.
[0875] In the above embodiment, an example was given in which a specific process is performed by a single computer 22. However, the technology of this disclosure is not limited thereto, and a distributed processing of the specific process may be performed by multiple computers, including computer 22. For example, a data generation model 58 may be provided in an external device of the data processing device 12, and the external device may generate data according to the input data.
[0876] In the above embodiment, an example was given in which the specific processing program 56 is stored in the storage 32, but the technology of this disclosure is not limited thereto. For example, the specific processing program 56 may be stored in a portable, computer-readable, non-transitory storage medium such as a USB (Universal Serial Bus) memory. The specific processing program 56 stored in the non-transitory storage medium is installed in the computer 22 of the data processing device 12. The processor 28 executes specific processing according to the specific processing program 56.
[0877] Alternatively, the specific processing program 56 may be stored in a storage device such as a server connected to the data processing device 12 via the network 54, and the specific processing program 56 may be downloaded and installed on the computer 22 in response to a request from the data processing device 12.
[0878] Furthermore, it is not necessary to store the entirety of the specific processing program 56 in a storage device such as a server connected to the data processing device 12 via the network 54, or to store the entirety of the specific processing program 56 in the storage 32; it is acceptable to store only a portion of the specific processing program 56.
[0879] The following types of processors can be used as hardware resources to perform specific processing. Examples of processors include a CPU, a general-purpose processor that functions as a hardware resource to perform specific processing by executing software, i.e., a program. Other examples of processors include dedicated electrical circuits, such as FPGAs (Field-Programmable Gate Arrays), PLDs (Programmable Logic Devices), or ASICs (Application Specific Integrated Circuits), which have circuit configurations specifically designed to perform specific processing. All of these processors have built-in or connected memory, and all of them perform specific processing by using memory.
[0880] The hardware resource that performs a specific process may consist of one of these various processors, or it may consist of a combination of two or more processors of the same or different types (for example, a combination of multiple FPGAs, or a combination of a CPU and an FPGA). Alternatively, the hardware resource that performs a specific process may consist of a single processor.
[0881] Examples of configurations using a single processor include, firstly, a configuration in which one or more CPUs and software are combined to form a single processor, and this processor functions as a hardware resource that performs a specific process. Secondly, there is a configuration using a processor that realizes the functions of the entire system, including multiple hardware resources that perform a specific process, on a single IC chip, as exemplified by SoCs (System-on-a-chip). In this way, a specific process is realized using one or more of the above types of processors as hardware resources.
[0882] Furthermore, the hardware structure of these various processors can more specifically utilize electrical circuits that combine circuit elements such as semiconductor devices. Also, the specific processing described above is merely an example. Therefore, it goes without saying that unnecessary steps can be deleted, new steps added, or the processing order rearranged, as long as it does not deviate from the main purpose.
[0883] The descriptions and illustrations presented above are detailed explanations of the technical aspects of this disclosure and are merely examples of the technical aspects. For example, the above descriptions of the structure, function, operation, and effect are examples of the structure, function, operation, and effect of the technical aspects of this disclosure. Therefore, it goes without saying that unnecessary parts may be deleted, new elements may be added, or elements may be replaced in the descriptions and illustrations presented above, as long as this does not deviate from the essence of the technical aspects of this disclosure. Furthermore, in order to avoid confusion and facilitate understanding of the technical aspects of this disclosure, explanations of common technical knowledge and the like that do not require special explanation to enable the implementation of the technical aspects of this disclosure have been omitted from the descriptions and illustrations presented above.
[0884] All documents, patent applications, and technical standards described herein are incorporated by reference to the same extent as if each individual document, patent application, and technical standard were specifically and individually noted to be incorporated by reference.
[0885] The following is further disclosed regarding the embodiments described above.
[0886] (Claim 1)
[0887] A means for receiving and analyzing scenario information entered by the user,
[0888] A means for generating a three-dimensional film set design based on analyzed information,
[0889] A means of visualizing the generated movie set design in a virtual space and providing it to the user,
[0890] A means of receiving user feedback and using it to improve the accuracy of the generation method,
[0891] A system comprising the above means.
[0892] (Claim 2)
[0893] The system according to claim 1, further comprising means for simultaneously managing and optimizing a film set design across multiple scenes.
[0894] (Claim 3)
[0895] A system according to claim 1, comprising means for extracting important keywords and themes from scenario information using natural language processing technology.
[0896] "Example 1"
[0897] (Claim 1)
[0898] A means of receiving and analyzing story information entered by users,
[0899] A means for generating a three-dimensional visual model based on the analyzed information,
[0900] A means of displaying the generated visual model in a virtual space and presenting it to the user,
[0901] A means of receiving user feedback and using it to improve the accuracy of the generation method,
[0902] A means of collaborating with one or more virtual reality devices to deliver the generated visual model in real time,
[0903] A system comprising the above means.
[0904] (Claim 2)
[0905] The system according to claim 1, which extracts important words and themes based on narrative information using natural information processing technology.
[0906] (Claim 3)
[0907] The system according to claim 1, further comprising means for simultaneously managing and adjusting a generated visual model across multiple scenes.
[0908] "Application Example 1"
[0909] (Claim 1)
[0910] A means for receiving and analyzing spatial design information input by the user,
[0911] A means for generating a three-dimensional virtual environment design based on analyzed information,
[0912] A means of visualizing the generated virtual environment design in a virtual domain and providing it to the user,
[0913] A means of receiving feedback from users and using it to improve the accuracy of the generation method,
[0914] A means for extracting important keywords and themes from spatial design information using natural language processing technology,
[0915] A system comprising the above means.
[0916] (Claim 2)
[0917] The system according to claim 1, further comprising means for simultaneously managing and optimizing virtual environment designs across multiple scenes.
[0918] (Claim 3)
[0919] The system according to claim 1, incorporating virtual reality technology for optimizing the arrangement of displays and components and lighting effects.
[0920] "Example 2 of combining an emotion engine"
[0921] (Claim 1)
[0922] A means for receiving and analyzing linguistic or visual information input by a user,
[0923] A means for generating an initial three-dimensional environment design based on the analyzed information,
[0924] A means of visualizing the generated environment design in a virtual domain and providing it to the user,
[0925] A means for detecting and analyzing the user's emotional state and dynamically adjusting visual elements within a virtual domain,
[0926] A means for receiving ratings from users and using them to improve the accuracy of the generation means,
[0927] A system comprising the above means.
[0928] (Claim 2)
[0929] The system according to claim 1, which adaptively changes information within a virtual domain in real time in response to changes in emotional state.
[0930] (Claim 3)
[0931] The system according to claim 1, which extracts important concepts and themes from scenario information using language data processing technology.
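The emotion-driven adjustment recited in this example can be sketched as a lookup from a detected emotion label to a change in visual settings. This is a hedged illustration, not the disclosed implementation: the emotion labels, the `VisualSettings` fields, and the adjustment values are assumptions, and emotion detection itself is outside the sketch.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class VisualSettings:
    brightness: float          # 0.0 (dark) to 1.0 (bright)
    color_temperature_k: int   # lighting color temperature in kelvin


# Hypothetical emotion-to-adjustment table; labels and values are placeholders.
ADJUSTMENTS = {
    "stressed": {"brightness": -0.2, "color_temperature_k": -1000},
    "bored": {"brightness": 0.1, "color_temperature_k": 500},
}


def adjust_for_emotion(settings: VisualSettings, emotion: str) -> VisualSettings:
    """Return new settings adjusted for the emotion; unknown labels change nothing."""
    delta = ADJUSTMENTS.get(emotion)
    if delta is None:
        return settings
    # Clamp brightness so it stays within the valid 0.0 to 1.0 range.
    brightness = min(1.0, max(0.0, settings.brightness + delta["brightness"]))
    return replace(
        settings,
        brightness=brightness,
        color_temperature_k=settings.color_temperature_k + delta["color_temperature_k"],
    )


current = VisualSettings(brightness=0.5, color_temperature_k=5000)
print(adjust_for_emotion(current, "stressed"))
```

Calling the function each time a new emotion label arrives would give the real-time behavior described in claim 2 of this example.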
[0932] "Application example 2 when combining with an emotional engine"
[0933] (Claim 1)
[0934] A means of receiving and analyzing information entered by the user,
[0935] A means for generating a three-dimensional virtual environment based on the analyzed information,
[0936] A means of visualizing the generated virtual environment and providing it to the user,
[0937] A means for analyzing the user's facial expressions and voice data to detect emotions,
[0938] A means for dynamically adjusting the visual and auditory elements of a virtual environment based on detected emotional information,
[0939] A means of receiving user feedback and using it to improve the accuracy of the generation means,
[0940] A system comprising the above means.
[0941] (Claim 2)
[0942] The system according to claim 1, further comprising means for dynamically adjusting the environment within a virtual store in accordance with the user's emotions.
[0943] (Claim 3)
[0944] The system according to claim 1, comprising means for extracting important keywords or themes from user input information using natural language processing technology.
Explanation of Symbols
[0945]
10, 210, 310, 410 Data Processing Systems
12 Data Processing Devices
14 Smart Devices
214 Smart Glasses
314 Headset-type Terminal
414 Robots
Claims
1. A means for receiving and analyzing spatial design information input by the user,
A means for generating a three-dimensional virtual environment design based on analyzed information,
A means of visualizing the generated virtual environment design in a virtual domain and providing it to the user,
A means of receiving feedback from users and using it to improve the accuracy of the generation means,
A means for extracting important keywords and themes from spatial design information using natural language processing technology,
A system comprising the above means.
2. The system according to claim 1, further comprising means for simultaneously managing and optimizing virtual environment designs across multiple scenes.
3. The system according to claim 1, incorporating virtual reality technology for optimizing the arrangement of displays and components and lighting effects.