system

A system efficiently generates and personalizes images by analyzing user input and adjusting AI-generated content to meet user intentions, addressing the challenge of obtaining copyright-free images that match user needs.

JP2026100699A · Pending · Publication Date: 2026-06-19 · SOFTBANK GROUP CORP

Patent Information

Authority / Receiving Office
JP · JP
Patent Type
Applications
Current Assignee / Owner
SOFTBANK GROUP CORP
Filing Date
2024-12-09
Publication Date
2026-06-19

AI Technical Summary

Technical Problem

The challenge of efficiently obtaining copyright-free images that quickly match user intentions for use in materials is unresolved, particularly in modern information societies where traditional methods are time-consuming and costly.

Method used

A system that receives user input, analyzes it using natural language processing and computer vision, generates images with AI models like GANs, and adjusts them to meet user requirements, providing efficient and quick access to original images.

Benefits of technology

Enables users to obtain original, copyright-free images quickly and cost-effectively for various applications, enhancing the efficiency and personalization of image generation.


Figure 2026100699000001_ABST

Abstract

[Problem] To provide a system. [Solution] A system including: a means for receiving input from a user and grasping the specified conditions and images; a means for sending the received input data to a server; a means for analyzing the input data on the server and identifying the user's intent; a means for making an image generation request based on the analysis results; a means for reviewing the generated image and adjusting it as needed; and a means for providing the final image to the user.

Description

Technical Field

[0001] The technology of the present disclosure relates to a system.

Background Art

[0002] Patent Document 1 discloses a persona chatbot control method performed by at least one processor, including steps of receiving a user utterance, adding the user utterance to a prompt including an instruction sentence related to an explanation of a chatbot character, encoding the prompt, and inputting the encoded prompt into a language model to generate a chatbot utterance in response to the user utterance.

Prior Art Documents

Patent Documents

[0003]

Patent Document 1

Summary of the Invention

Problems to be Solved by the Invention

[0004] There is a problem that it is difficult to obtain copyright-free images efficiently and effectively. In particular, quickly creating an original image that matches the user's intention and providing it in a form that can be used in materials is an important issue in the modern information society.

Means for Solving the Problems

[0005] This invention provides a means for receiving user input and understanding conditions and images based on that input. Next, the received input data is sent to a server, where the server analyzes the input data and identifies the user's intent. Based on the analysis results, an image generation request is made, and after confirming the generated image and making any necessary adjustments, the final image is provided to the user. In this way, the invention provides a system that allows the user to efficiently and quickly obtain an ideal image that can be used for creating materials.

[0006] A "user" is the entity that uses the system to generate images.

[0007] "Input" refers to the data, such as conditions or images, that the user provides to the system.

[0008] A "server" is a computer system that analyzes input data received from users and manages image generation.

[0009] An "AI model" is an artificial intelligence algorithm used to generate images based on specified specifications.

[0010] An "image generation request" is the process by which a server instructs an AI model to create an image based on predetermined conditions.

[0011] "Images" refer to visual data generated by AI models, intended for use by users.

[0012] "Adjustment" refers to the process of modifying the generated image as needed to ensure it meets the user's requirements.

[0013] "Provision" refers to the process of sending the final image from the server to the user and making it available for use.

Brief Description of the Drawings

[0014] [Figure 1] A conceptual diagram showing an example of the configuration of a data processing system according to the first embodiment.

[Figure 2] A conceptual diagram showing an example of the main functions of a data processing device and a smart device according to the first embodiment.

[Figure 3] A conceptual diagram showing an example of the configuration of a data processing system according to the second embodiment.

[Figure 4] A conceptual diagram showing an example of the main functions of a data processing device and smart glasses according to the second embodiment.

[Figure 5] A conceptual diagram showing an example of the configuration of a data processing system according to the third embodiment.

[Figure 6] A conceptual diagram showing an example of the main functions of a data processing device and a headset-type terminal according to the third embodiment.

[Figure 7] A conceptual diagram showing an example of the configuration of a data processing system according to the fourth embodiment.

[Figure 8] A conceptual diagram showing an example of the main functions of a data processing device and a robot according to the fourth embodiment.

[Figure 9] An emotion map to which a plurality of emotions are mapped.

[Figure 10] An emotion map to which a plurality of emotions are mapped.

[Figure 11] A sequence diagram showing the processing flow of the data processing system in Example 1.

[Figure 12] A sequence diagram showing the processing flow of the data processing system in Application Example 1.

[Figure 13] A sequence diagram showing the processing flow of the data processing system in Example 2 when an emotion engine is combined.

[Figure 14] A sequence diagram showing the processing flow of the data processing system in Application Example 2 when an emotion engine is combined.

MODE FOR CARRYING OUT THE INVENTION

[0015] Hereinafter, an example of an embodiment of a system according to the technology of the present disclosure will be described with reference to the accompanying drawings.

[0016] First, the terms used in the following description will be explained.

[0017] In the following embodiments, the numbered processor (hereinafter simply referred to as "processor") may be a single arithmetic unit or a combination of multiple arithmetic units. Also, the processor may be a single type of arithmetic unit or a combination of multiple types of arithmetic units. Examples of arithmetic units include a CPU (Central Processing Unit), a GPU (Graphics Processing Unit), a GPGPU (General-Purpose computing on Graphics Processing Units), an APU (Accelerated Processing Unit), and the like.

[0018] In the following embodiments, the numbered RAM (Random Access Memory) is a memory in which information is temporarily stored and is used as a work memory by the processor.

[0019] In the following embodiments, the numbered storage is one or more non-volatile storage devices that store various programs and various parameters, etc. Examples of non-volatile storage devices include flash memory (SSD (Solid State Drive)), magnetic disks (e.g., hard disks), or magnetic tapes, and the like.

[0020] In the following embodiments, the numbered communication I/F (Interface) is an interface including a communication processor and an antenna, etc. The communication I/F controls communication between multiple computers. Examples of communication standards applied to the communication I/F include wireless communication standards such as 5G (5th Generation Mobile Communication System), Wi-Fi (registered trademark), and Bluetooth (registered trademark).

[0021] In the following embodiments, "A and/or B" is synonymous with "at least one of A and B." That is, "A and/or B" means that it may be A alone, or B alone, or a combination of A and B. Furthermore, in this specification, the same concept as "A and/or B" applies when expressing three or more things linked by "and/or."

[0022] [First Embodiment]

[0023] Figure 1 shows an example of the configuration of the data processing system 10 according to the first embodiment.

[0024] As shown in Figure 1, the data processing system 10 includes a data processing device 12 and a smart device 14. An example of the data processing device 12 is a server.

[0025] The data processing device 12 comprises a computer 22, a database 24, and a communication interface 26. The computer 22 is an example of a "computer" related to the technology of this disclosure. The computer 22 comprises a processor 28, RAM 30, and storage 32. The processor 28, RAM 30, and storage 32 are connected to a bus 34. The database 24 and the communication interface 26 are also connected to the bus 34. The communication interface 26 is connected to a network 54. An example of the network 54 is a WAN (Wide Area Network) and/or a LAN (Local Area Network).

[0026] The smart device 14 comprises a computer 36, a reception device 38, an output device 40, a camera 42, and a communication interface 44. The computer 36 comprises a processor 46, RAM 48, and storage 50. The processor 46, RAM 48, and storage 50 are connected to a bus 52. The reception device 38, output device 40, and camera 42 are also connected to the bus 52.

[0027] The reception device 38 is equipped with a touch panel 38A and a microphone 38B, etc., and receives user input. The touch panel 38A receives user input by detecting contact with an object (e.g., a pen or finger). The microphone 38B receives user input by detecting the user's voice. The control unit 46A transmits data indicating the user input received by the touch panel 38A and microphone 38B to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the data indicating the user input.

[0028] The output device 40 includes a display 40A and a speaker 40B, and presents data to the user 20 by outputting the data in a form perceptible to the user 20 (e.g., audio and/or text). The display 40A displays visible information such as text and images according to instructions from the processor 46. The speaker 40B outputs audio according to instructions from the processor 46. The camera 42 is a small digital camera equipped with an optical system such as a lens, aperture, and shutter, and an image sensor such as a CMOS (Complementary Metal-Oxide-Semiconductor) image sensor or a CCD (Charge Coupled Device) image sensor.

[0029] Communication interface 44 is connected to network 54. Communication interfaces 44 and 26 are responsible for the exchange of various types of information between processor 46 and processor 28 via network 54.

[0030] Figure 2 shows an example of the main functions of the data processing device 12 and the smart device 14.

[0031] As shown in Figure 2, in the data processing device 12, a specific processing is performed by the processor 28. A specific processing program 56 is stored in the storage 32. The specific processing program 56 is an example of a "program" related to the technology of this disclosure. The processor 28 reads the specific processing program 56 from the storage 32 and executes the read specific processing program 56 on the RAM 30. The specific processing is realized by the processor 28 operating as a specific processing unit 290 according to the specific processing program 56 executed on the RAM 30.

[0032] The storage 32 stores the data generation model 58 and the emotion identification model 59. The data generation model 58 and the emotion identification model 59 are used by the specific processing unit 290.

[0033] In the smart device 14, the processor 46 performs the reception output processing. The storage 50 stores the reception output program 60. The reception output program 60 is used in conjunction with a specific processing program 56 by the data processing system 10. The processor 46 reads the reception output program 60 from the storage 50 and executes the read reception output program 60 on the RAM 48. The reception output processing is realized by the processor 46 operating as a control unit 46A according to the reception output program 60 executed on the RAM 48.

[0034] Next, the specific processing performed by the specific processing unit 290 of the data processing device 12 will be described. In the following description, the data processing device 12 will be referred to as the "server" and the smart device 14 as the "terminal".

[0035] In this invention, processing begins when a user inputs specific conditions or images at a terminal. The user enters desired keywords or image data via a keyboard or touchscreen, and the terminal sends this input data to a server. The server analyzes the input keywords using natural language processing techniques and identifies the user's specific requests. If image data is included, the server analyzes the image using computer vision techniques and extracts relevant visual elements.

[0036] The server requests image generation from the AI model based on the analysis results. Various image generation technologies can be used as the AI model in this invention, but deep-learning-based models such as GANs (Generative Adversarial Networks) are generally utilized. The AI model generates an original image that reflects the user's conditions based on the request. The generated image is temporarily stored on the server and reviewed for quality and suitability to the request. The image is fine-tuned as needed.

[0037] Finally, the server sends the generated image back to the terminal. The terminal displays the received image to the user, allowing the user to review it. For example, if a user requests an image of a futuristic city and inputs related abstract elements, the AI model will generate an imaginary city image using multiple architectural styles and color schemes, and provide it to the user. The user can then download this image and use it for presentations or document creation.

[0038] This invention offers the advantage of allowing users to quickly obtain original, copyright-free images and use them free of charge for creating materials.

[0039] The following describes the processing flow.

[0040] Step 1:

[0041] Users use their devices to input criteria and images related to their desired image. They can also enter keywords in text or upload reference images.

[0042] Step 2:

[0043] The terminal sends the data entered by the user to the server. The data is packaged in common data formats such as JSON or XML and sent to the server over the network.

[0044] Step 3:

[0045] The server analyzes the received data. This analysis includes natural language processing (NLP) of the text, extracting keywords and phrases. If images are uploaded, computer vision technology is used to extract image features.

[0046] Step 4:

[0047] The server requests image generation from the AI model based on the analysis results. The server inputs the analysis results into the AI model and instructs it to generate an image that specifically reflects the user's conditions and requests.

[0048] Step 5:

[0049] The AI model creates images based on the generation request. The model generates images that conform to the specified style and concept, and returns the results to the server. AI models typically use deep learning architectures such as GANs (Generative Adversarial Networks).

[0050] Step 6:

[0051] The server reviews the generated image and evaluates whether it meets the specified requirements. If necessary, it performs processing to adjust the image's color tone and details.

[0052] Step 7:

[0053] The server sends the finalized image data to the terminal. The sent image is converted to a user-friendly format (e.g., JPEG, PNG).

[0054] Step 8:

[0055] The device displays images to the user. The user reviews the generated images, downloads them as needed, and uses them for document creation and presentations.
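As an illustration of Step 2 in the flow above, the following minimal sketch packages user input as JSON. The field names `keywords` and `reference_image_b64` and the helper `build_payload` are hypothetical and do not appear in the specification; Base64 encoding of the reference image is likewise an assumption made so that binary data can travel inside a JSON document.

```python
import base64
import json


def build_payload(keywords, image_bytes=None):
    """Package user input as a JSON string (field names are hypothetical)."""
    payload = {"keywords": list(keywords)}
    if image_bytes is not None:
        # JSON cannot carry raw bytes, so the reference image is Base64-encoded.
        payload["reference_image_b64"] = base64.b64encode(image_bytes).decode("ascii")
    return json.dumps(payload, ensure_ascii=False)


# Example: two keywords plus a few placeholder bytes standing in for an image.
message = build_payload(["futuristic city", "modern"], image_bytes=b"\x89PNG")
decoded = json.loads(message)
```

On the receiving side, the server would reverse the same steps: parse the JSON, then Base64-decode the image field if it is present.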

[0056] (Example 1)

[0057] Next, we will describe Example 1. In the following description, the data processing device 12 will be referred to as the "server," and the smart device 14 will be referred to as the "terminal."

[0058] In today's information environment, the generation of creative visual data is becoming increasingly important, but users face the challenge of not being able to easily generate high-quality and original visual data. Furthermore, there is a need for a system that can accurately understand and quickly reflect the user's conditions and intentions regarding visual data generation.

[0059] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 1 is realized by the following means.

[0060] In this invention, the server includes means for analyzing input data and identifying the user's intent, means for inputting an image generation request as a prompt message to a generative AI model based on the analysis results, and means for checking the generated visual data and adjusting it as necessary. This makes it possible to quickly generate and provide original visual data that matches the user's intent.

[0061] An "information processing device" is a device that receives user input, processes it, and transmits it to another device or system.

[0062] A "generative AI model" is a mathematical model that uses artificial intelligence technology to generate visual data based on specified conditions.

[0063] A "prompt message" is text data that indicates instructions or conditions to be input into a generative AI model.

[0064] "Visual data" refers to digital data that includes images and graphical information, and is generated based on specific conditions or requirements.

[0065] This invention begins with a user inputting specific conditions or images using an information processing device. The user inputs desired keywords or image data into the information processing device via a keyboard or touchscreen. The information processing device acquires this input data and transmits it to a server via a network.

[0066] The server analyzes the received data, using natural language processing libraries (e.g., NLTK or spaCy) to analyze keywords and identify the user's intent. If image data is included, computer vision technologies (e.g., OpenCV or TensorFlow (registered trademark)) are used to analyze the image data and extract relevant visual elements.
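The keyword analysis described in this paragraph can be illustrated with a minimal sketch. It deliberately uses only the Python standard library as a stand-in for an NLP library such as NLTK or spaCy; the stop-word list, the function name `extract_key_concepts`, and the frequency-based ranking are assumptions chosen for illustration, not the method of the specification.

```python
import re
from collections import Counter

# Small illustrative stop-word list; a real NLP library ships a fuller one.
STOP_WORDS = {"a", "an", "and", "that", "the", "i", "want", "of", "with", "in", "for"}


def extract_key_concepts(text, top_n=5):
    """Tokenize text and return the most frequent non-stop-word tokens."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    content = [token for token in tokens if token not in STOP_WORDS]
    return [word for word, _ in Counter(content).most_common(top_n)]


concepts = extract_key_concepts(
    "A futuristic city with modern technology, a city of the future"
)
```

Here "city" occurs twice and is ranked first; the remaining content words each occur once.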

[0067] Based on the analyzed information, the server requests image generation from the generative AI model. This request is provided to the generative AI model as a prompt based on the analysis results. The generative AI model is typically built using GAN (Generative Adversarial Network) technology running on PyTorch or TensorFlow. For example, prompts such as "futuristic city," "modern," and "technology" can be used.
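As a minimal sketch of how analysis results might be turned into a prompt, the following joins extracted keywords into a single comma-separated string. The helper `build_prompt`, the lowercase normalization, and the duplicate removal are assumptions for illustration; the specification does not state how the prompt is assembled.

```python
def build_prompt(keywords, style_hints=()):
    """Join analyzed keywords into one prompt string, dropping duplicates."""
    seen = set()
    parts = []
    for term in list(keywords) + list(style_hints):
        normalized = term.strip().lower()
        if normalized and normalized not in seen:
            seen.add(normalized)
            parts.append(normalized)
    return ", ".join(parts)


# "Modern" duplicates "modern" after normalization and is dropped.
prompt = build_prompt(["futuristic city", "modern", "technology", "Modern"])
```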

[0068] The generated visual data is temporarily stored on a server and its quality and compliance with requirements are verified by a dedicated review algorithm. The generated visual data may be readjusted as needed.
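The review-and-readjust cycle can be sketched as a bounded loop. Everything here is hypothetical: the specification does not describe the review algorithm, so `score_fn`, `adjust_fn`, the threshold, and the round limit are placeholders, and the toy "image" is a single brightness value rather than real pixel data.

```python
def review_and_adjust(image, score_fn, adjust_fn, threshold=80, max_rounds=3):
    """Re-adjust an image until score_fn reaches threshold or rounds run out."""
    score = score_fn(image)
    rounds = 0
    while score < threshold and rounds < max_rounds:
        image = adjust_fn(image)
        score = score_fn(image)
        rounds += 1
    return image, score, rounds


# Toy stand-ins: the "image" is a brightness value in 0..255, and the
# integer score (0..100) is higher the closer the brightness is to 128.
def toy_score(brightness):
    return 100 - abs(brightness - 128) * 100 // 128


def toy_adjust(brightness):
    return brightness + (128 - brightness) // 2


final, score, rounds = review_and_adjust(32, toy_score, toy_adjust)
```

Capping the number of rounds keeps the loop from running indefinitely when the adjustment cannot reach the threshold.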

[0069] Finally, the server sends the verified visual data to the information processing device. Users can then view the generated results on the information processing device's screen and download or use them for other purposes. This system offers users the advantage of quickly and efficiently obtaining original visual data and utilizing it for various purposes.

[0070] The flow of the specific processing in Example 1 will be explained using Figure 11.

[0071] Step 1:

[0072] The user inputs desired keywords or images using the keyboard or touchscreen on the device. The device collects the input data, which includes the information to be contained in the prompt sentences that the generative AI model requires. The device then prepares this data for sending to the server.

[0073] Step 2:

[0074] The device sends input data collected from the user to the server. The data sent includes keywords in text format and, if necessary, image data. The device generates an HTTP request and sends the data to the server API via this request.

[0075] Step 3:

[0076] The server receives data sent from the terminal. The server uses a natural language processing library (e.g., NLTK or spaCy) to analyze the input keywords and identify the user's intent. Specifically, it tokenizes the keywords and extracts key concepts. The server's output is the analyzed text data.

[0077] Step 4:

[0078] When the server receives image data, it analyzes it using computer vision technologies (e.g., OpenCV or TensorFlow). The server extracts image features and identifies relevant visual elements. The output is data related to the extracted visual elements.

[0079] Step 5:

[0080] The server generates prompt sentences based on the analysis results and inputs them to the generative AI model. The server constructs prompt sentences that are appropriate to the user's intent and passes them to the generative AI model. The generative AI model generates images based on this input.

[0081] Step 6:

[0082] The generative AI model generates visual data based on the prompt text. The technology typically used is a GAN (Generative Adversarial Network), and the generated images are original. The output at this stage is a file containing the generated visual data.

[0083] Step 7:

[0084] The server reviews the generated visual data to ensure it meets the criteria. Using a quality check algorithm, it evaluates whether the generated images meet the requirements and makes adjustments if necessary. The output is the adjusted visual data.

[0085] Step 8:

[0086] The server sends the final visual data back to the terminal, which displays it in a format that the user can view. The user can then review the final generated visual data and download or use it.
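Step 2 of this flow mentions that the device generates an HTTP request to the server API. A minimal sketch using the Python standard library is shown below. The endpoint URL, the JSON body layout, and the helper name `build_generation_request` are hypothetical; the request is only constructed here, not sent.

```python
import json
import urllib.request


def build_generation_request(api_url, keywords):
    """Build (but do not send) a POST request carrying the keywords as JSON."""
    body = json.dumps({"keywords": keywords}).encode("utf-8")
    return urllib.request.Request(
        api_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical endpoint; sending would be urllib.request.urlopen(request).
request = build_generation_request(
    "https://server.example/api/images", ["futuristic city"]
)
```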

[0087] (Application Example 1)

[0088] Next, we will explain Application Example 1. In the following explanation, the data processing device 12 will be referred to as the "server," and the smart device 14 will be referred to as the "terminal."

[0089] In advertising, there is a need to quickly create original and compelling visual content tailored to the target market, but traditional methods are time-consuming and costly. Furthermore, it is difficult to concretize diverse images and concepts in a short time, necessitating solutions to more efficiently enhance advertising effectiveness.

[0090] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 1 is realized by the following means.

[0091] In this invention, the server includes a device that receives instructions from the user and grasps the conditions and concepts, a device that transmits the received instruction data to an information processing device, and a device that analyzes the instruction data in the information processing device and identifies the user's purpose. This makes it possible to efficiently generate visuals that can be easily used for advertising campaigns and dissemination activities.

[0092] A "device that receives instructions from a user" is a device that receives conditions and conceptual information input by a user and provides an interface for initiating processing.

[0093] An "information processing device" is a device that analyzes data and performs calculations to identify the user's purpose and intentions.

[0094] A "visual generation request device" is a device that, based on analyzed data, instructs the generation of visual content that conforms to specified conditions.

[0095] A "device for verifying generated visuals" is a device that evaluates the quality and compliance with requirements of the generated visual content and makes corrections as necessary.

[0096] A "device that provides the final visuals" is a device that provides the modified visual content to the user and makes it available for use.

[0097] "Visuals that can be easily used in advertising campaigns and promotional activities" refers to images and graphics designed to effectively reach a specific target market.

[0098] The system implementing this invention consists of a user terminal, an information processing server, and a generative AI model. The user uses their own device, such as a smartphone or computer, to input the conditions and concepts of the visual content necessary for advertising campaigns and promotional activities. The input information is transmitted from the terminal to the information processing server.

[0099] The server receives this information and analyzes the specified conditions and concepts using natural language processing and image analysis techniques. The analysis is primarily performed using Python and TensorFlow. As a result of the analysis, data is generated to materialize the visuals requested by the user. Subsequently, the server uses this data to leverage generative AI models, particularly generative adversarial networks (GANs), to generate original visuals that meet the specified conditions.

[0100] The generated visuals are reviewed on the server to determine their quality and suitability to customer requirements. Corrections are made as needed at this stage. The final visuals are sent back from the information processing server to the user's terminal, where the user can review them and use them in advertising campaigns, etc.

[0101] As a concrete example, consider a scenario where a user enters a prompt such as, "I want an event poster that gives a futuristic and innovative impression." Based on this prompt, the generative AI model creates a visual combining vibrant colors and a novel design. This visual can then be immediately used as advertising material.

[0102] The flow of a specific process in Application Example 1 will be explained using Figure 12.

[0103] Step 1:

[0104] The user inputs the necessary conditions and concepts for the advertising visual as prompts on their device. An example of user input is the prompt, "I want an event poster that gives a futuristic and innovative impression." The input data is temporarily stored on the device.

[0105] Step 2:

[0106] The terminal sends the prompt message to the information processing server as input for analysis. The server receives it and begins data analysis.

[0107] Step 3:

[0108] The server uses natural language processing techniques to analyze the prompt text. The input is the user's prompt text, and the output is the visual conditions and objectives requested by the user. The server utilizes TensorFlow to extract specific images and concepts from keywords and phrases.

[0109] Step 4:

[0110] The server requests image generation from the AI model based on the extracted conditions. The input is the analyzed data, and the output is the generation request. Using a GAN, generation of a new visual that satisfies the specified conditions begins.

[0111] Step 5:

[0112] The generative AI model generates visuals according to the given conditions. The input is an image generation request, and the original visual is returned to the server as output. The generated visual is temporarily stored on the server.

[0113] Step 6:

[0114] The server reviews the generated visuals and checks their quality and compliance with requirements. Data processing and adjustments are made as needed. The input is the generated visual, and the output is the final visual.

[0115] Step 7:

[0116] Finally, the server sends the completed visual to the terminal. The terminal receives the visual as output and displays it to the user. The user can then review it and use it for advertising production, etc.
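Steps 3 and 4 above each name an input and an output. The following sketch makes that data flow explicit with typed stages. The classes `VisualConditions` and `GenerationRequest`, the substring-based analysis, and the list of known impression words are all hypothetical simplifications; the specification describes the analysis only as natural language processing.

```python
from dataclasses import dataclass, field


@dataclass
class VisualConditions:
    """Output of Step 3: the conditions extracted from the prompt."""
    subject: str
    impressions: list = field(default_factory=list)


@dataclass
class GenerationRequest:
    """Output of Step 4: the instruction handed to the generative AI model."""
    prompt: str


def analyze_prompt(text, known_impressions=("futuristic", "innovative")):
    # Toy analysis: look for known impression words and a known subject phrase.
    lowered = text.lower()
    impressions = [word for word in known_impressions if word in lowered]
    subject = "event poster" if "event poster" in lowered else "visual"
    return VisualConditions(subject=subject, impressions=impressions)


def to_generation_request(conditions):
    parts = [conditions.subject] + conditions.impressions
    return GenerationRequest(prompt=", ".join(parts))


conditions = analyze_prompt(
    "I want an event poster that gives a futuristic and innovative impression."
)
request = to_generation_request(conditions)
```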

[0117] Furthermore, an emotion engine that estimates the user's emotions may be incorporated. That is, the specific processing unit 290 may use the emotion identification model 59 to estimate the user's emotions and perform the specific processing using the user's emotions.

[0118] This invention enables highly personalized image generation by incorporating an emotion engine that recognizes emotions when a user inputs conditions and images into an image generation system via a terminal. In addition to inputting normal text and images, the user's emotions are monitored by the emotion engine. For example, by using sensors such as a camera and microphone to analyze the user's facial expressions and tone of voice, emotions are evaluated in real time.

[0119] The device analyzes the user's emotions detected using an emotion engine, integrates the results into the input data, and then sends it to the server. The server analyzes the received data and clarifies the conditions corresponding to the user's intentions and emotions. Next, it inputs these analysis results into an AI model and requests it to generate images with a style and atmosphere that matches the user's emotions.
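Integrating the emotion engine's result into the input data before transmission might look like the following minimal sketch. The `emotion` field, its `label` and `confidence` keys, and the helper `attach_emotion` are hypothetical; the specification does not define a data layout.

```python
import json


def attach_emotion(input_data, emotion_label, confidence):
    """Return a copy of the input data with the emotion estimate attached."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be between 0 and 1")
    merged = dict(input_data)  # copy so the caller's dict is left unchanged
    merged["emotion"] = {"label": emotion_label, "confidence": confidence}
    return merged


original = {"keywords": ["cityscape", "relaxing"]}
payload = attach_emotion(original, "stressed", 0.7)
message = json.dumps(payload)
```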

[0120] The generated images are customized according to the user's emotional state, and the generation process selects elements and colors that are likely to emotionally satisfy the user. After initial validation on the server, the emotion engine is used again to confirm that the generated image is appropriate for the user's emotions, and final adjustments are made if necessary.

[0121] The terminal ultimately displays the image sent from the server to the user, who can then review the image and use it for document creation, presentations, and other purposes. For example, if a user is seeking a relaxing cityscape but is also feeling stressed, the system will suggest a peaceful and calming landscape, providing an image that resonates with the user's emotions.
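The stressed-user example can be sketched as a lookup from an estimated emotion to style modifiers that are appended to the request. The mapping table, its entries, and the function names are hypothetical illustrations; the specification does not say how emotions are translated into styles.

```python
# Hypothetical mapping from an estimated emotion to style modifiers.
STYLE_BY_EMOTION = {
    "stressed": ["peaceful", "calming", "soft colors"],
    "excited": ["vibrant", "dynamic"],
}


def style_for(emotion, default=("neutral tones",)):
    """Look up style modifiers, falling back to a default for unknown emotions."""
    return list(STYLE_BY_EMOTION.get(emotion, default))


def emotion_aware_prompt(request_text, emotion):
    """Append the emotion-specific style modifiers to the user's request."""
    return ", ".join([request_text] + style_for(emotion))


prompt = emotion_aware_prompt("relaxing cityscape", "stressed")
```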

[0122] By incorporating an emotion engine in this way, it is possible to generate images that match the user's emotions, dramatically improving the overall effectiveness of the system and the user experience.

[0123] The following describes the processing flow.

[0124] Step 1:

[0125] The user inputs conditions and images related to the desired image into the device. At this time, the device's built-in emotion engine analyzes the user's facial expressions and voice in real time using the camera and microphone to recognize the user's emotions.

[0126] Step 2:

[0127] The device sends user input data and analysis results from the emotion engine to the server. This ensures that the user's intentions and current emotional state are communicated together.

[0128] Step 3:

[0129] The server analyzes the received data. Text data is processed through natural language processing and combined with image data and emotional states to clarify the user's requests.

[0130] Step 4:

[0131] Based on the analysis results, the server requests image generation from the AI model. Here, image elements corresponding to emotions recognized by the emotion engine are taken into consideration. For example, if the user is seeking relaxation, calm colors and styles will be selected.

[0132] Step 5:

[0133] The AI model generates images based on requests. The model creates images that reflect a specific style or theme and returns the results to the server.

[0134] Step 6:

[0135] The server reviews the generated image and uses the emotion engine to re-verify whether the image is appropriate for the user's emotions. If necessary, it fine-tunes the color scheme and composition to make final adjustments that match the user's emotions.

[0136] Step 7:

[0137] The server sends the final image to the terminal. The sent image is provided to the user in the most suitable format.

[0138] Step 8:

[0139] The device displays the image to the user. The user can review the generated image and, if deemed appropriate, download and use it. Through this process, the user gains access to original images that resonate with their emotions.
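The selection of calm colors and styles in Step 4 can be illustrated with a minimal sketch. It assumes the emotion engine outputs one score per emotion label. The labels, the `STYLE_HINTS` table, and the function names are hypothetical and chosen only for illustration, since the disclosure does not specify how emotions map to styles.

```python
# Hypothetical mapping from a recognized emotion to style hints (not from the disclosure).
STYLE_HINTS = {
    "stressed": {"palette": "soft blues and greens", "mood": "calm"},
    "happy": {"palette": "warm, bright colors", "mood": "lively"},
    "sad": {"palette": "gentle pastel colors", "mood": "comforting"},
}
DEFAULT_HINT = {"palette": "neutral colors", "mood": "balanced"}


def dominant_emotion(scores: dict[str, float]) -> str:
    """Return the label with the highest score, or "neutral" if there are none."""
    if not scores:
        return "neutral"
    return max(scores, key=scores.get)


def select_style(scores: dict[str, float]) -> dict[str, str]:
    """Pick style hints for the strongest emotion, falling back to a default."""
    return STYLE_HINTS.get(dominant_emotion(scores), DEFAULT_HINT)


hints = select_style({"stressed": 0.7, "happy": 0.2})
print(hints["mood"])  # calm
```

A lookup table keeps the mapping easy to audit. A deployed system might instead learn this mapping or pass the raw scores to the model.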

[0140] (Example 2)

[0141] Next, we will describe Example 2. In the following description, the data processing device 12 will be referred to as the "server," and the smart device 14 will be referred to as the "terminal."

[0142] In image generation technology, conventional systems failed to consider user emotions, making it difficult to provide images that truly matched the user's needs and feelings. As a result, users were dissatisfied with the generated images and lacked a personalized experience.

[0143] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 2 is realized by the following means.

[0144] In this invention, the server includes means for analyzing the user's emotional state using sensors, means for integrating the received input data and emotional data and transmitting it to the server, and means for making an image generation request based on the analysis results. This makes it possible to generate personalized images that take the user's emotions into consideration.

[0145] A "user" is an individual or group that uses an image generation system to input conditions and images in order to obtain personalized results.

[0146] "Input data" refers to data containing conditions and information that the user provides to the system, and may be expressed as prompts.

[0147] "Emotional state" refers to the state of mind indicated by the user's facial expressions, tone of voice, etc., and is detected by sensors.

[0148] A "server" is a computer system that analyzes received data and manages the image generation process.

[0149] An "image generation request" is an instruction to generate an image in a specified style and atmosphere based on the user's conditions and emotions.

[0150] An "artificial intelligence model" is a model that uses machine learning techniques to generate images tailored to the user's needs.

[0151] "Integration" is the process of combining user input data and emotional data and processing them as a single dataset.

[0152] This invention is a technology for users to generate personalized images using an image generation system. The user inputs prompt text, including conditions and images, using a terminal. For example, the user can input the prompt text, "I want to see a relaxing landscape."

[0153] The device uses sensors such as cameras and microphones to analyze the user's facial expressions and voice tone in order to understand the user's emotional state. This analysis is performed by an emotion engine, and the user's emotions are evaluated in real time.

[0154] User input data and emotional data are integrated and sent to the server. The server analyzes this data to identify conditions based on the user's intentions and emotions. This enables the generation of optimal images that correspond to the user's emotional state.

[0155] Specifically, the server uses a generative AI model to request image generation based on the analysis results. The AI model receives an instruction such as "generate a landscape painting with a calm and gentle color scheme." This mechanism allows users to obtain images that resonate with their emotions.

[0156] The generated images are evaluated again on the server using an emotion engine. After appropriate adjustments are made, they are provided to the user via the terminal. This allows users to utilize images that they find highly satisfactory in document creation and everyday use.

[0157] The flow of the specific processing in Example 2 will be explained using Figure 13.

[0158] Step 1:

[0159] The user inputs prompt text into the image generation system via the terminal. For example, the user might input a condition such as, "I want to see a relaxing landscape." This input becomes the basic data for processing in the system. The output of this step is the prompt text, ready to be sent to the server.

[0160] Step 2:

[0161] The device analyzes the user's emotional state using sensors such as cameras and microphones. During this process, emotional data is acquired from the user's facial expressions and tone of voice. An emotion engine is then used to evaluate the user's emotional state in real time. The output of this step is numerical data representing the user's emotional state.

[0162] Step 3:

[0163] The terminal integrates the prompt text and the emotional data, and the integrated data is sent to the server. Data integration combines the text input and the emotion values to generate a dataset that comprehensively represents the user's state. The output is the integrated data, ready for transmission.

[0164] Step 4:

[0165] The server analyzes the received integrated data. This analysis identifies specific conditions for image generation based on the user's intent and emotions. Based on the input information, this data analysis outputs the optimal image generation conditions.

[0166] Step 5:

[0167] The server inputs the identified conditions into the generative AI model and requests image generation. The AI model is instructed, for example, to generate a landscape with relaxing color tones. The output of this step is a generated image that conforms to the conditions.

[0168] Step 6:

[0169] The server performs an initial validation of the generated image. It uses the emotion engine again to evaluate whether the generated image matches the user's emotions. If necessary, it adjusts the image's color tone and elements. The output of this step is the image adjusted to the user's emotional state.

[0170] Step 7:

[0171] The terminal provides the user with the final image sent from the server. The user can view this image and use it for document creation and presentations. The output of this step is the final image that the user can visually confirm.
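The integration in Step 3 can be sketched as follows, assuming the emotional state is a set of scores in the range 0 to 1 and that the integrated data is serialized as JSON. Both are assumptions, and `IntegratedInput` and `integrate` are hypothetical names not found in the disclosure.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class IntegratedInput:
    """Prompt text plus numerical emotion scores, sent as one dataset."""
    prompt: str
    emotion_scores: dict[str, float]  # e.g. {"calm": 0.2, "stressed": 0.8}


def integrate(prompt: str, emotion_scores: dict[str, float]) -> str:
    """Validate the scores and serialize the combined record as JSON."""
    for label, score in emotion_scores.items():
        # The [0, 1] range is an assumption made for this sketch.
        if not 0.0 <= score <= 1.0:
            raise ValueError(f"score for {label!r} must be in [0, 1], got {score}")
    record = IntegratedInput(prompt, emotion_scores)
    return json.dumps(asdict(record), ensure_ascii=False)


payload = integrate("I want to see a relaxing landscape.", {"stressed": 0.8, "calm": 0.2})
print(payload)
```

Sending both parts in one record means the server never receives a prompt without the emotional state that accompanied it.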

[0172] (Application Example 2)

[0173] Next, we will explain application example 2. In the following explanation, the data processing device 12 will be referred to as the "server," and the smart device 14 will be referred to as the "terminal."

[0174] In modern online shopping, there is a lack of personalization that takes into account the user's emotional state, and there is a particular need to improve the provision of information and product suggestions that respond to consumers' emotions. Therefore, it is necessary to establish effective methods that can improve the user experience and increase purchasing intent.

[0175] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 2 is realized by the following means.

[0176] In this invention, the server includes means for receiving input from the user and understanding conditions and images, means for analyzing the user's emotions using an emotional state detection engine, and means for making an image generation request adapted to the emotions based on the analysis results. This makes it possible to provide personalized product information and images that are in line with the user's emotions.

[0177] A "user" is a consumer who uses a system to input conditions and images and receives services.

[0178] "Input data" refers to information such as conditions and images provided by the user, which is then processed by the system.

[0179] An "emotion engine" is a system component that analyzes the user's facial expressions and voice to evaluate their emotional state in real time.

[0180] A "server" is a central computer system that receives input data and emotional data and performs analysis.

[0181] "Analysis means" refers to a method by which the server processes input data and emotional data to determine requests based on the user's intentions and emotions.

[0182] An "image generation request" is an instruction to generate an image with a specific style and atmosphere based on the user's input data and analyzed emotions.

[0183] "Adjustment means" refers to the process of checking whether the generated image matches the user's emotions and modifying the image content as needed.

[0184] "Means of delivery" refers to the method by which the final generated customized image is presented to the user.

[0185] To realize this invention, the system analyzes user input data using an emotion engine and acquires the emotional state as data. Users use smartphones or other devices, and by using a camera and microphone as input devices, their emotions are detected in real time from their facial expressions and voice. The emotion engine identifies this emotional state by utilizing image analysis libraries and voice analysis libraries.

[0186] The terminal sends user-inputted conditions, images, and analyzed emotion data to the server. The server receives this data and uses software to perform analysis. On the server, based on the emotion data, a generative AI model is used to generate images with styles and atmospheres that correspond to the user's emotional state. The generated images undergo initial validation on the server and are adjusted as needed.

[0187] As a concrete example, if a user requests "relaxing cityscapes" while online shopping, and the system detects their stress, the server will use a generative AI model to suggest images with calm and soothing colors. An example of a prompt sent to the generative model would be, "Generate a promotional image for a product to display when the user is feeling relaxed. The image should have soft colors and a calming atmosphere." This allows the user to have a positive experience through the image. In this way, the invention can provide a service that adapts to the user's emotions, resulting in a richer user experience.
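The example prompt above can be produced from a template. The following is a minimal sketch in which the template wording is taken from the example, while `PROMPT_TEMPLATE`, `build_prompt`, and the choice of three fields are assumptions made for illustration.

```python
PROMPT_TEMPLATE = (
    "Generate a promotional image for a product to display when the user is "
    "feeling {emotion}. The image should have {colors} colors and a {atmosphere} atmosphere."
)


def build_prompt(emotion: str, colors: str, atmosphere: str) -> str:
    """Fill the template; reject empty fields so the prompt is never malformed."""
    fields = {"emotion": emotion, "colors": colors, "atmosphere": atmosphere}
    for name, value in fields.items():
        if not value.strip():
            raise ValueError(f"{name} must not be empty")
    return PROMPT_TEMPLATE.format(**fields)


print(build_prompt("relaxed", "soft", "calming"))
```

With the arguments shown, the function reproduces the example prompt word for word.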

[0188] The flow of a specific process in Application Example 2 will be explained using Figure 14.

[0189] Step 1:

[0190] The user uses a device and inputs conditions and emotional states using the camera and microphone as input devices. The input data here consists of the user's facial image and voice data, which are sent to the emotion engine. The emotion engine analyzes this data and converts the user's emotional state into numerical emotional data.

[0191] Step 2:

[0192] The terminal sends the input conditions and images, along with the emotional data analyzed by the emotion engine, to the server. During this process, the data is encrypted by a secure communication protocol and reaches the server via the network.

[0193] Step 3:

[0194] The server analyzes the received input data and emotional data to identify the user's intent. This process also uses contextual information from the database to determine which images are appropriate. The analysis results are then used to construct instructions for input into the generative AI model.

[0195] Step 4:

[0196] The server sends a prompt message to the AI model based on the analysis results, requesting image generation. The prompt message reflects the user's emotional state and intentions, and the AI model generates an image in the specified style and atmosphere.

[0197] Step 5:

[0198] The generated images undergo initial validation on the server to confirm that they match the user's emotional state. In this step, the image's color tone and other aspects are adjusted as needed to prepare it as the final image.

[0199] Step 6:

[0200] The server sends the final, adjusted image to the device. The device displays this image to the user, who then uses it to make purchases or other decisions. As a result, the user is provided with a personalized experience.
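Step 1 converts facial and voice data into numerical emotional data. One simple way to combine the two sources is a weighted average of per-emotion scores. This is an assumption made for illustration: the disclosure does not say how the emotion engine merges modalities, and `fuse_emotion_scores` and `face_weight` are hypothetical names.

```python
def fuse_emotion_scores(
    face: dict[str, float], voice: dict[str, float], face_weight: float = 0.5
) -> dict[str, float]:
    """Weighted average of per-emotion scores from two modalities.

    A label missing from one modality is treated as a score of 0.0 there.
    """
    if not 0.0 <= face_weight <= 1.0:
        raise ValueError("face_weight must be between 0 and 1")
    labels = sorted(set(face) | set(voice))
    return {
        label: face_weight * face.get(label, 0.0)
        + (1.0 - face_weight) * voice.get(label, 0.0)
        for label in labels
    }


fused = fuse_emotion_scores({"stressed": 0.8, "calm": 0.2}, {"stressed": 0.6})
print(fused)
```

Adjusting `face_weight` lets one modality dominate, for example when the microphone signal is noisy.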

[0201] The specific processing unit 290 transmits the result of the specific processing to the smart device 14. In the smart device 14, the control unit 46A causes the output device 40 to output the result of the specific processing. The microphone 38B acquires audio indicating user input for the result of the specific processing. The control unit 46A transmits the audio data indicating user input acquired by the microphone 38B to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the audio data.

[0202] Data generation model 58 is a so-called generative AI (Artificial Intelligence). Examples of the data generation model 58 include ChatGPT (registered trademark) (Internet search <URL: https://openai.com/blog/chatgpt>) and Gemini (registered trademark) (Internet search <URL: https://gemini.google.com/?hl=ja>). The data generation model 58 is obtained by performing deep learning on a neural network. Prompts containing instructions are input to the data generation model 58, together with inference data such as audio data representing speech, text data representing text, and image data representing images. The data generation model 58 performs inference on the input inference data according to the instructions indicated by the prompts, and outputs the inference results in data formats such as audio data and text data. Here, inference refers to, for example, analysis, classification, prediction, and/or summarization.

[0203] In the above embodiment, an example was given in which specific processing is performed by the data processing device 12, but the technology of this disclosure is not limited thereto, and the specific processing may also be performed by the smart device 14.

[0204] [Second Embodiment]

[0205] Figure 3 shows an example of the configuration of the data processing system 210 according to the second embodiment.

[0206] As shown in Figure 3, the data processing system 210 includes a data processing device 12 and smart glasses 214. An example of the data processing device 12 is a server.

[0207] The data processing device 12 comprises a computer 22, a database 24, and a communication interface 26. The computer 22 is an example of a "computer" related to the technology of this disclosure. The computer 22 comprises a processor 28, RAM 30, and storage 32. The processor 28, RAM 30, and storage 32 are connected to a bus 34. The database 24 and the communication interface 26 are also connected to the bus 34. The communication interface 26 is connected to a network 54. An example of the network 54 is a WAN (Wide Area Network) and / or a LAN (Local Area Network).

[0208] The smart glasses 214 include a computer 36, a microphone 238, a speaker 240, a camera 42, and a communication interface 44. The computer 36 includes a processor 46, RAM 48, and storage 50. The processor 46, RAM 48, and storage 50 are connected to a bus 52. The microphone 238, speaker 240, and camera 42 are also connected to the bus 52.

[0209] The microphone 238 receives voice signals from the user 20 and receives instructions from the user 20. The microphone 238 captures the voice signals from the user 20, converts the captured voice into audio data, and outputs it to the processor 46. The speaker 240 outputs audio according to the instructions from the processor 46.

[0210] Camera 42 is a small digital camera equipped with an optical system including a lens, aperture, and shutter, and an image sensor such as a CMOS (Complementary Metal-Oxide-Semiconductor) image sensor or a CCD (Charge Coupled Device) image sensor, and captures images of the area around the user 20 (for example, an imaging range defined by a field of view equivalent to the width of a typical healthy person's field of vision).

[0211] Communication interface 44 is connected to network 54. Communication interfaces 44 and 26 are responsible for the exchange of various information between processor 46 and processor 28 via network 54. The exchange of various information between processor 46 and processor 28 using communication interfaces 44 and 26 is performed in a secure manner.

[0212] Figure 4 shows an example of the main functions of the data processing device 12 and the smart glasses 214. As shown in Figure 4, the data processing device 12 performs specific processing using the processor 28. The storage 32 stores the specific processing program 56.

[0213] The specific processing program 56 is an example of a "program" relating to the technology of this disclosure. The processor 28 reads the specific processing program 56 from the storage 32 and executes the read specific processing program 56 on the RAM 30. The specific processing is realized by the processor 28 operating as a specific processing unit 290 in accordance with the specific processing program 56 executed on the RAM 30.

[0214] The storage 32 stores the data generation model 58 and the emotion identification model 59. The data generation model 58 and the emotion identification model 59 are used by the specific processing unit 290.

[0215] In the smart glasses 214, the processor 46 performs the reception output processing. The storage 50 stores the reception output program 60. The processor 46 reads the reception output program 60 from the storage 50 and executes the read reception output program 60 on the RAM 48. The reception output processing is realized by the processor 46 operating as a control unit 46A according to the reception output program 60 executed on the RAM 48.

[0216] Next, the specific processing performed by the specific processing unit 290 of the data processing device 12 will be described. In the following description, the data processing device 12 will be referred to as the "server" and the smart glasses 214 will be referred to as the "terminal".

[0217] This invention begins with a user inputting specific conditions or images from a terminal. The user inputs desired keywords or image data via a keyboard or touchscreen. The terminal sends this input data to a server. The server analyzes the received data: it processes the input keywords using natural language processing techniques and identifies the user's specific requests. If image data is included, the server analyzes the image using computer vision techniques and extracts relevant visual elements.

[0218] The server requests image generation from the AI model based on the analysis results. While various image generation technologies can be used as the AI model in this invention, deep learning techniques such as GANs (Generative Adversarial Networks) are generally utilized. The AI model generates an original image that reflects the user's conditions based on the request. This generated image is temporarily stored on the server and reviewed for quality and suitability to the request. The image is fine-tuned as needed.

[0219] Finally, the server sends the generated image back to the terminal. The terminal displays the received image to the user, allowing the user to review it. For example, if a user requests an image of a futuristic city and inputs related abstract elements, the AI model will generate an imaginary city image using multiple architectural styles and color schemes, and provide it to the user. The user can then download this image and use it for presentations or document creation.

[0220] This invention offers the advantage of allowing users to quickly obtain original, copyright-free images and use them free of charge for creating materials.

[0221] The following describes the processing flow.

[0222] Step 1:

[0223] Users use their devices to input criteria and images related to their desired image. They can also enter keywords in text or upload reference images.

[0224] Step 2:

[0225] The terminal sends the data entered by the user to the server. The data is packaged in common data formats such as JSON or XML and sent to the server over the network.

[0226] Step 3:

[0227] The server analyzes the received data. This analysis includes natural language processing (NLP) of the text, extracting keywords and phrases. If images are uploaded, computer vision technology is used to extract image features.

[0228] Step 4:

[0229] The server requests image generation from the AI model based on the analysis results. The server inputs the analysis results into the AI model and instructs it to generate an image that specifically reflects the user's conditions and requests.

[0230] Step 5:

[0231] The AI model creates images based on the generation request. The model generates images that conform to the specified style and concept, and returns the results to the server. AI models typically use deep learning architectures such as GANs (Generative Adversarial Networks).

[0232] Step 6:

[0233] The server reviews the generated image and evaluates whether it meets the specified requirements. If necessary, it performs processing to adjust the image's color tone and details.

[0234] Step 7:

[0235] The server sends the finalized image data to the terminal. The sent image is converted to a user-friendly format (e.g., JPEG, PNG).

[0236] Step 8:

[0237] The device displays images to the user. The user reviews the generated images, downloads them as needed, and uses them for document creation and presentations.
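The JSON packaging in Step 2 can be sketched as follows. The field names `keywords` and `reference_image_b64` and both function names are hypothetical. Base64 encoding of the reference image is an assumption made here because JSON cannot carry raw bytes.

```python
import base64
import json
from typing import Optional


def package_request(keywords: list[str], reference_image: Optional[bytes] = None) -> str:
    """Build the JSON body the terminal sends to the server.

    Binary image data is base64-encoded because JSON cannot carry raw bytes.
    """
    body: dict[str, object] = {"keywords": keywords}
    if reference_image is not None:
        body["reference_image_b64"] = base64.b64encode(reference_image).decode("ascii")
    return json.dumps(body)


def unpack_request(raw: str) -> tuple[list[str], Optional[bytes]]:
    """Server-side inverse of package_request."""
    body = json.loads(raw)
    encoded = body.get("reference_image_b64")
    image = base64.b64decode(encoded) if encoded is not None else None
    return body["keywords"], image


raw = package_request(["futuristic city"], reference_image=b"\x89PNG")
print(unpack_request(raw))
```

The reference image is optional, matching the flow in which the user may enter keywords only.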

[0238] (Example 1)

[0239] Next, we will describe Example 1. In the following description, the data processing device 12 will be referred to as the "server," and the smart glasses 214 will be referred to as the "terminal."

[0240] In today's information environment, the generation of creative visual data is becoming increasingly important, but users face the challenge of not being able to easily generate high-quality and original visual data. Furthermore, there is a need for a system that can accurately understand and quickly reflect the user's conditions and intentions regarding visual data generation.

[0241] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 1 is realized by the following means.

[0242] In this invention, the server includes means for analyzing input data and identifying the user's intent, means for inputting an image generation request as a prompt message to a generative AI model based on the analysis results, and means for checking the generated visual data and adjusting it as necessary. This makes it possible to quickly generate and provide original visual data that matches the user's intent.

[0243] An "information processing device" is a device that receives user input, processes it, and transmits it to another device or system.

[0244] A "generative AI model" is a mathematical model that uses artificial intelligence technology to generate visual data based on specified conditions.

[0245] A "prompt message" is text data that indicates instructions or conditions to be input into a generative AI model.

[0246] "Visual data" refers to digital data that includes images and graphical information, and is generated based on specific conditions or requirements.

[0247] This invention begins with a user inputting specific conditions or images using an information processing device. The user inputs desired keywords or image data into the information processing device via a keyboard or touchscreen. The information processing device acquires this input data and transmits it to a server via a network.

[0248] The server analyzes the received data. It uses natural language processing libraries (e.g., NLTK or spaCy) to analyze the keywords and identify the user's intent. If image data is included, computer vision technologies (e.g., OpenCV or TensorFlow) are used to analyze the image data and extract relevant visual elements.
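The keyword analysis can be illustrated with a standard-library stand-in for what an NLP library would do. This sketch does not use NLTK or spaCy. The stop-word list and the function `extract_keywords` are hypothetical, and frequency ranking is an assumed heuristic.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list; a real system would use a library-provided one.
STOP_WORDS = {"a", "an", "and", "i", "in", "of", "the", "to", "want", "with"}


def extract_keywords(text: str, top_n: int = 5) -> list[str]:
    """Lowercase, tokenize on letters, drop stop words, rank by frequency."""
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter(t for t in tokens if t not in STOP_WORDS)
    return [word for word, _ in counts.most_common(top_n)]


print(extract_keywords("I want a futuristic city with modern technology"))
# ['futuristic', 'city', 'modern', 'technology']
```

Words with equal counts keep the order in which they first appear, so the output order follows the user's wording.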

[0249] Based on the analyzed information, the server requests image generation from the generative AI model. This request is provided to the generative AI model as a prompt based on the analysis results. The generative AI model is typically built using GAN (Generative Adversarial Network) technology running on PyTorch or TensorFlow. For example, prompts such as "futuristic city," "modern," and "technology" can be used.

[0250] The generated visual data is temporarily stored on a server and its quality and compliance with requirements are verified by a dedicated review algorithm. The generated visual data may be readjusted as needed.
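The review described above can be sketched as a loop that scores each candidate and tries again when the score is too low. The disclosure does not describe the review algorithm, so the score threshold, the retry limit, and regenerating rather than editing the image are all assumptions, and both callables are stand-ins.

```python
from typing import Callable, Optional


def generate_with_review(
    generate: Callable[[str], bytes],
    review: Callable[[bytes], float],
    prompt: str,
    threshold: float = 0.8,
    max_attempts: int = 3,
) -> Optional[bytes]:
    """Return the first candidate whose review score reaches the threshold.

    Returns None if no candidate passes within max_attempts.
    """
    for _ in range(max_attempts):
        candidate = generate(prompt)
        if review(candidate) >= threshold:
            return candidate
    return None


# Stub callables standing in for the generative model and the review algorithm.
attempts = iter([b"blurry", b"sharp"])
result = generate_with_review(
    generate=lambda p: next(attempts),
    review=lambda img: 0.9 if img == b"sharp" else 0.1,
    prompt="futuristic city",
)
print(result)  # b'sharp'
```

Passing the model and the reviewer in as callables keeps the loop testable without a real model.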

[0251] Finally, the server sends the verified visual data to the information processing device. Users can then view the generated results on the information processing device's screen and download or use them for other purposes. This system offers users the advantage of quickly and efficiently obtaining original visual data and utilizing it for various purposes.

[0252] The flow of the specific processing in Example 1 will be explained using Figure 11.

[0253] Step 1:

[0254] The user inputs desired keywords or images using the keyboard or touchscreen on the device. The device collects the input data, which contains the information needed to build the prompt text for the generative AI model. The device then prepares this data for transmission to the server.

[0255] Step 2:

[0256] The device sends input data collected from the user to the server. The data sent includes keywords in text format and, if necessary, image data. The device generates an HTTP request and sends the data to the server API via this request.

[0257] Step 3:

[0258] The server receives the data sent from the terminal. The server uses a natural language processing library (e.g., NLTK or spaCy) to analyze the input keywords and identify the user's intent. Specifically, it tokenizes the keywords and extracts key concepts. The output of this step is the analyzed text data.

[0259] Step 4:

[0260] When the server receives image data, it analyzes it using computer vision technologies (e.g., OpenCV or TensorFlow). The server extracts image features and identifies relevant visual elements. The output is data related to the extracted visual elements.

[0261] Step 5:

[0262] Based on the analysis results, the server constructs prompt text appropriate to the user's intent and inputs it into the generative AI model. The generative AI model generates images based on this input.

[0263] Step 6:

[0264] The generative AI model generates visual data based on the prompt text. The technology typically used is a GAN (Generative Adversarial Network), and the generated images are original. The output at this stage is a file containing the generated visual data.

[0265] Step 7:

[0266] The server reviews the generated visual data to ensure it meets the criteria. Using a quality check algorithm, it evaluates whether the generated images meet the requirements and makes adjustments if necessary. The output is the adjusted visual data.

[0267] Step 8:

[0268] The server sends the final visual data back to the terminal, which displays it in a format that the user can view. The user can then review the final generated visual data and download or use it.
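The HTTP request in Step 2 can be sketched with the Python standard library. The endpoint URL and the JSON body shape are hypothetical, since the disclosure does not name the server API. The request is built but not sent.

```python
import json
import urllib.request

# Hypothetical endpoint; the disclosure does not name the server API.
API_URL = "https://example.com/api/generate"


def build_generation_request(keywords: list[str], url: str = API_URL) -> urllib.request.Request:
    """Create (but do not send) a JSON POST request carrying the keywords."""
    body = json.dumps({"keywords": keywords}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_generation_request(["futuristic city", "modern", "technology"])
print(request.get_method(), request.full_url)
# Sending would be: urllib.request.urlopen(request, timeout=10)
```

Separating construction from sending makes the request easy to inspect before any network traffic occurs.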

[0269] (Application Example 1)

[0270] Next, we will explain Application Example 1. In the following explanation, the data processing device 12 will be referred to as the "server," and the smart glasses 214 will be referred to as the "terminal."

[0271] In advertising, there is a need to quickly create original and compelling visual content tailored to the target market, but traditional methods are time-consuming and costly. Furthermore, it is difficult to concretize diverse images and concepts in a short time, necessitating solutions to more efficiently enhance advertising effectiveness.

[0272] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 1 is realized by the following means.

[0273] In this invention, the server includes a device that receives instructions from the user and grasps the conditions and concepts, a device that transmits the received instruction data to an information processing device, and a device that analyzes the instruction data in the information processing device and identifies the user's purpose. This makes it possible to efficiently generate visuals that can be easily used for advertising campaigns and dissemination activities.

[0274] A "device that receives instructions from a user" is a device that receives conditions and conceptual information input by a user and provides an interface for initiating processing.

[0275] An "information processing device" is a device that analyzes data and performs calculations to identify the user's purpose and intentions.

[0276] A "visual generation request device" is a device that, based on analyzed data, instructs the generation of visual content that conforms to specified conditions.

[0277] A "device for verifying generated visuals" is a device that evaluates the quality and compliance with requirements of the generated visual content and makes corrections as necessary.

[0278] A "device that provides the final visuals" is a device that provides the modified visual content to the user and makes it available for use.

[0279] "Visuals that can be easily used in advertising campaigns and promotional activities" refers to images and graphics designed to effectively reach a specific target market.

[0280] The system implementing this invention consists of a user terminal, an information processing server, and a generative AI model. The user uses their own device, such as a smartphone or computer, to input the conditions and concepts of the visual content necessary for advertising campaigns and promotional activities. The input information is transmitted from the terminal to the information processing server.

[0281] The server receives this information and analyzes it using natural language processing technology and image analysis technology according to the specified conditions and concepts. The analysis process is mainly carried out using Python and TensorFlow. The analysis produces the data needed to realize the visual desired by the user. The server then uses this data with a generative AI model, in particular a generative adversarial network (GAN), to generate an original visual that meets the conditions.

[0282] The generated visual is confirmed within the server to judge its quality and compliance with customer requirements. Modifications are made as necessary at this stage. The final visual is sent back from the information processing server to the user terminal, and the user can check this and utilize it for an advertising campaign or the like.

[0283] As a specific example, consider the case where a user inputs a prompt sentence such as "I want an event poster that gives a futuristic and innovative impression". Based on this prompt, the generative AI model creates a visual combining vivid colors and a novel design. This visual can be immediately utilized as advertising material.

[0284] The flow of the specific processing in Application Example 1 will be explained using Figure 12.

[0285] Step 1:

[0286] The user inputs the conditions and concepts necessary for the advertising visual as a prompt sentence on the terminal. An example of the user's input is the prompt sentence "I want an event poster that gives a futuristic and innovative impression". The input data is temporarily stored in the terminal.

[0287] Step 2:

[0288] The terminal sends the prompt sentence to the information processing server as the input for analysis. The server receives it and starts data analysis.

[0289] Step 3:

[0290] The server uses natural language processing techniques to analyze the prompt text. The input is the user's prompt text, and the output is the visual conditions and objectives requested by the user. The server utilizes TensorFlow to extract specific images and concepts from keywords and phrases.

[0291] Step 4:

[0292] The server requests image generation from the AI model based on the extracted conditions. The input is the analyzed data, and the output is the generation request. Using a GAN, the generation of a new visual that satisfies the specified conditions begins.

[0293] Step 5:

[0294] The generative AI model generates visuals according to the given conditions. The input is an image generation request, and the original visual is returned to the server as output. The generated visual is temporarily stored on the server.

[0295] Step 6:

[0296] The server reviews the generated visuals and checks their quality and compliance with requirements. Data processing and adjustments are made as needed. The input is the generated visual, and the output is the final visual.

[0297] Step 7:

[0298] Finally, the server sends the completed visual to the terminal. The terminal receives the visual as output and displays it to the user. The user can then review it and use it for advertising production, etc.
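The seven steps above can be sketched as a small pipeline. This is only a minimal illustration under stated assumptions, not the disclosed implementation: the function names, the `STYLE_HINTS` table, and the stub generator are hypothetical, and a plain keyword lookup stands in for the TensorFlow-based analysis and the GAN mentioned in the text.

```python
from dataclasses import dataclass, field

# Hypothetical keyword-to-style table; the disclosure does not specify one.
STYLE_HINTS = {
    "futuristic": "vivid colors",
    "innovative": "novel design",
}


@dataclass
class GenerationRequest:
    purpose: str
    styles: list = field(default_factory=list)


def analyze_prompt(prompt: str) -> GenerationRequest:
    """Steps 3-4: extract visual conditions from the prompt text."""
    words = prompt.lower().replace(",", " ").split()
    styles = [STYLE_HINTS[w] for w in words if w in STYLE_HINTS]
    purpose = "event poster" if "poster" in words else "visual"
    return GenerationRequest(purpose=purpose, styles=styles)


def generate_visual(request: GenerationRequest) -> dict:
    """Step 5: stand-in for the generative model; returns a description only."""
    return {"purpose": request.purpose, "styles": list(request.styles), "approved": False}


def review_visual(visual: dict, request: GenerationRequest) -> dict:
    """Step 6: approve the visual only if every requested style is present."""
    visual["approved"] = all(s in visual["styles"] for s in request.styles)
    return visual


def handle_prompt(prompt: str) -> dict:
    """Server side of steps 2-7: from received prompt to final visual."""
    request = analyze_prompt(prompt)
    return review_visual(generate_visual(request), request)


final = handle_prompt("I want an event poster that gives a futuristic and innovative impression")
print(final)
```

Keeping analysis, generation, and review as separate functions mirrors the input/output description given for each step, so any one stage could be replaced by a real model without touching the others.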

[0299] Furthermore, an emotion engine that estimates the user's emotions may be incorporated. That is, the specific processing unit 290 may use the emotion identification model 59 to estimate the user's emotions and perform the specific processing using the estimated emotions.

[0300] This invention enables highly personalized image generation by incorporating an emotion engine that recognizes emotions when a user inputs conditions and images into an image generation system via a terminal. In addition to inputting normal text and images, the user's emotions are monitored by the emotion engine. For example, by using sensors such as a camera and microphone to analyze the user's facial expressions and tone of voice, emotions are evaluated in real time.

[0301] The device analyzes the user's emotions detected using an emotion engine, integrates the results into the input data, and then sends it to the server. The server analyzes the received data and clarifies the conditions corresponding to the user's intentions and emotions. Next, it inputs these analysis results into an AI model and requests it to generate images with a style and atmosphere that matches the user's emotions.

[0302] The generated images are customized according to the user's emotional state, and the generation process selects elements and colors that are likely to emotionally satisfy the user. After initial validation on the server, the emotion engine is used again to confirm that the generated image is appropriate for the user's emotions, and final adjustments are made if necessary.

[0303] The terminal ultimately displays the image sent from the server to the user, who can then review the image and use it for document creation, presentations, and other purposes. For example, if a user is seeking a relaxing cityscape but is also feeling stressed, the system will suggest a peaceful and calming landscape, providing an image that resonates with the user's emotions.

[0304] By incorporating the emotion engine in this way, it is possible to generate images that match the user's emotions, thereby significantly improving the effectiveness of the entire system and the user experience.

[0305] The following describes the processing flow.

[0306] Step 1:

[0307] The user inputs conditions and images related to the desired image to the terminal. At this time, the emotion engine installed on the terminal analyzes the user's facial expressions and voice in real time using the camera and microphone, and recognizes the user's emotions.

[0308] Step 2:

[0309] The terminal sends the user's input data and the analysis results by the emotion engine to the server. As a result, the user's intention and the current emotional state are transmitted together.

[0310] Step 3:

[0311] The server analyzes the received data. The text data is processed through natural language processing and combined with the image data and emotional state to clarify the user's request.

[0312] Step 4:

[0313] Based on the analyzed results, the server makes an image generation request to the AI model. Here, in particular, the image elements corresponding to the emotions recognized by the emotion engine are taken into consideration. For example, if the user's emotion is to seek relaxation, a gentle color tone and style will be selected.

[0314] Step 5:

[0315] The AI model generates an image based on the request. The model creates an image that reflects a specific style and theme and returns the result to the server.

[0316] Step 6:

[0317] The server reviews the generated image and uses the emotion engine to re-verify whether the image is appropriate for the user's emotions. If necessary, it fine-tunes the color scheme and composition to make final adjustments that match the user's emotions.

[0318] Step 7:

[0319] The server sends the final image to the terminal. The sent image is provided to the user in the most suitable format.

[0320] Step 8:

[0321] The device displays the image to the user. The user can review the generated image and, if deemed appropriate, download and use it. Through this process, the user gains access to original images that resonate with their emotions.
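The way the estimated emotion could shape the generation request (Step 4) and the re-verification (Step 6) in the flow above might look like the following sketch. The emotion labels, the `EMOTION_STYLE` table, and the metadata-based check are all assumptions made for illustration; the disclosure does not state how the emotion engine represents its results.

```python
# Hypothetical mapping from an estimated emotion to style hints.
EMOTION_STYLE = {
    "stressed": {"palette": "soft, muted", "mood": "peaceful and calming"},
    "excited": {"palette": "vivid, saturated", "mood": "energetic"},
}
DEFAULT_STYLE = {"palette": "neutral", "mood": "balanced"}


def build_request(user_text: str, emotion: str) -> dict:
    """Step 4: merge the user's request with emotion-dependent style hints."""
    style = EMOTION_STYLE.get(emotion, DEFAULT_STYLE)
    return {"subject": user_text, "emotion": emotion, **style}


def matches_emotion(image_meta: dict, emotion: str) -> bool:
    """Step 6: re-verify that the generated image carries the expected mood."""
    expected = EMOTION_STYLE.get(emotion, DEFAULT_STYLE)
    return image_meta.get("mood") == expected["mood"]


request = build_request("a relaxing cityscape", "stressed")
print(request)
```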

[0322] (Example 2)

[0323] Next, we will describe Example 2. In the following description, the data processing device 12 will be referred to as the "server" and the smart glasses 214 will be referred to as the "terminal".

[0324] In image generation technology, conventional systems failed to consider user emotions, making it difficult to provide images that truly matched the user's needs and feelings. As a result, users were dissatisfied with the generated images and lacked a personalized experience.

[0325] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 2 is realized by the following means.

[0326] In this invention, the system includes means for analyzing the user's emotional state using sensors, means for integrating the received input data with the emotional data and transmitting the result to the server, and means for making an image generation request based on the analysis results. This makes it possible to generate personalized images that take the user's emotions into consideration.

[0327] A "user" is an individual or group that uses an image generation system to input conditions and images in order to obtain personalized results.

[0328] "Input data" refers to data containing conditions and information that the user provides to the system, and may be expressed as prompts.

[0329] "Emotional state" refers to the state of mind indicated by the user's facial expressions, tone of voice, etc., and is detected by sensors.

[0330] A "server" is a computer system that analyzes received data and manages the image generation process.

[0331] An "image generation request" is an instruction to generate an image in a specified style and atmosphere based on the user's conditions and emotions.

[0332] An "artificial intelligence model" is a model that uses machine learning techniques to generate images tailored to the user's needs.

[0333] "Integration" is the process of combining user input data and emotional data and processing them as a single dataset.
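As an illustration of "integration" as defined above, the sketch below combines a prompt and emotional data into one serialized dataset. The field names, the label/score representation, and the choice of JSON are assumptions, not details taken from the disclosure.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class EmotionData:
    label: str    # e.g. "stressed" (hypothetical label set)
    score: float  # confidence in [0, 1]; the scale is an assumption


@dataclass
class IntegratedInput:
    prompt: str
    emotion: EmotionData


def integrate(prompt: str, emotion: EmotionData) -> str:
    """Combine the prompt and the emotional data into one JSON payload."""
    return json.dumps(asdict(IntegratedInput(prompt, emotion)), sort_keys=True)


payload = integrate("I want to see a relaxing landscape.", EmotionData("stressed", 0.8))
print(payload)
```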

[0334] This invention is a technology for users to generate personalized images using an image generation system. The user inputs prompt text, including conditions and images, using a terminal. For example, the user can input the prompt text, "I want to see a relaxing landscape."

[0335] The device uses sensors such as cameras and microphones to analyze the user's facial expressions and voice tone in order to understand the user's emotional state. This analysis is performed by an emotion engine, and the user's emotions are evaluated in real time.

[0336] User input data and emotional data are integrated and sent to the server. The server analyzes this data to identify conditions based on the user's intentions and emotions. This enables the generation of optimal images that correspond to the user's emotional state.

[0337] Specifically, the server uses a generative AI model to request image generation based on the analysis results. The AI model receives instructions such as generating a landscape painting with a calm and gentle color scheme. This mechanism allows users to obtain images that resonate with their emotions.

[0338] The generated images are evaluated again on the server using an emotion engine. After appropriate adjustments are made, they are provided to the user via the terminal. This allows users to utilize images that they find highly satisfactory in document creation and everyday use.

[0339] The flow of the specific processing in Example 2 will be explained using Figure 13.

[0340] Step 1:

[0341] The user inputs prompt messages into the image generation system via a terminal. For example, they might input a condition such as, "I want to see a relaxing landscape." This input becomes the basic data for processing in the system. As output, the prompt messages are ready to be sent to the server.

[0342] Step 2:

[0343] The device analyzes the user's emotional state using sensors such as cameras and microphones. During this process, emotional data is acquired from the user's facial expressions and tone of voice. An emotion engine is then used to evaluate the user's emotional state in real time. The output of this step is numerical data representing the user's emotional state.

[0344] Step 3:

[0345] The terminal integrates the prompt text and the emotional data, combining the text input with the numerical emotion values into a dataset that comprehensively represents the user's state. The output is the integrated data, which is then sent to the server.

[0346] Step 4:

[0347] The server analyzes the received integrated data. This analysis identifies specific conditions for image generation based on the user's intent and emotions. Based on the input information, this data analysis outputs the optimal image generation conditions.

[0348] Step 5:

[0349] The server inputs the identified conditions into the generative AI model and requests image generation. The AI model is instructed, for example, to generate a landscape with relaxed color tones. The output of this step is a generated image that conforms to the conditions.

[0350] Step 6:

[0351] The server performs an initial validation of the generated image. It uses the emotion engine again to evaluate whether the generated image matches the user's emotions. If necessary, it adjusts the image's color tone and elements. The output of this step is the image adjusted to the user's emotional state.

[0352] Step 7:

[0353] The terminal provides the user with the final image sent from the server. The user can view this image and use it for document creation and presentations. The output of this step is the final image that the user can visually confirm.
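The color-tone adjustment in Step 6 above could, for example, reduce saturation when the user is stressed. The sketch below is one possible reading under stated assumptions: the saturation measure, the threshold, and the blend toward gray are hypothetical choices, and images are represented as plain lists of (R, G, B) tuples rather than real image files.

```python
def soften(pixel, strength=0.5):
    """Blend an (R, G, B) pixel toward its own gray level to lower saturation."""
    r, g, b = pixel
    gray = (r + g + b) / 3
    return tuple(round(c + (gray - c) * strength) for c in (r, g, b))


def saturation(pixel):
    """Rough saturation measure: spread between the largest and smallest channel."""
    return max(pixel) - min(pixel)


def adjust_for_emotion(pixels, emotion, max_saturation=120):
    """Soften only when the user is stressed and a pixel is too saturated."""
    if emotion != "stressed":
        return list(pixels)
    return [soften(p) if saturation(p) > max_saturation else p for p in pixels]


image = [(255, 0, 0), (120, 130, 125)]
adjusted = adjust_for_emotion(image, "stressed")
print(adjusted)
```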

[0354] (Application Example 2)

[0355] Next, we will explain application example 2. In the following explanation, the data processing device 12 will be referred to as the "server," and the smart glasses 214 will be referred to as the "terminal."

[0356] In modern online shopping, there is a lack of personalization that takes into account the user's emotional state, and there is a particular need to improve the provision of information and product suggestions that respond to consumers' emotions. Therefore, it is necessary to establish effective methods that can improve the user experience and increase purchasing intent.

[0357] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 2 is realized by the following means.

[0358] In this invention, the server includes means for receiving input from the user and understanding conditions and images, means for analyzing the user's emotions using an emotional state detection engine, and means for making an image generation request adapted to the emotions based on the analysis results. This makes it possible to provide personalized product information and images that are in line with the user's emotions.

[0359] A "user" is a consumer who uses a system to input conditions and images and receives services.

[0360] "Input data" refers to information such as conditions and images provided by the user, which is then processed by the system.

[0361] An "emotion engine" is a system component that analyzes the user's facial expressions and voice to evaluate their emotional state in real time.

[0362] A "server" is a central computer system that receives input data and emotional data and performs analysis.

[0363] "Analysis means" refers to a method by which a server processes input data and emotional data to determine requests based on the user's intentions and emotions.

[0364] An "image generation request" is an instruction to generate an image with a specific style and atmosphere based on the user's input data and analyzed emotions.

[0365] "Adjustment means" refers to the process of checking whether the generated image matches the user's emotions and modifying the image content as needed.

[0366] "Means of delivery" refers to the method by which the final generated customized image is presented to the user.

[0367] To realize this invention, the system analyzes user input data using an emotion engine and acquires the emotional state as data. Users use smartphones or other devices, and by using a camera and microphone as input devices, their emotions are detected in real time from their facial expressions and voice. The emotion engine identifies this emotional state by utilizing image analysis libraries and voice analysis libraries.

[0368] The terminal sends user-inputted conditions, images, and analyzed emotion data to the server. The server receives this data and uses software to perform analysis. On the server, based on the emotion data, a generative AI model is used to generate images with styles and atmospheres that correspond to the user's emotional state. The generated images undergo initial validation on the server and are adjusted as needed.

[0369] As a concrete example, if a user requests "relaxing cityscapes" while online shopping, and the system detects their stress, the server will use a generative AI model to suggest images with calm and soothing colors. An example of a prompt sent to the generative model would be, "Generate a promotional image for a product to display when the user is feeling relaxed. The image should have soft colors and a calming atmosphere." This allows the user to have a positive experience through the image. In this way, the invention can provide a service that adapts to the user's emotions, resulting in a richer user experience.
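A prompt of the kind quoted above could be assembled from the user's request and the detected emotion roughly as follows. The `ATMOSPHERE` table and the prompt template are hypothetical; they only illustrate how emotion-dependent wording might be filled in.

```python
# Hypothetical emotion-to-atmosphere table for prompt construction.
ATMOSPHERE = {
    "stressed": ("soft colors", "a calming atmosphere"),
    "happy": ("bright colors", "a lively atmosphere"),
}


def build_promo_prompt(request_text: str, emotion: str) -> str:
    """Fill a fixed template with the request and emotion-dependent wording."""
    colors, mood = ATMOSPHERE.get(emotion, ("balanced colors", "a neutral atmosphere"))
    return (
        f"Generate a promotional image of {request_text}. "
        f"The image should have {colors} and {mood}."
    )


prompt = build_promo_prompt("relaxing cityscapes", "stressed")
print(prompt)
```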

[0370] The flow of a specific process in Application Example 2 will be explained using Figure 14.

[0371] Step 1:

[0372] The user uses a device and inputs conditions and emotional states using the camera and microphone as input devices. The input data here consists of the user's facial image and voice data, which are sent to the emotion engine. The emotion engine analyzes this data and converts the user's emotional state into numerical emotional data.

[0373] Step 2:

[0374] The terminal sends the input conditions and images, along with the emotional data analyzed by the emotion engine, to the server. The data is encrypted in transit by a secure communication protocol and reaches the server via the network.

[0375] Step 3:

[0376] The server analyzes the received input data and emotional data to identify the user's intent. This process also uses contextual information from the database to determine which images are appropriate. The analysis results are then used to construct the instructions that are input into the generative AI model.

[0377] Step 4:

[0378] The server sends a prompt message to the AI model based on the analysis results, requesting image generation. The prompt message reflects the user's emotional state and intentions, and the AI model generates an image in the specified style and atmosphere.

[0379] Step 5:

[0380] The generated images undergo initial validation on the server to confirm that they match the user's emotional state. In this step, the image's color tone and other aspects are adjusted as needed to prepare it as the final image.

[0381] Step 6:

[0382] The server sends the final, adjusted image to the device. The device displays this image to the user, who then uses it to make purchases or other decisions. As a result, the user is provided with a personalized experience.
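Step 1 of the flow above says the emotion engine converts facial and voice data into numerical emotional data. One simple way to combine two such analyses is a weighted average of per-emotion scores, sketched below. The labels, the score range, and the 0.6 weighting are assumptions; the disclosure does not specify how the two sources are combined.

```python
def fuse_emotion_scores(face: dict, voice: dict, face_weight: float = 0.6) -> dict:
    """Weighted average of per-emotion scores from face and voice analysis."""
    labels = set(face) | set(voice)
    return {
        label: face_weight * face.get(label, 0.0)
        + (1 - face_weight) * voice.get(label, 0.0)
        for label in labels
    }


def dominant_emotion(scores: dict) -> str:
    """Return the label with the highest fused score."""
    return max(scores, key=scores.get)


fused = fuse_emotion_scores({"stress": 0.7, "joy": 0.1}, {"stress": 0.5, "joy": 0.3})
print(dominant_emotion(fused))
```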

[0383] The specific processing unit 290 transmits the result of the specific processing to the smart glasses 214. In the smart glasses 214, the control unit 46A causes the speaker 240 to output the result of the specific processing. The microphone 238 acquires audio indicating user input for the result of the specific processing. The control unit 46A transmits the audio data indicating user input acquired by the microphone 238 to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the audio data.

[0384] The data generation model 58 is a so-called generative AI (Artificial Intelligence). Examples of the data generation model 58 include generative AI such as ChatGPT (Internet search <URL: https://openai.com/blog/chatgpt>) and Gemini (Internet search <URL: https://gemini.google.com/?hl=ja>). The data generation model 58 is obtained by performing deep learning on a neural network. The data generation model 58 receives a prompt containing instructions, together with inference data such as audio data representing speech, text data representing text, and image data representing images. It performs inference on the input inference data according to the instructions in the prompt and outputs the inference results in data formats such as audio data and text data. Here, inference refers to, for example, analysis, classification, prediction, and/or summarization.

[0385] In the above embodiment, an example was given in which specific processing is performed by the data processing device 12, but the technology of this disclosure is not limited thereto, and the specific processing may also be performed by the smart glasses 214.

[0386] [Third Embodiment]

[0387] Figure 5 shows an example of the configuration of the data processing system 310 according to the third embodiment.

[0388] As shown in Figure 5, the data processing system 310 includes a data processing device 12 and a headset terminal 314. An example of the data processing device 12 is a server.

[0389] The data processing device 12 comprises a computer 22, a database 24, and a communication interface 26. The computer 22 is an example of a "computer" related to the technology of this disclosure. The computer 22 comprises a processor 28, RAM 30, and storage 32. The processor 28, RAM 30, and storage 32 are connected to a bus 34. The database 24 and the communication interface 26 are also connected to the bus 34. The communication interface 26 is connected to a network 54. An example of the network 54 is a WAN (Wide Area Network) and / or a LAN (Local Area Network).

[0390] The headset terminal 314 includes a computer 36, a microphone 238, a speaker 240, a camera 42, a communication interface 44, and a display 343. The computer 36 includes a processor 46, RAM 48, and storage 50. The processor 46, RAM 48, and storage 50 are connected to a bus 52. The microphone 238, speaker 240, camera 42, and display 343 are also connected to the bus 52.

[0391] The microphone 238 receives instructions from the user 20 by picking up the user's voice. It converts the captured voice into audio data and outputs the audio data to the processor 46. The speaker 240 outputs audio according to instructions from the processor 46.

[0392] Camera 42 is a small digital camera equipped with an optical system including a lens, aperture, and shutter, and an image sensor such as a CMOS (Complementary Metal-Oxide-Semiconductor) image sensor or a CCD (Charge Coupled Device) image sensor, and captures images of the area around the user 20 (for example, an imaging range defined by a field of view equivalent to the width of a typical healthy person's field of vision).

[0393] Communication interface 44 is connected to network 54. Communication interfaces 44 and 26 are responsible for the exchange of various information between processor 46 and processor 28 via network 54. The exchange of various information between processor 46 and processor 28 using communication interfaces 44 and 26 is performed in a secure manner.

[0394] Figure 6 shows an example of the main functions of the data processing device 12 and the headset terminal 314. As shown in Figure 6, the data processing device 12 performs specific processing using the processor 28. The storage 32 stores the specific processing program 56.

[0395] The specific processing program 56 is an example of a "program" relating to the technology of this disclosure. The processor 28 reads the specific processing program 56 from the storage 32 and executes the read specific processing program 56 on the RAM 30. The specific processing is realized by the processor 28 operating as a specific processing unit 290 in accordance with the specific processing program 56 executed on the RAM 30.

[0396] The storage 32 stores the data generation model 58 and the emotion identification model 59. The data generation model 58 and the emotion identification model 59 are used by the specific processing unit 290.

[0397] In the headset terminal 314, the processor 46 performs the reception output processing. The storage 50 stores the reception output program 60. The processor 46 reads the reception output program 60 from the storage 50 and executes the read reception output program 60 on the RAM 48. The reception output processing is realized by the processor 46 operating as a control unit 46A according to the reception output program 60 executed on the RAM 48.

[0398] Next, the specific processing performed by the specific processing unit 290 of the data processing device 12 will be described. In the following description, the data processing device 12 will be referred to as the "server" and the headset terminal 314 will be referred to as the "terminal".

[0399] This invention begins with a user inputting specific conditions or images from a terminal. The user inputs desired keywords or image data via a keyboard or touchscreen. The terminal has the function of sending this input data to a server. The server analyzes the received data, analyzes the input keywords using natural language processing techniques, and identifies the user's specific requests. If image data is included, it analyzes the image using computer vision techniques and extracts relevant visual elements.

[0400] The server requests image generation from the AI model based on the analysis results. While various image generation technologies can be used as the AI model in this invention, GANs (Generative Adversarial Networks) and deep learning technologies are generally utilized. The AI model generates an original image that reflects the user's conditions based on the request. This generated image is temporarily stored on the server and reviewed for quality and suitability to the request. The image is fine-tuned as needed.

[0401] Finally, the server sends the generated image back to the terminal. The terminal displays the received image to the user, allowing the user to review it. For example, if a user requests an image of a futuristic city and inputs related abstract elements, the AI model will generate an imaginary city image using multiple architectural styles and color schemes, and provide it to the user. The user can then download this image and use it for presentations or document creation.

[0402] This invention offers the advantage of allowing users to quickly obtain original, copyright-free images and use them free of charge for creating materials.

[0403] The following describes the processing flow.

[0404] Step 1:

[0405] Users use their devices to input criteria and images related to their desired image. They can also enter keywords in text or upload reference images.

[0406] Step 2:

[0407] The terminal sends the data entered by the user to the server. The data is packaged in common data formats such as JSON or XML and sent to the server over the network.

[0408] Step 3:

[0409] The server analyzes the received data. This analysis includes natural language processing (NLP) of the text, extracting keywords and phrases. If images are uploaded, computer vision technology is used to extract image features.

[0410] Step 4:

[0411] The server requests image generation from the AI model based on the analysis results. The server inputs the analysis results into the AI model and instructs it to generate an image that specifically reflects the user's conditions and requests.

[0412] Step 5:

[0413] The AI model creates an image based on the generation request. The model generates an image that conforms to the specified style and concept and returns the result to the server. AI models of this kind typically use a GAN (Generative Adversarial Network) or another deep learning architecture.

[0414] Step 6:

[0415] The server reviews the generated image and evaluates whether it meets the specified requirements. If necessary, it performs processing to adjust the image's color tone and details.

[0416] Step 7:

[0417] The server sends the finalized image data to the terminal. The sent image is converted to a user-friendly format (e.g., JPEG, PNG).

[0418] Step 8:

[0419] The device displays images to the user. The user reviews the generated images, downloads them as needed, and uses them for document creation and presentations.
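Step 2 of the flow above mentions packaging the input in a common format such as JSON. A minimal sketch of that packaging, and of the matching unpacking on the server, is shown below. The field names `keywords` and `reference_image` are hypothetical, and base64 is used here only because JSON cannot carry raw binary image data directly.

```python
import base64
import json


def package_input(keywords, image_bytes=None) -> str:
    """Serialize user input as JSON; binary image data is base64-encoded."""
    payload = {"keywords": list(keywords)}
    if image_bytes is not None:
        payload["reference_image"] = base64.b64encode(image_bytes).decode("ascii")
    return json.dumps(payload)


def unpack_input(body: str):
    """Server side: recover the keywords and the raw image bytes, if any."""
    payload = json.loads(body)
    image = payload.get("reference_image")
    return payload["keywords"], base64.b64decode(image) if image else None


body = package_input(["futuristic city"], b"\x89PNG")
print(unpack_input(body))
```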

[0420] (Example 1)

[0421] Next, we will describe Example 1. In the following description, the data processing device 12 will be referred to as the "server," and the headset-type terminal 314 will be referred to as the "terminal."

[0422] In today's information environment, the generation of creative visual data is becoming increasingly important, but users face the challenge of not being able to easily generate high-quality and original visual data. Furthermore, there is a need for a system that can accurately understand and quickly reflect the user's conditions and intentions regarding visual data generation.

[0423] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 1 is realized by the following means.

[0424] In this invention, the server includes means for analyzing input data and identifying the user's intent, means for inputting an image generation request as a prompt message to a generative AI model based on the analysis results, and means for checking the generated visual data and adjusting it as necessary. This makes it possible to quickly generate and provide original visual data that matches the user's intent.

[0425] An "information processing device" is a device that receives user input, processes it, and transmits it to another device or system.

[0426] A "generative AI model" is a mathematical model that uses artificial intelligence technology to generate visual data based on specified conditions.

[0427] A "prompt message" is text data that indicates instructions or conditions to be input into a generative AI model.

[0428] "Visual data" refers to digital data that includes images and graphical information, and is generated based on specific conditions or requirements.

[0429] This invention begins with a user inputting specific conditions or images using an information processing device. The user inputs desired keywords or image data into the information processing device via a keyboard or touchscreen. The information processing device acquires this input data and transmits it to a server via a network.

[0430] The server analyzes the received data, uses natural language processing libraries (e.g., NLTK or SpaCy) to analyze keywords and identify the user's intent. If image data is included, computer vision technologies (e.g., OpenCV or TensorFlow) are used to analyze the image data and extract relevant visual elements.
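The keyword analysis described above can be illustrated without any third-party library. The sketch below is a standard-library stand-in for what NLTK or spaCy would do more thoroughly: it tokenizes the text, drops a small illustrative stop-word list, and ranks the remaining words by frequency. The stop-word list and the `top_n` parameter are assumptions.

```python
import re
from collections import Counter

# Small illustrative stop-word list; a real NLP library ships a fuller one.
STOP_WORDS = {"a", "an", "the", "of", "i", "want", "with", "and", "that", "to"}


def extract_keywords(text: str, top_n: int = 3):
    """Tokenize, remove stop words, and return the most frequent words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter(t for t in tokens if t not in STOP_WORDS)
    return [word for word, _ in counts.most_common(top_n)]


keywords = extract_keywords("I want a futuristic city, a modern city with technology")
print(keywords)
```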

[0431] Based on the analyzed information, the server requests image generation from the generative AI model. This request is provided to the generative AI model as a prompt based on the analysis results. The generative AI model is typically built using GAN (Generative Adversarial Network) technology running on PyTorch or TensorFlow. For example, prompts such as "futuristic city," "modern," and "technology" can be used.

[0432] The generated visual data is temporarily stored on a server and its quality and compliance with requirements are verified by a dedicated review algorithm. The generated visual data may be readjusted as needed.
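The disclosure does not describe the review algorithm itself. As one hedged illustration of what an automated quality check might test, the sketch below flags images whose resolution or contrast falls under a threshold. The thresholds, the grayscale pixel-list representation, and the function name are all hypothetical.

```python
def review_image(pixels, width, height, min_side=64, min_contrast=30):
    """Return a list of problems found; an empty list means the image passes.

    `pixels` is a flat list of grayscale values (0-255), row by row.
    """
    problems = []
    if len(pixels) != width * height:
        problems.append("pixel count does not match dimensions")
    if min(width, height) < min_side:
        problems.append("resolution too low")
    if pixels and max(pixels) - min(pixels) < min_contrast:
        problems.append("contrast too low")
    return problems


flat_gray = [128] * (64 * 64)
print(review_image(flat_gray, 64, 64))
```

Returning a list of problems rather than a single pass/fail flag lets the server decide which readjustment to apply.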

[0433] Finally, the server sends the verified visual data to the information processing device. Users can then view the generated results on the information processing device's screen and download or use them for other purposes. This system offers users the advantage of quickly and efficiently obtaining original visual data and utilizing it for various purposes.

[0434] The flow of the specific processing in Example 1 will be explained using Figure 11.

[0435] Step 1:

[0436] The user inputs desired keywords or images using the keyboard or touchscreen on the device. The device collects the input data, which carries the information needed for the prompt sentences given to the generative AI model, and prepares it for transmission to the server.

[0437] Step 2:

[0438] The device sends input data collected from the user to the server. The data sent includes keywords in text format and, if necessary, image data. The device generates an HTTP request and sends the data to the server API via this request.

[0439] Step 3:

[0440] The server receives data sent from the terminal. The server uses a natural language processing library (e.g., NLTK or SpaCy) to analyze the input keywords and identify the user's intent. Specifically, it tokenizes the keywords and extracts key concepts. The server's output is the analyzed text data.

[0441] Step 4:

[0442] When the server receives image data, it analyzes it using computer vision technologies (e.g., OpenCV or TensorFlow). The server extracts image features and identifies relevant visual elements. The output is data related to the extracted visual elements.

[0443] Step 5:

[0444] Based on the analysis results, the server constructs prompt sentences that reflect the user's intent and inputs them into the generative AI model. The generative AI model generates images based on this input.

[0445] Step 6:

[0446] The generative AI model generates visual data based on the prompt text. The technology typically used is a GAN (Generative Adversarial Network), and the generated images are original. The output at this stage is a file containing the generated visual data.

[0447] Step 7:

[0448] The server reviews the generated visual data to ensure it meets the criteria. Using a quality check algorithm, it evaluates whether the generated images meet the requirements and makes adjustments if necessary. The output is the adjusted visual data.

[0449] Step 8:

[0450] The server sends the final visual data back to the terminal, which displays it in a format that the user can view. The user can then review the final visual data and download or use it.
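Step 2 of the flow above has the device send the input data to the server API in an HTTP request. The sketch below builds such a request with Python's standard `urllib.request` module without sending it. The endpoint URL and the JSON body layout are placeholders; the disclosure does not name an API path or payload schema.

```python
import json
import urllib.request

# Placeholder endpoint; the disclosure does not name a URL or API path.
API_URL = "https://example.com/api/generate"


def build_generation_request(keywords, url=API_URL) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request carrying the keywords."""
    body = json.dumps({"keywords": list(keywords)}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_generation_request(["futuristic city", "modern"])
print(req.get_method(), req.full_url)
```

Passing `req` to `urllib.request.urlopen` would actually transmit it; that call is left out here so the example runs without a network connection.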

[0451] (Application Example 1)

[0452] Next, we will explain Application Example 1. In the following explanation, the data processing device 12 will be referred to as the "server," and the headset-type terminal 314 will be referred to as the "terminal."

[0453] In advertising, there is a need to quickly create original and compelling visual content tailored to the target market, but traditional methods are time-consuming and costly. Furthermore, it is difficult to concretize diverse images and concepts in a short time, necessitating solutions to more efficiently enhance advertising effectiveness.

[0454] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 1 is realized by the following means.

[0455] In this invention, the server includes a device that receives instructions from the user and grasps the conditions and concepts, a device that transmits the received instruction data to an information processing device, and a device that analyzes the instruction data in the information processing device and identifies the user's purpose. This makes it possible to efficiently generate visuals that can be easily used for advertising campaigns and dissemination activities.

[0456] A "device that receives instructions from a user" is a device that receives conditions and conceptual information input by a user and provides an interface for initiating processing.

[0457] An "information processing device" is a device that analyzes data and performs calculations to identify the user's purpose and intentions.

[0458] A "visual generation request device" is a device that, based on analyzed data, instructs the generation of visual content that conforms to specified conditions.

[0459] A "device for verifying generated visuals" is a device that evaluates the quality and compliance with requirements of the generated visual content and makes corrections as necessary.

[0460] A "device that provides the final visuals" is a device that provides the modified visual content to the user and makes it available for use.

[0461] "Visuals that can be easily used in advertising campaigns and promotional activities" refers to images and graphics designed to effectively reach a specific target market.

[0462] The system implementing this invention consists of a user terminal, an information processing server, and a generative AI model. The user uses their own device, such as a smartphone or computer, to input the conditions and concepts of the visual content needed for advertising campaigns and promotional activities. The input information is transmitted from the terminal to the information processing server.

[0463] The server receives this information and analyzes the specified conditions and concepts using natural language processing and image analysis techniques. The analysis is primarily performed using Python and TensorFlow. As a result of the analysis, data is generated to materialize the visuals requested by the user. Subsequently, the server uses this data to leverage generative AI models, particularly generative adversarial networks (GANs), to generate original visuals that meet the specified conditions.

[0464] The generated visuals are reviewed on the server to determine their quality and suitability to customer requirements. Corrections are made as needed at this stage. The final visuals are sent back from the information processing server to the user's terminal, where the user can review them and use them in advertising campaigns, etc.

[0465] As a concrete example, consider a scenario where a user enters a prompt such as, "I want an event poster that gives a futuristic and innovative impression." Based on this prompt, the generative AI model creates a visual combining vibrant colors and a novel design. This visual can then be immediately used as advertising material.

[0466] The flow of a specific process in Application Example 1 will be explained using Figure 12.

[0467] Step 1:

[0468] The user inputs the necessary conditions and concepts for the advertising visual as prompts on their device. An example of user input is the prompt, "I want an event poster that gives a futuristic and innovative impression." The input data is temporarily stored on the device.

[0469] Step 2:

[0470] The terminal sends the prompt text to the information processing server for analysis. The server receives it and begins data analysis.

[0471] Step 3:

[0472] The server uses natural language processing techniques to analyze the prompt text. The input is the user's prompt text, and the output is the visual conditions and objectives requested by the user. The server utilizes TensorFlow to extract specific images and concepts from keywords and phrases.

[0473] Step 4:

[0474] The server requests image generation from the AI model based on the extracted conditions. The input is the analyzed data, and the output is the generation request. Using a GAN, generation of a new visual that satisfies the specified conditions begins.

[0475] Step 5:

[0476] The generative AI model generates visuals according to the given conditions. The input is an image generation request, and the original visual is returned to the server as output. The generated visual is temporarily stored on the server.

[0477] Step 6:

[0478] The server reviews the generated visuals and checks their quality and compliance with requirements. Data processing and adjustments are made as needed. The input is the generated visual, and the output is the final visual.

[0479] Step 7:

[0480] Finally, the server sends the completed visual to the terminal. The terminal receives the visual as output and displays it to the user. The user can then review it and use it for advertising production, etc.
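
The server-side portion of the flow above (Steps 3 to 6) can be sketched in Python as a chain of stages, each consuming the output of the previous one. This is a minimal sketch under stated assumptions: the `Stages` container, the `run_pipeline` function, and the placeholder lambdas are hypothetical and stand in for the natural language processing, the generative AI model, and the review processing, none of which is actually invoked here.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Stages:
    """Server-side stages, injected so each can be swapped independently."""
    analyze: Callable[[str], Dict[str, str]]        # Step 3: prompt -> conditions
    build_request: Callable[[Dict[str, str]], str]  # Step 4: conditions -> request
    generate: Callable[[str], bytes]                # Step 5: request -> visual
    review: Callable[[bytes], bytes]                # Step 6: visual -> final visual


def run_pipeline(prompt: str, stages: Stages, log: List[str]) -> bytes:
    """Run Steps 3 to 6 in order, logging the name of each completed stage."""
    conditions = stages.analyze(prompt)
    log.append("analyze")
    request = stages.build_request(conditions)
    log.append("build_request")
    visual = stages.generate(request)
    log.append("generate")
    final = stages.review(visual)
    log.append("review")
    return final


# Placeholder stages: none of these calls a real NLP or image model.
stub = Stages(
    analyze=lambda p: {"subject": "event poster", "style": "futuristic, innovative"},
    build_request=lambda c: f"{c['subject']} in a {c['style']} style",
    generate=lambda r: r.encode("utf-8"),  # pretend these bytes are an image
    review=lambda v: v,                    # accept the visual unchanged
)

steps: List[str] = []
result = run_pipeline(
    "I want an event poster that gives a futuristic and innovative impression.",
    stub,
    steps,
)
print(steps)
print(result.decode("utf-8"))
```

Passing the stages in as callables mirrors the way each step is described by its input and output, so a stage can be replaced without touching the others.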

[0481] Furthermore, an emotion engine that estimates the user's emotions may be incorporated. That is, the specific processing unit 290 may use the emotion identification model 59 to estimate the user's emotions and perform the specific processing using the estimated emotions.

[0482] This invention enables highly personalized image generation by incorporating an emotion engine that recognizes emotions when a user inputs conditions and images into an image generation system via a terminal. In addition to inputting normal text and images, the user's emotions are monitored by the emotion engine. For example, by using sensors such as a camera and microphone to analyze the user's facial expressions and tone of voice, emotions are evaluated in real time.

[0483] The device analyzes the user's emotions detected using an emotion engine, integrates the results into the input data, and then sends it to the server. The server analyzes the received data and clarifies the conditions corresponding to the user's intentions and emotions. Next, it inputs these analysis results into an AI model and requests it to generate images with a style and atmosphere that matches the user's emotions.

[0484] The generated images are customized according to the user's emotional state, and the generation process selects elements and colors that are likely to emotionally satisfy the user. After initial validation on the server, the emotion engine is used again to confirm that the generated image is appropriate for the user's emotions, and final adjustments are made if necessary.

[0485] The terminal ultimately displays the image sent from the server to the user, who can then review the image and use it for document creation, presentations, and other purposes. For example, if a user is seeking a relaxing cityscape but is also feeling stressed, the system will suggest a peaceful and calming landscape, providing an image that resonates with the user's emotions.

[0486] By incorporating an emotion engine in this way, it is possible to generate images that match the user's emotions, dramatically improving the overall effectiveness of the system and the user experience.

[0487] The following describes the processing flow.

[0488] Step 1:

[0489] The user inputs conditions and images related to the desired image into the device. At this time, the device's built-in emotion engine analyzes the user's facial expressions and voice in real time using the camera and microphone to recognize the user's emotions.

[0490] Step 2:

[0491] The device sends user input data and analysis results from the emotion engine to the server. This ensures that the user's intentions and current emotional state are communicated together.

[0492] Step 3:

[0493] The server analyzes the received data. Text data is processed through natural language processing and combined with image data and emotional states to clarify the user's requests.

[0494] Step 4:

[0495] Based on the analysis results, the server requests image generation from the AI model. Here, image elements corresponding to emotions recognized by the emotion engine are taken into consideration. For example, if the user is seeking relaxation, calm colors and styles will be selected.

[0496] Step 5:

[0497] The AI model generates images based on requests. The model creates images that reflect a specific style or theme and returns the results to the server.

[0498] Step 6:

[0499] The server reviews the generated image and uses the emotion engine to re-verify whether the image is appropriate for the user's emotions. If necessary, it fine-tunes the color scheme and composition to make final adjustments that match the user's emotions.

[0500] Step 7:

[0501] The server sends the final image to the terminal. The sent image is provided to the user in the most suitable format.

[0502] Step 8:

[0503] The device displays the image to the user. The user can review the generated image and, if deemed appropriate, download and use it. Through this process, the user gains access to original images that resonate with their emotions.
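
The color fine-tuning mentioned in Step 6 is not specified further. One possible reading, given purely as an assumption, is to reduce color saturation so that the image appears calmer. The Python sketch below does this per pixel with the standard `colorsys` module; the `soften` function and the 0.6 scale factor are hypothetical.

```python
import colorsys
from typing import List, Tuple

Pixel = Tuple[int, int, int]  # 8-bit RGB


def soften(pixels: List[Pixel], saturation_scale: float = 0.6) -> List[Pixel]:
    """Scale the HSV saturation of each pixel, leaving hue and value untouched."""
    softened = []
    for r, g, b in pixels:
        h, s, v = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
        s = min(1.0, s * saturation_scale)  # clamp in case the scale exceeds 1
        r2, g2, b2 = colorsys.hsv_to_rgb(h, s, v)
        softened.append((round(r2 * 255), round(g2 * 255), round(b2 * 255)))
    return softened


# A fully saturated red and a neutral gray, as a tiny two-pixel "image".
print(soften([(255, 0, 0), (128, 128, 128)]))
```

Working in HSV keeps brightness and hue intact, so only the intensity of the color changes; gray pixels, which have zero saturation, pass through unchanged.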

[0504] (Example 2)

[0505] Next, we will describe Example 2. In the following description, the data processing device 12 will be referred to as the "server," and the headset-type terminal 314 will be referred to as the "terminal."

[0506] In image generation technology, conventional systems failed to consider user emotions, making it difficult to provide images that truly matched the user's needs and feelings. As a result, users were dissatisfied with the generated images and lacked a personalized experience.

[0507] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 2 is realized by the following means.

[0508] In this invention, the server includes means for analyzing the user's emotional state using sensors, means for integrating the received input data and emotional data and transmitting it to the server, and means for making an image generation request based on the analysis results. This makes it possible to generate personalized images that take the user's emotions into consideration.

[0509] A "user" is an individual or group that uses an image generation system to input conditions and images in order to obtain personalized results.

[0510] "Input data" refers to data containing conditions and information that the user provides to the system, and may be expressed as prompts.

[0511] "Emotional state" refers to the state of mind indicated by the user's facial expressions, tone of voice, etc., and is detected by sensors.

[0512] A "server" is a computer system that analyzes received data and manages the image generation process.

[0513] An "image generation request" is an instruction to generate an image in a specified style and atmosphere based on the user's conditions and emotions.

[0514] An "artificial intelligence model" is a model that uses machine learning techniques to generate images tailored to the user's needs.

[0515] "Integration" is the process of combining user input data and emotional data and processing them as a single dataset.

[0516] This invention is a technology for users to generate personalized images using an image generation system. The user inputs prompt text, including conditions and images, using a terminal. For example, the user can input the prompt text, "I want to see a relaxing landscape."

[0517] The device uses sensors such as cameras and microphones to analyze the user's facial expressions and voice tone in order to understand the user's emotional state. This analysis is performed by an emotion engine, and the user's emotions are evaluated in real time.

[0518] User input data and emotional data are integrated and sent to the server. The server analyzes this data to identify conditions based on the user's intentions and emotions. This enables the generation of optimal images that correspond to the user's emotional state.

[0519] Specifically, the server uses a generative AI model to request image generation based on the analysis results. The AI model receives instructions such as generating a landscape painting with a calm and gentle color scheme. This mechanism allows users to obtain images that resonate with their emotions.

[0520] The generated images are evaluated again on the server using an emotion engine. After appropriate adjustments are made, they are provided to the user via the terminal. This allows users to utilize images that they find highly satisfactory in document creation and everyday use.

[0521] The flow of the specific processing in Example 2 will be explained using Figure 13.

[0522] Step 1:

[0523] The user inputs prompt messages into the image generation system via a terminal. For example, they might input a condition such as, "I want to see a relaxing landscape." This input becomes the basic data for processing in the system. As output, the prompt messages are ready to be sent to the server.

[0524] Step 2:

[0525] The device analyzes the user's emotional state using sensors such as cameras and microphones. During this process, emotional data is acquired from the user's facial expressions and tone of voice. An emotion engine is then used to evaluate the user's emotional state in real time. The output of this step is numerical data representing the user's emotional state.

[0526] Step 3:

[0527] The terminal integrates the prompt text and the emotional data and sends the integrated data to the server. Integration combines the text input with the emotion values to produce a dataset that comprehensively represents the user's state. The output is the integrated data, ready for transmission.

[0528] Step 4:

[0529] The server analyzes the received integrated data. This analysis identifies specific conditions for image generation based on the user's intent and emotions. Based on the input information, this data analysis outputs the optimal image generation conditions.

[0530] Step 5:

[0531] The server inputs the specified conditions into the generative AI model and requests image generation. The AI model is instructed, for example, to generate a landscape with relaxed color tones. The output of this step is a generated image that conforms to the conditions.

[0532] Step 6:

[0533] The server performs an initial validation of the generated image. It uses the emotion engine again to evaluate whether the generated image matches the user's emotions. If necessary, it adjusts the image's color tone and elements. The output of this step is the image adjusted to the user's emotional state.

[0534] Step 7:

[0535] The terminal provides the user with the final image sent from the server. The user can view this image and use it for document creation and presentations. The output of this step is the final image that the user can visually confirm.
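
The integration in Step 3 and the analysis in Step 4 can be sketched in Python as follows. This is a minimal sketch, assuming that the emotional data is a set of labeled scores between 0 and 1; the `IntegratedInput` class, the emotion labels, and the rule of picking the highest-scoring emotion are hypothetical and are not taken from the disclosure.

```python
from dataclasses import asdict, dataclass
from typing import Dict


@dataclass
class IntegratedInput:
    """Prompt text and emotion scores combined into one record (Step 3)."""
    prompt: str
    emotion_scores: Dict[str, float]  # hypothetical labels, scores in [0, 1]


def derive_conditions(data: IntegratedInput) -> Dict[str, str]:
    """Step 4 sketch: pair the request with the strongest detected emotion."""
    if data.emotion_scores:
        dominant = max(data.emotion_scores, key=data.emotion_scores.get)
    else:
        dominant = "neutral"  # fallback when no emotion was detected
    return {"request": data.prompt, "dominant_emotion": dominant}


integrated = IntegratedInput(
    prompt="I want to see a relaxing landscape.",
    emotion_scores={"stress": 0.7, "joy": 0.1, "calm": 0.2},
)
print(asdict(integrated))             # the single dataset sent to the server
print(derive_conditions(integrated))  # conditions used for the generation request
```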

[0536] (Application Example 2)

[0537] Next, we will explain application example 2. In the following explanation, the data processing device 12 will be referred to as the "server," and the headset-type terminal 314 will be referred to as the "terminal."

[0538] In modern online shopping, there is a lack of personalization that takes into account the user's emotional state, and there is a particular need to improve the provision of information and product suggestions that respond to consumers' emotions. Therefore, it is necessary to establish effective methods that can improve the user experience and increase purchasing intent.

[0539] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 2 is realized by the following means.

[0540] In this invention, the server includes means for receiving input from the user and understanding conditions and images, means for analyzing the user's emotions using an emotional state detection engine, and means for making an image generation request adapted to the emotions based on the analysis results. This makes it possible to provide personalized product information and images that are in line with the user's emotions.

[0541] A "user" is a consumer who uses a system to input conditions and images and receives services.

[0542] "Input data" refers to information such as conditions and images provided by the user, which is then processed by the system.

[0543] An "emotion engine" is a system component that analyzes the user's facial expressions and voice to evaluate their emotional state in real time.

[0544] A "server" is a computer system that operates on a central computer, receives input data and emotional data, and performs analysis.

[0545] "Analysis means" refers to a method by which a server processes input data and emotional data to determine requests based on the user's intentions and emotions.

[0546] An "image generation request" is an instruction to generate an image with a specific style and atmosphere based on the user's input data and analyzed emotions.

[0547] "Adjustment means" refers to the process of checking whether the generated image matches the user's emotions and modifying the image content as needed.

[0548] "Means of delivery" refers to the method by which the final generated customized image is presented to the user.

[0549] To realize this invention, the system analyzes user input data using an emotion engine and acquires the emotional state as data. Users use smartphones or other devices, and by using a camera and microphone as input devices, their emotions are detected in real time from their facial expressions and voice. The emotion engine identifies this emotional state by utilizing image analysis libraries and voice analysis libraries.

[0550] The terminal sends user-inputted conditions, images, and analyzed emotion data to the server. The server receives this data and uses software to perform analysis. On the server, based on the emotion data, a generative AI model is used to generate images with styles and atmospheres that correspond to the user's emotional state. The generated images undergo initial validation on the server and are adjusted as needed.

[0551] As a concrete example, if a user requests "relaxing cityscapes" while online shopping, and the system detects their stress, the server will use a generative AI model to suggest images with calm and soothing colors. An example of a prompt sent to the generative model would be, "Generate a promotional image for a product to display when the user is feeling relaxed. The image should have soft colors and a calming atmosphere." This allows the user to have a positive experience through the image. In this way, the invention can provide a service that adapts to the user's emotions, resulting in a richer user experience.
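
The example prompt above can be produced from a fixed template whose variable parts are filled in from the analysis results. The Python sketch below uses the standard `string.Template` class; the placeholder names `mood`, `colors`, and `atmosphere` and the `build_prompt` function are assumptions made for illustration, and how the server chooses these values is not described in the disclosure.

```python
from string import Template

# Template reproducing the wording of the example prompt in the text.
PROMPT_TEMPLATE = Template(
    "Generate a promotional image for a product to display when the user is "
    "feeling $mood. The image should have $colors colors and a "
    "$atmosphere atmosphere."
)


def build_prompt(mood: str, colors: str, atmosphere: str) -> str:
    """Fill the template; substitute() raises KeyError if a field is missing."""
    return PROMPT_TEMPLATE.substitute(
        mood=mood, colors=colors, atmosphere=atmosphere
    )


print(build_prompt(mood="relaxed", colors="soft", atmosphere="calming"))
```

Keeping the wording in one template makes the prompt easy to audit, while the emotion-dependent parts remain the only inputs that vary between requests.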

[0552] The flow of a specific process in Application Example 2 will be explained using Figure 14.

[0553] Step 1:

[0554] The user enters conditions on the device, while the device's camera and microphone capture the user's facial image and voice. The facial image and voice data are sent to the emotion engine, which analyzes them and converts the user's emotional state into numerical emotional data.

[0555] Step 2:

[0556] The terminal sends the input conditions and images, along with the emotional data analyzed by the emotion engine, to the server. During transmission, the data is encrypted by a secure communication protocol and delivered to the server over the network.

[0557] Step 3:

[0558] The server analyzes the received input data and emotional data to identify the user's intent. This process also uses contextual information from the database to determine which images are appropriate. The analysis results are then used to construct instructions for input into the generative AI model.

[0559] Step 4:

[0560] The server sends a prompt message to the AI model based on the analysis results, requesting image generation. The prompt message reflects the user's emotional state and intentions, and the AI model generates an image in the specified style and atmosphere.

[0561] Step 5:

[0562] The generated images undergo initial validation on the server to confirm that they match the user's emotional state. In this step, the image's color tone and other aspects are adjusted as needed to prepare it as the final image.

[0563] Step 6:

[0564] The server sends the final, adjusted image to the device. The device displays this image to the user, who then uses it to make purchases or other decisions. As a result, the user is provided with a personalized experience.
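
Step 1 states that the emotion engine converts facial image and voice data into numerical emotional data, but it does not say how the two sources are combined. One simple possibility, offered only as an assumption, is a weighted average of per-emotion scores from each source. In the Python sketch below, the `fuse_emotions` function, the emotion labels, and the equal default weighting are hypothetical.

```python
from typing import Dict


def fuse_emotions(
    face: Dict[str, float],
    voice: Dict[str, float],
    face_weight: float = 0.5,
) -> Dict[str, float]:
    """Blend per-label scores from two sources and normalize them to sum to 1."""
    if not 0.0 <= face_weight <= 1.0:
        raise ValueError("face_weight must be between 0 and 1")
    labels = sorted(set(face) | set(voice))
    blended = {
        label: face_weight * face.get(label, 0.0)
        + (1.0 - face_weight) * voice.get(label, 0.0)
        for label in labels
    }
    total = sum(blended.values())
    if total == 0:
        return blended  # nothing detected; return the all-zero (or empty) scores
    return {label: score / total for label, score in blended.items()}


# Hypothetical scores as they might come from facial and voice analysis.
fused = fuse_emotions(
    face={"stress": 0.8, "calm": 0.2},
    voice={"stress": 0.6, "calm": 0.4},
)
print(fused)
```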

[0565] The specific processing unit 290 transmits the result of the specific processing to the headset-type terminal 314. In the headset-type terminal 314, the control unit 46A causes the speaker 240 and display 343 to output the result of the specific processing. The microphone 238 acquires audio indicating user input for the result of the specific processing. The control unit 46A transmits the audio data indicating user input acquired by the microphone 238 to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the audio data.

[0566] The data generation model 58 is a type of so-called generative AI (Artificial Intelligence). Examples of the data generation model 58 include generative AI such as ChatGPT (Internet search <URL: https://openai.com/blog/chatgpt>) and Gemini (Internet search <URL: https://gemini.google.com/?hl=ja>). The data generation model 58 is obtained by performing deep learning on a neural network. Prompts containing instructions are input to the data generation model 58, together with inference data such as audio data representing speech, text data representing text, and image data representing images. The data generation model 58 performs inference on the input inference data according to the instructions indicated by the prompts and outputs the inference results in data formats such as audio data and text data. Here, inference refers to, for example, analysis, classification, prediction, and/or summarization.

[0567] In the above embodiment, an example was given in which the specific processing is performed by the data processing device 12, but the technology of this disclosure is not limited thereto, and the specific processing may also be performed by the headset-type terminal 314.

[0568] [Fourth Embodiment]

[0569] Figure 7 shows an example of the configuration of the data processing system 410 according to the fourth embodiment.

[0570] As shown in Figure 7, the data processing system 410 includes a data processing device 12 and a robot 414. An example of the data processing device 12 is a server.

[0571] The data processing device 12 comprises a computer 22, a database 24, and a communication interface 26. The computer 22 is an example of a "computer" related to the technology of this disclosure. The computer 22 comprises a processor 28, RAM 30, and storage 32. The processor 28, RAM 30, and storage 32 are connected to a bus 34. The database 24 and the communication interface 26 are also connected to the bus 34. The communication interface 26 is connected to a network 54. An example of the network 54 is a WAN (Wide Area Network) and/or a LAN (Local Area Network).

[0572] The robot 414 includes a computer 36, a microphone 238, a speaker 240, a camera 42, a communication interface 44, and a controlled object 443. The computer 36 includes a processor 46, RAM 48, and storage 50. The processor 46, RAM 48, and storage 50 are connected to a bus 52. The microphone 238, speaker 240, camera 42, and controlled object 443 are also connected to the bus 52.

[0573] The microphone 238 receives instructions from the user 20 by capturing the user's voice. The microphone 238 converts the captured voice into audio data and outputs the audio data to the processor 46. The speaker 240 outputs audio according to instructions from the processor 46.

[0574] The camera 42 is a small digital camera equipped with an optical system (including a lens, aperture, and shutter) and an image sensor such as a CMOS (Complementary Metal-Oxide-Semiconductor) image sensor or a CCD (Charge Coupled Device) image sensor. It captures images of the area around the user 20 (for example, an imaging range defined by a field of view equivalent to that of a typical healthy person).

[0575] Communication interface 44 is connected to network 54. Communication interfaces 44 and 26 are responsible for the exchange of various information between processor 46 and processor 28 via network 54. The exchange of various information between processor 46 and processor 28 using communication interfaces 44 and 26 is performed in a secure manner.

[0576] The controlled object 443 includes a display device, LEDs in the eyes, and motors that drive the arms, hands, and feet. The posture and gestures of the robot 414 are controlled by controlling the motors of the arms, hands, and feet. Some of the robot 414's emotions can be expressed by controlling these motors. Furthermore, the robot 414's facial expressions can also be expressed by controlling the illumination state of the LEDs in its eyes.

[0577] Figure 8 shows an example of the main functions of the data processing device 12 and the robot 414. As shown in Figure 8, the data processing device 12 performs specific processing using the processor 28. The storage 32 stores the specific processing program 56.

[0578] The specific processing program 56 is an example of a "program" relating to the technology of this disclosure. The processor 28 reads the specific processing program 56 from the storage 32 and executes the read specific processing program 56 on the RAM 30. The specific processing is realized by the processor 28 operating as a specific processing unit 290 in accordance with the specific processing program 56 executed on the RAM 30.

[0579] The storage 32 stores the data generation model 58 and the emotion identification model 59. The data generation model 58 and the emotion identification model 59 are used by the specific processing unit 290.

[0580] In robot 414, the processor 46 performs the reception output processing. The storage 50 stores the reception output program 60. The processor 46 reads the reception output program 60 from the storage 50 and executes the read reception output program 60 on the RAM 48. The reception output processing is realized by the processor 46 operating as a control unit 46A according to the reception output program 60 executed on the RAM 48.

[0581] Next, the specific processing performed by the specific processing unit 290 of the data processing device 12 will be described. In the following description, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".

[0582] This invention begins with a user inputting specific conditions or images from a terminal. The user inputs desired keywords or image data via a keyboard or touchscreen, and the terminal sends this input data to a server. The server analyzes the input keywords using natural language processing techniques and identifies the user's specific requests. If image data is included, the server analyzes the image using computer vision techniques and extracts relevant visual elements.

[0583] The server requests image generation from the AI model based on the analysis results. While various image generation technologies can be used as the AI model in this invention, GANs (Generative Adversarial Networks) and deep learning technologies are generally utilized. The AI model generates an original image that reflects the user's conditions based on the request. This generated image is temporarily stored on the server and reviewed for quality and suitability to the request. The image is fine-tuned as needed.

[0584] Finally, the server sends the generated image back to the terminal. The terminal displays the received image to the user, allowing the user to review it. For example, if a user requests an image of a futuristic city and inputs related abstract elements, the AI model will generate an imaginary city image using multiple architectural styles and color schemes, and provide it to the user. The user can then download this image and use it for presentations or document creation.

[0585] This invention offers the advantage of allowing users to quickly obtain original, copyright-free images and use them free of charge for creating materials.

[0586] The following describes the processing flow.

[0587] Step 1:

[0588] Users use their devices to input criteria and images related to their desired image. They can also enter keywords in text or upload reference images.

[0589] Step 2:

[0590] The terminal sends the data entered by the user to the server. The data is packaged in common data formats such as JSON or XML and sent to the server over the network.

[0591] Step 3:

[0592] The server analyzes the received data. This analysis includes natural language processing (NLP) of the text, extracting keywords and phrases. If images are uploaded, computer vision technology is used to extract image features.

[0593] Step 4:

[0594] The server requests image generation from the AI model based on the analysis results. The server inputs the analysis results into the AI model and instructs it to generate an image that specifically reflects the user's conditions and requests.

[0595] Step 5:

[0596] The AI model creates images based on generation requests. The model generates images that conform to a specified style and concept, and returns the results to the server. AI models typically utilize GANs (Generative Adversarial Networks) or other deep learning architectures.

[0597] Step 6:

[0598] The server reviews the generated image and evaluates whether it meets the specified requirements. If necessary, it performs processing to adjust the image's color tone and details.

[0599] Step 7:

[0600] The server converts the finalized image data to a user-friendly format (e.g., JPEG or PNG) and sends it to the terminal.

[0601] Step 8:

[0602] The device displays images to the user. The user reviews the generated images, downloads them as needed, and uses them for document creation and presentations.
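
The JSON packaging mentioned in Step 2 can be sketched in Python as follows. Because JSON cannot carry raw binary data, this sketch assumes that an uploaded reference image is Base64-encoded before transmission; the field names `keywords` and `reference_image` and both function names are hypothetical, and the network transfer itself is omitted.

```python
import base64
import json
from typing import Optional, Tuple


def package_request(keywords: str, image_bytes: Optional[bytes] = None) -> str:
    """Terminal side: build the JSON body for the user's input."""
    payload = {"keywords": keywords}
    if image_bytes is not None:
        # Base64 turns arbitrary bytes into ASCII text that JSON can hold.
        payload["reference_image"] = base64.b64encode(image_bytes).decode("ascii")
    return json.dumps(payload)


def unpack_request(body: str) -> Tuple[str, Optional[bytes]]:
    """Server side: recover the keywords and, if present, the image bytes."""
    payload = json.loads(body)
    encoded = payload.get("reference_image")
    image = base64.b64decode(encoded) if encoded is not None else None
    return payload["keywords"], image


body = package_request("futuristic city", b"\x89PNG placeholder bytes")
print(body)
print(unpack_request(body))
```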

[0603] (Example 1)

[0604] Next, we will describe Example 1. In the following description, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".

[0605] In today's information environment, the generation of creative visual data is becoming increasingly important, but users face the challenge of not being able to easily generate high-quality and original visual data. Furthermore, there is a need for a system that can accurately understand and quickly reflect the user's conditions and intentions regarding visual data generation.

[0606] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 1 is realized by the following means.

[0607] In this invention, the server includes means for analyzing input data and identifying the user's intent, means for inputting an image generation request as a prompt message to a generative AI model based on the analysis results, and means for checking the generated visual data and adjusting it as necessary. This makes it possible to quickly generate and provide original visual data that matches the user's intent.

[0608] An "information processing device" is a device that receives user input, processes it, and transmits it to another device or system.

[0609] A "generative AI model" is a mathematical model that uses artificial intelligence technology to generate visual data based on specified conditions.

[0610] A "prompt message" is text data that indicates instructions or conditions to be input into a generative AI model.

[0611] "Visual data" refers to digital data that includes images and graphical information, and is generated based on specific conditions or requirements.

[0612] This invention begins with a user inputting specific conditions or images using an information processing device. The user inputs desired keywords or image data into the information processing device via a keyboard or touchscreen. The information processing device acquires this input data and transmits it to a server via a network.

[0613] The server analyzes the received data, using natural language processing libraries (e.g., NLTK or spaCy) to analyze the keywords and identify the user's intent. If image data is included, computer vision technologies (e.g., OpenCV or TensorFlow) are used to analyze the image data and extract relevant visual elements.

[0614] Based on the analyzed information, the server requests image generation from the generative AI model. This request is provided to the generative AI model as a prompt based on the analysis results. The generative AI model is typically built using GAN (Generative Adversarial Network) technology running on PyTorch or TensorFlow. For example, prompts such as "futuristic city," "modern," and "technology" can be used.

[0615] The generated visual data is temporarily stored on a server and its quality and compliance with requirements are verified by a dedicated review algorithm. The generated visual data may be readjusted as needed.

[0616] Finally, the server sends the verified visual data to the information processing device. Users can then view the generated results on the information processing device's screen and download or use them for other purposes. This system offers users the advantage of quickly and efficiently obtaining original visual data and utilizing it for various purposes.

[0617] The flow of the specific processing in Example 1 will be explained using Figure 11.

[0618] Step 1:

[0619] The user inputs desired keywords or images using the keyboard or touchscreen on the terminal. The terminal collects this input data, which contains the information needed for the prompt text that will be given to the generative AI model, and prepares it for transmission to the server.

[0620] Step 2:

[0621] The device sends input data collected from the user to the server. The data sent includes keywords in text format and, if necessary, image data. The device generates an HTTP request and sends the data to the server API via this request.
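
The request described in Step 2 can be sketched as follows. This is a minimal illustration using only the Python standard library, not the implementation disclosed here: the endpoint URL, the JSON field names (`keywords`, `image_base64`), and the function `build_request` are assumptions made for the example. The request is built but not sent.

```python
import base64
import json
import urllib.request

# Hypothetical endpoint; the text only refers to "the server API".
API_URL = "https://example.com/api/generate"


def build_request(keywords, image_bytes=None):
    """Build (but do not send) a POST request carrying the user's input."""
    payload = {"keywords": list(keywords)}
    if image_bytes is not None:
        # Binary image data is base64-encoded so it can travel inside JSON.
        payload["image_base64"] = base64.b64encode(image_bytes).decode("ascii")
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_request(["futuristic city", "modern"], image_bytes=b"\x89PNG")
print(request.get_method(), request.full_url)
```

Keeping the image optional mirrors the statement that image data is included only "if necessary".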

[0622] Step 3:

[0623] The server receives the data sent from the terminal. The server uses a natural language processing library (e.g., NLTK or spaCy) to analyze the input keywords and identify the user's intent. Specifically, it tokenizes the keywords and extracts key concepts. The output of this step is the analyzed text data.
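
The tokenization and key-concept extraction in Step 3 might look like the sketch below. The text names NLTK or spaCy. To stay self-contained, this example substitutes a regular expression and a tiny hand-written stop-word list, so it shows the idea rather than either library's API. `STOP_WORDS`, `tokenize`, and `extract_key_concepts` are hypothetical names.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption); real libraries ship larger ones.
STOP_WORDS = {"a", "an", "the", "of", "and", "with", "in", "for", "to"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def extract_key_concepts(text, top_n=5):
    """Return up to top_n non-stop-word tokens, most frequent first."""
    tokens = [t for t in tokenize(text) if t not in STOP_WORDS]
    return [word for word, _ in Counter(tokens).most_common(top_n)]


print(extract_key_concepts("A futuristic city with modern technology"))
```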

[0624] Step 4:

[0625] When the server receives image data, it analyzes it using computer vision technologies (e.g., OpenCV or TensorFlow). The server extracts image features and identifies relevant visual elements. The output is data related to the extracted visual elements.
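
As a rough illustration of the "visual elements" Step 4 could output, the sketch below computes dominant colors and mean brightness from a list of RGB pixels. The text names OpenCV or TensorFlow. This example uses neither and works on plain Python tuples, so the feature choice and the function names `dominant_colors` and `mean_brightness` are assumptions.

```python
from collections import Counter


def dominant_colors(pixels, levels=4, top_n=2):
    """Quantize RGB pixels into coarse bins and return the most common bins."""
    step = 256 // levels
    bins = Counter((r // step, g // step, b // step) for r, g, b in pixels)
    # Report each bin by the color at its center.
    return [
        tuple(c * step + step // 2 for c in bin_)
        for bin_, _ in bins.most_common(top_n)
    ]


def mean_brightness(pixels):
    """Average of the per-pixel channel means, in the range 0..255."""
    return sum(sum(p) / 3 for p in pixels) / len(pixels)


# Three bluish pixels and one near-white pixel stand in for decoded image data.
sample = [(10, 20, 200)] * 3 + [(250, 250, 250)]
print(dominant_colors(sample), mean_brightness(sample))
```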

[0626] Step 5:

[0627] Based on the analysis results, the server constructs prompt text appropriate to the user's intent and inputs it to the generative AI model. The generative AI model generates images based on this input.
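
One simple way to assemble the prompt in Step 5 is shown below. The prompt wording and the function `build_prompt` are assumptions for illustration. The text does not specify a prompt template.

```python
def build_prompt(key_concepts, visual_elements=()):
    """Combine text concepts and optional visual elements into one prompt."""
    if not key_concepts:
        raise ValueError("at least one key concept is required")
    prompt = "Generate an original image of: " + ", ".join(key_concepts)
    if visual_elements:
        prompt += ". Visual elements to reflect: " + ", ".join(visual_elements)
    return prompt + "."


print(build_prompt(["futuristic city", "modern", "technology"]))
```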

[0628] Step 6:

[0629] The generative AI model generates visual data based on the prompt text. The technology typically used is a GAN (Generative Adversarial Network), and the generated images are original. The output at this stage is a file containing the generated visual data.

[0630] Step 7:

[0631] The server reviews the generated visual data to ensure it meets the criteria. Using a quality check algorithm, it evaluates whether the generated images meet the requirements and makes adjustments if necessary. The output is the adjusted visual data.
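
The review-and-adjust step can be sketched as a check against acceptance criteria followed by a correction of whatever failed. The metrics (`brightness`, `sharpness`), the thresholds, and all names below are hypothetical. The text refers only to "a quality check algorithm" without defining it.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class GeneratedImage:
    brightness: float  # 0.0 (black) to 1.0 (white)
    sharpness: float   # 0.0 (blurry) to 1.0 (crisp)


# Hypothetical acceptance criteria.
MIN_BRIGHTNESS, MAX_BRIGHTNESS = 0.3, 0.8
MIN_SHARPNESS = 0.5


def find_issues(image):
    """List the criteria that the image fails."""
    issues = []
    if not MIN_BRIGHTNESS <= image.brightness <= MAX_BRIGHTNESS:
        issues.append("brightness")
    if image.sharpness < MIN_SHARPNESS:
        issues.append("sharpness")
    return issues


def review_and_adjust(image):
    """Return (adjusted image, list of issues that were corrected)."""
    issues = find_issues(image)
    if "brightness" in issues:
        clamped = min(max(image.brightness, MIN_BRIGHTNESS), MAX_BRIGHTNESS)
        image = replace(image, brightness=clamped)
    if "sharpness" in issues:
        image = replace(image, sharpness=MIN_SHARPNESS)
    return image, issues


print(review_and_adjust(GeneratedImage(brightness=0.95, sharpness=0.2)))
```

An image that already meets the criteria is returned unchanged with an empty issue list, matching the statement that adjustments are made only "if necessary".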

[0632] Step 8:

[0633] The server sends the final visual data back to the terminal, which displays it in a format that the user can view. The user can then review the final generated visual data and download or use it.

[0634] (Application Example 1)

[0635] Next, we will explain Application Example 1. In the following explanation, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".

[0636] In advertising, there is a need to quickly create original and compelling visual content tailored to the target market, but traditional methods are time-consuming and costly. Furthermore, it is difficult to concretize diverse images and concepts in a short time, necessitating solutions to more efficiently enhance advertising effectiveness.

[0637] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 1 is realized by the following means.

[0638] In this invention, the server includes a device that receives instructions from the user and grasps the conditions and concepts, a device that transmits the received instruction data to an information processing device, and a device that analyzes the instruction data in the information processing device and identifies the user's purpose. This makes it possible to efficiently generate visuals that can be easily used for advertising campaigns and dissemination activities.

[0639] A "device that receives instructions from a user" is a device that receives conditions and conceptual information input by a user and provides an interface for initiating processing.

[0640] An "information processing device" is a device that analyzes data and performs calculations to identify the user's purpose and intentions.

[0641] A "visual generation request device" is a device that, based on analyzed data, instructs the generation of visual content that conforms to specified conditions.

[0642] A "device for verifying generated visuals" is a device that evaluates the quality and compliance with requirements of the generated visual content and makes corrections as necessary.

[0643] A "device that provides the final visuals" is a device that provides the modified visual content to the user and makes it available for use.

[0644] "Visuals that can be easily used in advertising campaigns and promotional activities" refers to images and graphics designed to effectively reach a specific target market.

[0645] The system implementing this invention consists of a user terminal, an information processing server, and a generative AI model. The user uses their own device, such as a smartphone or computer, to input the conditions and concepts of the visual content necessary for advertising campaigns and promotional activities. The input information is transmitted from the terminal to the information processing server.

[0646] The server receives this information and analyzes the specified conditions and concepts using natural language processing and image analysis techniques. The analysis is primarily performed using Python and TensorFlow. As a result of the analysis, data is generated to materialize the visuals requested by the user. Subsequently, the server uses this data to leverage generative AI models, particularly generative adversarial networks (GANs), to generate original visuals that meet the specified conditions.

[0647] The generated visuals are reviewed on the server to determine their quality and suitability to customer requirements. Corrections are made as needed at this stage. The final visuals are sent back from the information processing server to the user's terminal, where the user can review them and use them in advertising campaigns, etc.

[0648] As a concrete example, consider a scenario where a user enters a prompt such as, "I want an event poster that gives a futuristic and innovative impression." Based on this prompt, the generative AI model creates a visual combining vibrant colors and a novel design. This visual can then be immediately used as advertising material.

[0649] The flow of a specific process in Application Example 1 will be explained using Figure 12.

[0650] Step 1:

[0651] The user inputs the necessary conditions and concepts for the advertising visual as prompts on their device. An example of user input is the prompt, "I want an event poster that gives a futuristic and innovative impression." The input data is temporarily stored on the device.

[0652] Step 2:

[0653] The terminal sends the prompt message to the information processing server for analysis. The server receives it and begins data analysis.

[0654] Step 3:

[0655] The server uses natural language processing techniques to analyze the prompt text. The input is the user's prompt text, and the output is the visual conditions and objectives requested by the user. The server utilizes TensorFlow to extract specific images and concepts from keywords and phrases.

[0656] Step 4:

[0657] The server requests image generation from the AI model based on the extracted conditions. The input is the analyzed data, and the output is the generation request. The GAN then begins generating a new visual that satisfies the specified conditions.

[0658] Step 5:

[0659] The generative AI model generates visuals according to the given conditions. The input is an image generation request, and the original visual is returned to the server as output. The generated visual is temporarily stored on the server.

[0660] Step 6:

[0661] The server reviews the generated visuals and checks their quality and compliance with requirements. Data processing and adjustments are made as needed. The input is the generated visual, and the output is the final visual.

[0662] Step 7:

[0663] Finally, the server sends the completed visual to the terminal. The terminal receives the visual as output and displays it to the user. The user can then review it and use it for advertising production, etc.
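
The server-side portion of Steps 3 to 7 can be sketched as a small orchestration function whose stages are passed in as callables. Everything here is a stand-in: the stub `generate` returns a description instead of pixels and deliberately ignores the requested palette so that the review step has something to correct. The names `run_pipeline`, `analyze`, `generate`, `review`, and `adjust`, and the keyword rules inside `analyze`, are assumptions.

```python
def run_pipeline(prompt, analyze, generate, review, adjust):
    """Run analysis, generation, review, and optional adjustment in order."""
    conditions = analyze(prompt)          # Step 3: extract conditions
    visual = generate(conditions)         # Steps 4-5: request and generate
    if not review(visual, conditions):    # Step 6: check against requirements
        visual = adjust(visual, conditions)
    return visual                         # Step 7: ready to send to the terminal


def analyze(prompt):
    text = prompt.lower()
    return {
        "subject": "event poster" if "poster" in text else "image",
        "palette": "vibrant" if "futuristic" in text else "neutral",
    }


def generate(conditions):
    # Stand-in for the generative AI model; always returns a neutral palette.
    return {"subject": conditions["subject"], "palette": "neutral"}


def review(visual, conditions):
    return visual["palette"] == conditions["palette"]


def adjust(visual, conditions):
    return {**visual, "palette": conditions["palette"]}


result = run_pipeline(
    "I want an event poster that gives a futuristic and innovative impression.",
    analyze, generate, review, adjust,
)
print(result)
```

Passing the stages as arguments keeps the control flow independent of any particular NLP library or generative model.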

[0664] Furthermore, an emotion engine that estimates the user's emotions may be incorporated. That is, the specific processing unit 290 may use the emotion identification model 59 to estimate the user's emotions and perform the specific processing using the user's emotions.

[0665] This invention enables highly personalized image generation by incorporating an emotion engine that recognizes emotions when a user inputs conditions and images into an image generation system via a terminal. In addition to inputting normal text and images, the user's emotions are monitored by the emotion engine. For example, by using sensors such as a camera and microphone to analyze the user's facial expressions and tone of voice, emotions are evaluated in real time.

[0666] The device analyzes the user's emotions detected using an emotion engine, integrates the results into the input data, and then sends it to the server. The server analyzes the received data and clarifies the conditions corresponding to the user's intentions and emotions. Next, it inputs these analysis results into an AI model and requests it to generate images with a style and atmosphere that matches the user's emotions.

[0667] The generated images are customized according to the user's emotional state, and the generation process selects elements and colors that are likely to emotionally satisfy the user. After initial validation on the server, the emotion engine is used again to confirm that the generated image is appropriate for the user's emotions, and final adjustments are made if necessary.

[0668] The terminal ultimately displays the image sent from the server to the user, who can then review the image and use it for document creation, presentations, and other purposes. For example, if a user is seeking a relaxing cityscape but is also feeling stressed, the system will suggest a peaceful and calming landscape, providing an image that resonates with the user's emotions.

[0669] By incorporating an emotion engine in this way, it is possible to generate images that match the user's emotions, dramatically improving the overall effectiveness of the system and the user experience.

[0670] The following describes the processing flow.

[0671] Step 1:

[0672] The user inputs conditions and images related to the desired image into the device. At this time, the device's built-in emotion engine analyzes the user's facial expressions and voice in real time using the camera and microphone to recognize the user's emotions.

[0673] Step 2:

[0674] The device sends user input data and analysis results from the emotion engine to the server. This ensures that the user's intentions and current emotional state are communicated together.

[0675] Step 3:

[0676] The server analyzes the received data. Text data is processed through natural language processing and combined with image data and emotional states to clarify the user's requests.

[0677] Step 4:

[0678] Based on the analysis results, the server requests image generation from the AI model. Here, image elements corresponding to emotions recognized by the emotion engine are taken into consideration. For example, if the user is seeking relaxation, calm colors and styles will be selected.

[0679] Step 5:

[0680] The AI model generates images based on the request. The model creates images that reflect a specific style or theme and returns the results to the server.

[0681] Step 6:

[0682] The server reviews the generated image and uses the emotion engine to re-verify whether the image is appropriate for the user's emotions. If necessary, it fine-tunes the color scheme and composition to make final adjustments that match the user's emotions.

[0683] Step 7:

[0684] The server sends the final image to the terminal. The sent image is provided to the user in the most suitable format.

[0685] Step 8:

[0686] The device displays the image to the user. The user can review the generated image and, if deemed appropriate, download and use it. Through this process, the user gains access to original images that resonate with their emotions.

[0687] (Example 2)

[0688] Next, we will describe Example 2. In the following description, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".

[0689] In image generation technology, conventional systems failed to consider user emotions, making it difficult to provide images that truly matched the user's needs and feelings. As a result, users were dissatisfied with the generated images and lacked a personalized experience.

[0690] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 2 is realized by the following means.

[0691] In this invention, the server includes means for analyzing the user's emotional state using sensors, means for integrating the received input data and emotional data and transmitting it to the server, and means for making an image generation request based on the analysis results. This makes it possible to generate personalized images that take the user's emotions into consideration.

[0692] A "user" is an individual or group that uses an image generation system to input conditions and images in order to obtain personalized results.

[0693] "Input data" refers to data containing conditions and information that the user provides to the system, and may be expressed as prompts.

[0694] "Emotional state" refers to the state of mind indicated by the user's facial expressions, tone of voice, etc., and is detected by sensors.

[0695] A "server" is a computer system that analyzes received data and manages the image generation process.

[0696] An "image generation request" is an instruction to generate an image in a specified style and atmosphere based on the user's conditions and emotions.

[0697] An "artificial intelligence model" is a model that uses machine learning techniques to generate images tailored to the user's needs.

[0698] "Integration" is the process of combining user input data and emotional data and processing them as a single dataset.

[0699] This invention is a technology for users to generate personalized images using an image generation system. The user inputs prompt text, including conditions and images, using a terminal. For example, the user can input the prompt text, "I want to see a relaxing landscape."

[0700] The device uses sensors such as cameras and microphones to analyze the user's facial expressions and voice tone in order to understand the user's emotional state. This analysis is performed by an emotion engine, and the user's emotions are evaluated in real time.

[0701] User input data and emotional data are integrated and sent to the server. The server analyzes this data to identify conditions based on the user's intentions and emotions. This enables the generation of optimal images that correspond to the user's emotional state.

[0702] Specifically, the server requests image generation from a generative AI model based on the analysis results. The AI model receives instructions such as generating a landscape painting with a calm and gentle color scheme. This mechanism allows users to obtain images that resonate with their emotions.

[0703] The generated images are evaluated again on the server using an emotion engine. After appropriate adjustments are made, they are provided to the user via the terminal. This allows users to utilize images that they find highly satisfactory in document creation and everyday use.

[0704] The flow of the specific processing in Example 2 will be explained using Figure 13.

[0705] Step 1:

[0706] The user inputs prompt messages into the image generation system via a terminal. For example, they might input a condition such as, "I want to see a relaxing landscape." This input becomes the basic data for processing in the system. As output, the prompt messages are ready to be sent to the server.

[0707] Step 2:

[0708] The device analyzes the user's emotional state using sensors such as cameras and microphones. During this process, emotional data is acquired from the user's facial expressions and tone of voice. An emotion engine is then used to evaluate the user's emotional state in real time. The output of this step is numerical data representing the user's emotional state.

[0709] Step 3:

[0710] The terminal integrates the prompt text and the emotional data. Integration combines the text input and the emotion values into a dataset that comprehensively represents the user's state. The output is the integrated data, which is then sent to the server.
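
The integration in Step 3 can be illustrated by packing the prompt text and the numerical emotion values into one record and serializing it for transmission. The record layout, the 0-to-1 score range, and the names `IntegratedInput`, `integrate`, and `to_json` are assumptions. The text says only that the two are combined into a single dataset.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class IntegratedInput:
    prompt: str
    emotions: dict  # emotion name -> score, assumed to lie in 0.0..1.0


def integrate(prompt, emotions):
    """Validate the emotion scores and bundle them with the prompt text."""
    for name, score in emotions.items():
        if not 0.0 <= score <= 1.0:
            raise ValueError(f"emotion score out of range: {name}={score}")
    return IntegratedInput(prompt=prompt.strip(), emotions=dict(emotions))


def to_json(data):
    """Serialize the integrated record for sending to the server."""
    return json.dumps(asdict(data), sort_keys=True)


record = integrate(" I want to see a relaxing landscape. ", {"stress": 0.7})
print(to_json(record))
```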

[0711] Step 4:

[0712] The server analyzes the received integrated data. This analysis identifies specific conditions for image generation based on the user's intent and emotions. Based on the input information, this data analysis outputs the optimal image generation conditions.
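
A minimal sketch of how Step 4 might turn the integrated data into generation conditions is given below: the strongest emotion above a threshold selects a style, otherwise a default applies. The emotion names, the style table, the threshold, and the function `generation_conditions` are hypothetical. The text does not describe the mapping rule.

```python
# Hypothetical mapping from the dominant emotion to style hints.
STYLE_BY_EMOTION = {
    "stress": {"palette": "calm, gentle colors", "mood": "peaceful"},
    "joy": {"palette": "bright, warm colors", "mood": "lively"},
}
DEFAULT_STYLE = {"palette": "balanced colors", "mood": "neutral"}


def generation_conditions(prompt, emotions, threshold=0.5):
    """Pick style hints from the strongest emotion, if it is strong enough."""
    dominant = max(emotions, key=emotions.get, default=None)
    if dominant is None or emotions[dominant] < threshold:
        style = DEFAULT_STYLE
    else:
        style = STYLE_BY_EMOTION.get(dominant, DEFAULT_STYLE)
    return {"subject": prompt, **style}


print(generation_conditions("relaxing landscape", {"stress": 0.7, "joy": 0.1}))
```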

[0713] Step 5:

[0714] The server inputs the specified conditions into the generative AI model and requests image generation. The AI model is instructed, for example, to generate a landscape with relaxed color tones. The output of this step is a generated image that conforms to the conditions.

[0715] Step 6:

[0716] The server performs an initial validation of the generated image. It uses the emotion engine again to evaluate whether the generated image matches the user's emotions. If necessary, it adjusts the image's color tone and elements. The output of this step is the image adjusted to the user's emotional state.

[0717] Step 7:

[0718] The terminal provides the user with the final image sent from the server. The user can view this image and use it for document creation and presentations. The output of this step is the final image that the user can visually confirm.

[0719] (Application Example 2)

[0720] Next, we will explain application example 2. In the following explanation, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".

[0721] In modern online shopping, there is a lack of personalization that takes into account the user's emotional state, and there is a particular need to improve the provision of information and product suggestions that respond to consumers' emotions. Therefore, it is necessary to establish effective methods that can improve the user experience and increase purchasing intent.

[0722] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 2 is realized by the following means.

[0723] In this invention, the server includes means for receiving input from the user and understanding conditions and images, means for analyzing the user's emotions using an emotional state detection engine, and means for making an image generation request adapted to the emotions based on the analysis results. This makes it possible to provide personalized product information and images that are in line with the user's emotions.

[0724] A "user" is a consumer who uses a system to input conditions and images and receives services.

[0725] "Input data" refers to information such as conditions and images provided by the user, which is then processed by the system.

[0726] An "emotion engine" is a system component that analyzes the user's facial expressions and voice to evaluate their emotional state in real time.

[0727] A "server" is a central computer system that receives input data and emotion data and performs analysis.

[0728] "Analysis means" refers to a method by which a server processes input data and emotion data to determine requests based on the user's intentions and emotions.

[0729] An "image generation request" is an instruction to generate an image with a specific style and atmosphere based on the user's input data and analyzed emotions.

[0730] "Adjustment means" refers to the process of checking whether the generated image matches the user's emotions and modifying the image content as needed.

[0731] "Means of delivery" refers to the method by which the final generated customized image is presented to the user.

[0732] To realize this invention, the system analyzes user input data using an emotion engine and acquires the emotional state as data. Users use smartphones or other devices, and by using a camera and microphone as input devices, their emotions are detected in real time from their facial expressions and voice. The emotion engine identifies this emotional state by utilizing image analysis libraries and voice analysis libraries.

[0733] The terminal sends user-inputted conditions, images, and analyzed emotion data to the server. The server receives this data and uses software to perform analysis. On the server, based on the emotion data, a generative AI model is used to generate images with styles and atmospheres that correspond to the user's emotional state. The generated images undergo initial validation on the server and are adjusted as needed.

[0734] As a concrete example, if a user requests "relaxing cityscapes" while online shopping, and the system detects their stress, the server will use a generative AI model to suggest images with calm and soothing colors. An example of a prompt sent to the generative model would be, "Generate a promotional image for a product to display when the user is feeling relaxed. The image should have soft colors and a calming atmosphere." This allows the user to have a positive experience through the image. In this way, the invention can provide a service that adapts to the user's emotions, resulting in a richer user experience.

[0735] The flow of a specific process in Application Example 2 will be explained using Figure 14.

[0736] Step 1:

[0737] The user uses a device and inputs conditions and emotional states using the camera and microphone as input devices. The input data here consists of the user's facial image and voice data, which are sent to the emotion engine. The emotion engine analyzes this data and converts the user's emotional state into numerical emotional data.

[0738] Step 2:

[0739] The terminal sends the input conditions and images, along with the emotional data analyzed by the emotion engine, to the server. During this process, the data is securely encrypted using a communication protocol and reaches the server via the network.
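
The text does not name the communication protocol used for encryption. Assuming TLS (as used by HTTPS), a terminal-side client could prepare its connection settings as sketched below with Python's standard `ssl` module. The function name `make_tls_context` and the choice of TLS 1.2 as the minimum version are assumptions.

```python
import ssl


def make_tls_context():
    """Create a client-side TLS context that verifies the server certificate."""
    # create_default_context() enables certificate and hostname verification.
    context = ssl.create_default_context()
    # Refuse protocol versions older than TLS 1.2 (an assumed policy).
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    return context


context = make_tls_context()
print(context.check_hostname, context.verify_mode == ssl.CERT_REQUIRED)
```

Such a context can be passed to `urllib.request.urlopen(..., context=context)` when the request is actually sent.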

[0740] Step 3:

[0741] The server analyzes the received input data and emotion data to identify the user's intent. This process also utilizes contextual information from the database to determine which images are appropriate. The analysis results are then used to construct instructions for input into the generative AI model.

[0742] Step 4:

[0743] The server sends a prompt message to the AI model based on the analysis results, requesting image generation. The prompt message reflects the user's emotional state and intentions, and the AI model generates an image in the specified style and atmosphere.

[0744] Step 5:

[0745] The generated images undergo initial validation on the server to confirm that they match the user's emotional state. In this step, the image's color tone and other aspects are adjusted as needed to prepare it as the final image.

[0746] Step 6:

[0747] The server sends the final, adjusted image to the device. The device displays this image to the user, who then uses it to make purchases or other decisions. As a result, the user is provided with a personalized experience.

[0748] The specific processing unit 290 transmits the result of the specific processing to the robot 414. In the robot 414, the control unit 46A causes the speaker 240 and the controlled object 443 to output the result of the specific processing. The microphone 238 acquires audio indicating user input for the result of the specific processing. The control unit 46A transmits the audio data indicating user input acquired by the microphone 238 to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the audio data.

[0749] Data generation model 58 is a type of so-called generative AI (Artificial Intelligence). Examples of data generation model 58 include generative AI such as ChatGPT (Internet search <URL: https://openai.com/blog/chatgpt>) and Gemini (Internet search <URL: https://gemini.google.com/?hl=ja>). The data generation model 58 is obtained by performing deep learning on a neural network. A prompt containing instructions is input to the data generation model 58, together with inference data such as audio data representing speech, text data representing text, and image data representing images. The data generation model 58 performs inference on the input inference data according to the instructions indicated by the prompt, and outputs the inference results in data formats such as audio data and text data. Here, inference refers to, for example, analysis, classification, prediction, and/or summarization.

[0750] In the above embodiment, an example was given in which the specific processing is performed by the data processing device 12, but the technology of this disclosure is not limited thereto, and the specific processing may also be performed by the robot 414.

[0751] Furthermore, the emotion identification model 59, acting as an emotion engine, may determine the user's emotion according to a specific mapping, namely an emotion map (see Figure 9). Similarly, the emotion identification model 59 may also determine the robot's emotion, and the specific processing unit 290 may perform the specific processing using the robot's emotion.

[0752] Figure 9 shows an emotion map 400 in which multiple emotions are mapped. In the emotion map 400, emotions are arranged in concentric circles radiating from the center. The closer to the center of the concentric circles, the more primitive the emotions are located. Further out of the concentric circles, emotions representing states and actions arising from mental states are located. Emotion is a concept that includes feelings and mental states. On the left side of the concentric circles, emotions that are generally generated from reactions occurring in the brain are located. On the right side of the concentric circles, emotions that are generally induced by situational judgment are located. Above and below the concentric circles, emotions that are generally generated from reactions occurring in the brain and induced by situational judgment are located. In addition, the emotion of "pleasure" is located on the upper side of the concentric circles, and the emotion of "displeasure" is located on the lower side. Thus, in the emotion map 400, multiple emotions are mapped based on the structure in which emotions arise, and emotions that are likely to occur simultaneously are mapped close together.

[0753] These emotions are distributed at the 3 o'clock position on the Emotion Map 400, and usually fluctuate between feelings of security and anxiety. In the right half of the Emotion Map 400, situational awareness takes precedence over internal feelings, resulting in a calm impression.

[0754] The inside of the Emotion Map 400 represents inner thoughts, while the outside represents actions. Therefore, the further toward the outside of the Emotion Map 400 an emotion is located, the more visible (expressed in actions) it becomes.

[0755] Here, human emotions are based on various balances, such as posture and blood sugar levels. When these balances deviate from the ideal, it results in discomfort, and when they approach the ideal, it results in pleasure. Similarly, in robots, cars, motorcycles, etc., emotions can be created based on various balances, such as posture and battery level. When these balances deviate from the ideal, it results in discomfort, and when they approach the ideal, it results in pleasure. The emotion map can be generated, for example, based on Dr. Mitsuyoshi's emotion map (Research on a system for analyzing brain physiological signals of speech emotion recognition and emotion, Tokushima University, doctoral dissertation: https://ci.nii.ac.jp/naid/500000375379). The left half of the emotion map contains emotions belonging to a region called "response," where sensation is dominant. The right half of the emotion map contains emotions belonging to a region called "situation," where situational awareness is dominant.
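
The idea that deviation from an ideal balance yields displeasure and closeness to it yields pleasure can be expressed as a simple score. The linear formula, the `tolerance` parameter, and the function name `valence` below are assumptions chosen for illustration. The text does not give a formula.

```python
def valence(current, ideal, tolerance):
    """Score a balance: +1.0 at the ideal, 0.0 at `tolerance` away, floor -1.0."""
    if tolerance <= 0:
        raise ValueError("tolerance must be positive")
    deviation = abs(current - ideal) / tolerance
    return max(-1.0, 1.0 - deviation)


# Example: a robot's battery level, where 1.0 (full) is taken as the ideal.
print(valence(current=0.9, ideal=1.0, tolerance=0.5))
print(valence(current=0.1, ideal=1.0, tolerance=0.5))
```

Positive scores correspond to the "pleasure" side of the emotion map and negative scores to the "displeasure" side.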

[0756] The emotion map defines two emotions that promote learning. One is the emotion around the middle of the negative "repentance" and "reflection" on the situation side. In other words, it is when the robot experiences negative emotions such as "I never want to feel this way again" or "I don't want to be scolded again." The other is the emotion around the positive "desire" on the reaction side. In other words, it is when the robot has positive feelings such as "I want more" or "I want to know more."

[0757] The emotion identification model 59 feeds user input into a pre-trained neural network, obtains emotion values representing each emotion shown in the emotion map 400, and determines the user's emotion. This neural network is pre-trained on multiple sets of training data, each of which combines user input with emotion values representing each emotion shown in the emotion map 400. Furthermore, this neural network is trained so that emotions located close together have similar values, as shown in the emotion map 900 in Figure 10. Figure 10 shows an example where multiple emotions such as "reassured," "calm," and "confident" have similar emotion values.
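
One way to obtain training targets in which neighboring emotions have similar values is to let each target decay with distance on the map from the labeled emotion. The sketch below does this with a Gaussian kernel and then picks the emotion with the highest value. The coordinates, the kernel, the `width` parameter, and the function names are all assumptions. The actual layout is the one in Figure 9, and the text does not state how the similarity is enforced during training.

```python
import math

# Hypothetical (x, y) positions on the emotion map, for illustration only.
EMOTION_POSITIONS = {
    "reassured": (1.0, 0.2),
    "calm": (1.1, 0.1),
    "confident": (1.0, 0.4),
    "anxious": (1.0, -1.0),
}


def soft_targets(label, width=0.5):
    """Target values that decay with map distance from the labeled emotion."""
    lx, ly = EMOTION_POSITIONS[label]
    return {
        name: math.exp(-((x - lx) ** 2 + (y - ly) ** 2) / (2 * width ** 2))
        for name, (x, y) in EMOTION_POSITIONS.items()
    }


def determine_emotion(values):
    """Return the emotion whose value is highest."""
    return max(values, key=values.get)


targets = soft_targets("calm")
print(determine_emotion(targets), targets)
```

With these positions, "reassured" and "confident" receive values close to that of "calm", while the distant "anxious" receives a much lower one.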

[0758] The above description primarily focuses on the functions of the data processing device 12 in relation to this disclosure. However, the system related to this disclosure is not necessarily implemented on a server. The system related to this disclosure may be implemented as a general information processing system. This disclosure may be implemented, for example, as a software program that runs on a personal computer or as an application that runs on a smartphone. The method related to this disclosure may be provided to users in SaaS (Software as a Service) format.

[0759] In the above embodiment, an example was given in which a specific process is performed by a single computer 22. However, the technology of this disclosure is not limited thereto, and a distributed processing of the specific process may be performed by multiple computers, including computer 22. For example, a data generation model 58 may be provided in an external device of the data processing device 12, and the external device may generate data according to the input data.

[0760] In the above embodiment, an example was given in which the specific processing program 56 is stored in the storage 32, but the technology of this disclosure is not limited thereto. For example, the specific processing program 56 may be stored in a portable, computer-readable, non-transitory storage medium such as a USB (Universal Serial Bus) memory. The specific processing program 56 stored in the non-transitory storage medium is installed in the computer 22 of the data processing device 12. The processor 28 executes specific processing according to the specific processing program 56.

[0761] Alternatively, the specific processing program 56 may be stored in a storage device such as a server connected to the data processing device 12 via the network 54, and the specific processing program 56 may be downloaded and installed on the computer 22 in response to a request from the data processing device 12.

[0762] Furthermore, it is not necessary to store the entirety of the specific processing program 56 in a storage device such as a server connected to the data processing device 12 via the network 54, or to store the entirety of the specific processing program 56 in the storage 32; it is acceptable to store only a portion of the specific processing program 56.

[0763] The following types of processors can be used as hardware resources to perform specific processing. Examples of processors include a CPU, a general-purpose processor that functions as a hardware resource to perform specific processing by executing software, i.e., a program. Other examples of processors include dedicated electrical circuits, such as FPGAs (Field-Programmable Gate Arrays), PLDs (Programmable Logic Devices), or ASICs (Application Specific Integrated Circuits), which have circuit configurations specifically designed to perform specific processing. All of these processors have built-in or connected memory, and all of them perform specific processing by using memory.

[0764] The hardware resource that performs a specific process may consist of one of these various processors, or it may consist of a combination of two or more processors of the same or different types (for example, a combination of multiple FPGAs, or a combination of a CPU and an FPGA). Alternatively, the hardware resource that performs a specific process may consist of a single processor.

[0765] Examples of configurations using a single processor include, firstly, a configuration in which one or more CPUs and software are combined to form a single processor, and this processor functions as a hardware resource that performs a specific process. Secondly, there is a configuration using a processor that realizes the functions of the entire system, including multiple hardware resources that perform a specific process, on a single IC chip, as exemplified by SoCs (System-on-a-chip). In this way, a specific process is realized using one or more of the above types of processors as hardware resources.

[0766] Furthermore, the hardware structure of these various processors can more specifically utilize electrical circuits that combine circuit elements such as semiconductor devices. Also, the specific processing described above is merely an example. Therefore, it goes without saying that unnecessary steps can be deleted, new steps added, or the processing order rearranged, as long as it does not deviate from the main purpose.

[0767] The descriptions and illustrations presented above are detailed explanations of the technical aspects of this disclosure and are merely examples of the technical aspects. For example, the above descriptions of the structure, function, operation, and effect are examples of the structure, function, operation, and effect of the technical aspects of this disclosure. Therefore, it goes without saying that unnecessary parts may be deleted, new elements added, or elements replaced in the descriptions and illustrations presented above, as long as this does not deviate from the essence of the technical aspects of this disclosure. Furthermore, in order to avoid confusion and facilitate understanding of the technical aspects of this disclosure, explanations of common technical knowledge and the like that do not require special explanation to enable the implementation of the technical aspects of this disclosure have been omitted from the descriptions and illustrations presented above.

[0768] All documents, patent applications, and technical standards described herein are incorporated by reference to the same extent as if each individual document, patent application, and technical standard were specifically and individually noted to be incorporated by reference.

[0769] The following is further disclosed regarding the embodiments described above.

[0770] (Claim 1)

[0771] A means of receiving input from the user and understanding the conditions and image,

[0772] A means of sending the received input data to the server,

[0773] A means of analyzing input data on the server and identifying the user's intent,

[0774] A means for making an image generation request based on the analysis results,

[0775] Means for reviewing and adjusting the generated image as needed,

[0776] Means of providing the final image to the user,

[0777] A system that includes this.

[0778] (Claim 2)

[0779] The system according to claim 1, which generates an image using an AI model in response to a generation request.

[0780] (Claim 3)

[0781] The system according to claim 1, comprising a process of verifying whether the generated image has been adjusted to suit the user.
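
The flow recited in the claims above (receive input, analyze intent, request generation, review and adjust, provide the final image) can be sketched in Python as follows. This is a minimal sketch under stated assumptions: every name here (`GenerationRequest`, `analyze_intent`, `run_pipeline`, the `fake_*` stand-ins) is hypothetical, the comma-splitting "analysis" is a placeholder for the natural language processing the disclosure describes, and no real image model is called.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class GenerationRequest:
    subject: str
    conditions: list[str] = field(default_factory=list)


def analyze_intent(user_input: str) -> GenerationRequest:
    """Placeholder analysis: the first comma-separated part is the
    subject and the remaining parts are conditions."""
    parts = [p.strip() for p in user_input.split(",") if p.strip()]
    if not parts:
        raise ValueError("user input is empty")
    return GenerationRequest(subject=parts[0], conditions=parts[1:])


def run_pipeline(
    user_input: str,
    generate: Callable[[GenerationRequest], dict],
    is_suitable: Callable[[dict, GenerationRequest], bool],
    adjust: Callable[[dict, GenerationRequest], dict],
    max_rounds: int = 3,
) -> dict:
    """Analyze, generate, then review and adjust up to `max_rounds` times."""
    request = analyze_intent(user_input)
    image = generate(request)
    for _ in range(max_rounds):
        if is_suitable(image, request):
            break
        image = adjust(image, request)
    return image


# Hypothetical stand-ins for the generation model and the review step.
def fake_generate(req: GenerationRequest) -> dict:
    return {"subject": req.subject, "applied": []}


def fake_is_suitable(img: dict, req: GenerationRequest) -> bool:
    return set(req.conditions) <= set(img["applied"])


def fake_adjust(img: dict, req: GenerationRequest) -> dict:
    missing = [c for c in req.conditions if c not in img["applied"]]
    return {**img, "applied": img["applied"] + missing[:1]}


final = run_pipeline(
    "mountain lake, sunrise, watercolor",
    fake_generate, fake_is_suitable, fake_adjust,
)
```

Passing the generation, review, and adjustment steps in as callables keeps the sketch independent of any particular model; the bounded loop is an assumed safeguard against endless adjustment.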

[0782] "Example 1"

[0783] (Claim 1)

[0784] In an information processing device, there is a means for receiving input from the user and understanding the conditions and image,

[0785] A means for transmitting the received input data to an information processing device,

[0786] A means for analyzing input data in an information processing device to identify the user's intent,

[0787] A means for inputting an image generation request as a prompt message to the generation AI model based on the analysis results,

[0788] A means to review the generated visual data and adjust it as needed,

[0789] Means of providing the final visual data to the user,

[0790] A system that includes this.

[0791] (Claim 2)

[0792] The system according to claim 1, which generates visual data using a generation AI model in response to a generation request.

[0793] (Claim 3)

[0794] The system according to claim 1, comprising a process of adjusting the generated visual data and verifying whether it is suitable for the user.
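
Example 1 recites inputting the generation request to the generation AI model as a prompt message. The Python sketch below shows one way such a message might be assembled from analysis results. The function name `build_prompt`, its parameters, and the wording of the prompt lines are assumptions made for illustration; the disclosure does not specify a prompt format.

```python
from typing import Optional, Sequence


def build_prompt(
    subject: str,
    style: Optional[str] = None,
    constraints: Sequence[str] = (),
) -> str:
    """Assemble a plain-text prompt message from analysis results."""
    lines = [f"Generate an original image of {subject}."]
    if style:
        lines.append(f"Style: {style}.")
    for constraint in constraints:
        lines.append(f"Requirement: {constraint}.")
    return "\n".join(lines)


prompt = build_prompt(
    "a city skyline at dusk",
    style="flat illustration",
    constraints=["no text", "16:9 aspect ratio"],
)
```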

[0795] "Application Example 1"

[0796] (Claim 1)

[0797] A device that receives instructions from the user and understands the conditions and concepts,

[0798] A device that transmits received instruction data to an information processing device,

[0799] A device that analyzes instruction data in an information processing device and identifies the user's objective,

[0800] A device that requests the generation of a visual based on the analysis results,

[0801] A device for reviewing and modifying the generated visuals as needed,

[0802] A device that provides the final visual to the user,

[0803] A device that efficiently generates visuals that can be easily used for advertising campaigns and dissemination activities,

[0804] A system that includes this.

[0805] (Claim 2)

[0806] The system according to claim 1, which generates a visual using an artificial intelligence model in response to a generation request.

[0807] (Claim 3)

[0808] The system according to claim 1, further comprising the step of checking whether the generated visuals are adjusted to suit the user.

[0809] "Example 2 of combining an emotion engine"

[0810] (Claim 1)

[0811] A means of receiving input from the user and understanding conditions and information,

[0812] A means of analyzing a user's emotional state using sensors,

[0813] A means of integrating the received input data and emotional data and sending it to the server,

[0814] A means of analyzing input data on the server to identify the user's intentions and emotions,

[0815] A means for making an image generation request based on the analysis results,

[0816] Means for reviewing and adjusting the generated image as needed,

[0817] Means of providing the final image to the user,

[0818] A system that includes this.

[0819] (Claim 2)

[0820] The system according to claim 1, which generates an image using an artificial intelligence model in response to a generation request.

[0821] (Claim 3)

[0822] The system according to claim 1, comprising a process of verifying whether the generated image is adjusted based on emotion and suitable for the user.
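
The step of integrating the received input data with emotional data before sending it to the server could look roughly like the Python sketch below. The JSON layout, the field names, the 0-to-1 intensity range, and the names `EmotionReading` and `build_payload` are assumptions made for illustration; the disclosure does not define a wire format.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class EmotionReading:
    label: str        # e.g. an emotion name taken from the emotion map
    intensity: float  # assumed to be normalized to the range 0..1


def build_payload(user_input: str, reading: EmotionReading) -> str:
    """Combine the user's input and the emotion reading into one
    JSON message that a client could send to the server."""
    if not 0.0 <= reading.intensity <= 1.0:
        raise ValueError("intensity must be between 0 and 1")
    return json.dumps(
        {"input": user_input, "emotion": asdict(reading)},
        sort_keys=True,
    )


payload = build_payload(
    "a poster for a spring festival",
    EmotionReading(label="calm", intensity=0.7),
)
decoded = json.loads(payload)
```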

[0823] "Application example 2 when combining with an emotional engine"

[0824] (Claim 1)

[0825] A means of receiving input from the user and understanding the conditions and image,

[0826] A means of analyzing a user's emotions using an engine that detects emotional states,

[0827] A means for transmitting the received input data and analyzed emotion data to a server,

[0828] A means for analyzing input data and sentiment data on the server to identify the user's intent,

[0829] A means for making an image generation request adapted to emotions based on the analysis results,

[0830] Means for reviewing and adjusting the generated image as needed,

[0831] Ultimately, the means of providing users with images that resonate with their emotions,

[0832] A system that includes this.

[0833] (Claim 2)

[0834] The system according to claim 1, which generates emotion-based images using an engine in response to a generation request.

[0835] (Claim 3)

[0836] The system according to claim 1, comprising a process of verifying that the generated image is adjusted to suit the user's emotions.

[Explanation of Symbols]

[0837]
10, 210, 310, 410 Data Processing Systems
12 Data Processing Device
14 Smart Device
214 Smart Glasses
314 Headset-type Terminal
414 Robot

Claims

1. A means of receiving input from the user and understanding the conditions and image,
A means of sending the received input data to the server,
A means of analyzing input data on the server and identifying the user's intent,
A means for making an image generation request based on the analysis results,
Means for reviewing and adjusting the generated image as needed,
Means of providing the final image to the user,
A system that includes this.

2. The system according to claim 1, which generates an image using an AI model in response to a generation request.

3. The system according to claim 1, comprising a process of verifying whether the generated image has been adjusted to suit the user.