system

The system efficiently generates and delivers high-quality, copyright-free images using a generative AI model, addressing the inefficiencies of conventional methods by optimizing image delivery for user devices.

JP2026105488A · Pending · Publication Date: 2026-06-26 · SOFTBANK GROUP CORP

Patent Information

Authority / Receiving Office: JP · JP
Patent Type: Applications
Current Assignee / Owner: SOFTBANK GROUP CORP
Filing Date: 2024-12-16
Publication Date: 2026-06-26

AI Technical Summary

Technical Problem

Conventional methods for obtaining copyright-free images require significant time and effort, and the resulting images often fail to meet user-specific conditions.

Method used

A system that utilizes a generative artificial intelligence model to efficiently generate high-quality, copyright-free images based on user input conditions, optimizing and formatting them for smooth delivery.

Benefits of technology

Enables rapid generation and provision of images that meet user requirements, ensuring high quality and compatibility with user devices.

✦ Generated by Eureka AI based on patent content.

Abstract

[Problem] To provide a system. [Solution] The system includes: a means for receiving conditions from a user; a generation means that generates visual data using a generative artificial intelligence algorithm based on the conditions; an optimization means that optimizes the generated visual data and converts it into a variable form; and a means for providing the optimized visual data to the user.

Description

Technical Field

[0001] The technology of the present disclosure relates to a system.

Background Art

[0002] Patent Document 1 discloses a method for controlling a persona chatbot, which is performed by at least one processor, including steps of receiving a user utterance, adding the user utterance to a prompt including an instruction sentence related to an explanation of a chatbot character, encoding the prompt, and inputting the encoded prompt into a language model to generate a chatbot utterance in response to the user utterance.

Prior Art Documents

Patent Documents

[0003]

Patent Document 1

Summary of the Invention

Problems to be Solved by the Invention

[0004] As the volume of information grows, it is important to be able to obtain copyright-free images suitable for documents and presentations quickly and easily. However, conventional methods require considerable time and labor to find the necessary images, and the images obtained often do not meet the desired conditions. To solve these problems, an efficient system is needed that automatically generates high-quality, copyright-free images based on user requirements.

Means for Solving the Problems

[0005] This invention provides a system that receives requests from users, generates images using an artificial intelligence model based on those conditions, and provides the results to the users. Specifically, it includes a receiving means for receiving user input conditions, a generating means for analyzing the conditions and preparing the necessary context for the generation process, and a means for compressing and formatting the generated images before providing them, thereby enabling the efficient generation and provision of copyright-free and original images.

[0006] "Receiving means" refers to a function that provides an interface through which the system receives input and conditions from users.

[0007] "Generating means" refers to a function that uses a generative artificial intelligence model to create a corresponding image based on the received conditions.

[0008] "Delivery means" refers to a function that compresses generated images or converts their format so that they can be delivered to users in a format they can view.

[0009] A "generative artificial intelligence model" is a technology that synthesizes new content based on past training data, according to the user's specific requests.

[0010] An "image" is a representation of visual information as digital data, and includes still images and diagrams.

Brief Description of the Drawings

[0011]
[Figure 1] This is a conceptual diagram showing an example of the configuration of a data processing system according to the first embodiment.
[Figure 2] This is a conceptual diagram showing an example of the essential functions of a data processing device and a smart device according to the first embodiment.
[Figure 3] This is a conceptual diagram showing an example of the configuration of a data processing system according to the second embodiment.
[Figure 4] This is a conceptual diagram showing an example of the main functions of a data processing device and smart glasses according to the second embodiment.
[Figure 5] This is a conceptual diagram showing an example of the configuration of a data processing system according to the third embodiment.
[Figure 6] This is a conceptual diagram showing an example of the main functions of a data processing device and a headset-type terminal according to the third embodiment.
[Figure 7] This is a conceptual diagram showing an example of the configuration of a data processing system according to the fourth embodiment.
[Figure 8] This is a conceptual diagram showing an example of the main functions of a data processing device and a robot according to the fourth embodiment.
[Figure 9] This shows an emotion map where multiple emotions are mapped.
[Figure 10] This shows an emotion map where multiple emotions are mapped.
[Figure 11] This is a sequence diagram showing the processing flow of the data processing system in Example 1.
[Figure 12] This is a sequence diagram showing the processing flow of the data processing system in Application Example 1.
[Figure 13] This is a sequence diagram showing the processing flow of the data processing system in Example 2, when an emotion engine is combined.
[Figure 14] This is a sequence diagram showing the processing flow of the data processing system in Application Example 2, which combines an emotion engine.

Modes for Carrying Out the Invention

[0012] Hereinafter, an example of an embodiment of the system relating to the technology of this disclosure will be described with reference to the attached drawings.

[0013] First, the terminology used in the following description is explained.

[0014] In the following embodiments, a processor denoted by a reference numeral (hereinafter simply referred to as "processor") may be a single arithmetic unit or a combination of multiple arithmetic units. The processor may also be a single type of arithmetic unit or a combination of multiple types of arithmetic units. Examples of arithmetic units include a CPU (Central Processing Unit), a GPU (Graphics Processing Unit), a GPGPU (General-Purpose computing on Graphics Processing Units), and an APU (Accelerated Processing Unit).

[0015] In the following embodiments, a RAM (Random Access Memory) denoted by a reference numeral is a memory in which information is temporarily stored and which is used as a work memory by the processor.

[0016] In the following embodiments, a storage denoted by a reference numeral is one or more non-volatile storage devices that store various programs, various parameters, and the like. Examples of non-volatile storage devices include flash memory (e.g., an SSD (Solid State Drive)), magnetic disks (e.g., hard disks), and magnetic tape.

[0017] In the following embodiments, a communication I/F (Interface) denoted by a reference numeral is an interface including a communication processor, an antenna, and the like. The communication I/F controls communication between multiple computers. Examples of communication standards applied to the communication I/F include wireless communication standards such as 5G (5th Generation Mobile Communication System), Wi-Fi (registered trademark), and Bluetooth (registered trademark).

[0018] In the following embodiments, "A and/or B" is synonymous with "at least one of A and B." That is, "A and/or B" may mean A alone, B alone, or a combination of A and B. Furthermore, in this specification, the same interpretation applies when three or more items are linked by "and/or."

[0019] [First Embodiment]

[0020] Figure 1 shows an example of the configuration of the data processing system 10 according to the first embodiment.

[0021] As shown in Figure 1, the data processing system 10 includes a data processing device 12 and a smart device 14. An example of the data processing device 12 is a server.

[0022] The data processing device 12 comprises a computer 22, a database 24, and a communication interface 26. The computer 22 is an example of a "computer" related to the technology of this disclosure. The computer 22 comprises a processor 28, RAM 30, and storage 32. The processor 28, RAM 30, and storage 32 are connected to a bus 34. The database 24 and the communication interface 26 are also connected to the bus 34. The communication interface 26 is connected to a network 54. An example of the network 54 is a WAN (Wide Area Network) and/or a LAN (Local Area Network).

[0023] The smart device 14 comprises a computer 36, a reception device 38, an output device 40, a camera 42, and a communication interface 44. The computer 36 comprises a processor 46, RAM 48, and storage 50. The processor 46, RAM 48, and storage 50 are connected to a bus 52. The reception device 38, output device 40, and camera 42 are also connected to the bus 52.

[0024] The reception device 38 is equipped with a touch panel 38A and a microphone 38B, etc., and receives user input. The touch panel 38A receives user input by detecting contact with an object (e.g., a pen or finger). The microphone 38B receives user input by detecting the user's voice. The control unit 46A transmits data indicating the user input received by the touch panel 38A and microphone 38B to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the data indicating the user input.

[0025] The output device 40 includes a display 40A and a speaker 40B, and presents data to the user 20 by outputting the data in a form perceptible to the user 20 (e.g., audio and / or text). The display 40A displays visible information such as text and images according to instructions from the processor 46. The speaker 40B outputs audio according to instructions from the processor 46. The camera 42 is a small digital camera equipped with an optical system such as a lens, aperture, and shutter, and an image sensor such as a CMOS (Complementary Metal-Oxide-Semiconductor) image sensor or a CCD (Charge Coupled Device) image sensor.

[0026] Communication interface 44 is connected to network 54. Communication interfaces 44 and 26 are responsible for the exchange of various types of information between processor 46 and processor 28 via network 54.

[0027] Figure 2 shows an example of the main functions of the data processing device 12 and the smart device 14.

[0028] As shown in Figure 2, in the data processing device 12, a specific processing is performed by the processor 28. A specific processing program 56 is stored in the storage 32. The specific processing program 56 is an example of a "program" related to the technology of this disclosure. The processor 28 reads the specific processing program 56 from the storage 32 and executes the read specific processing program 56 on the RAM 30. The specific processing is realized by the processor 28 operating as a specific processing unit 290 according to the specific processing program 56 executed on the RAM 30.

[0029] The storage 32 stores the data generation model 58 and the emotion identification model 59. The data generation model 58 and the emotion identification model 59 are used by the specific processing unit 290.

[0030] In the smart device 14, the processor 46 performs the reception output processing. The storage 50 stores the reception output program 60. The reception output program 60 is used in conjunction with the specific processing program 56 by the data processing system 10. The processor 46 reads the reception output program 60 from the storage 50 and executes the read reception output program 60 on the RAM 48. The reception output processing is realized by the processor 46 operating as a control unit 46A according to the reception output program 60 executed on the RAM 48.

[0031] Next, the specific processing performed by the specific processing unit 290 of the data processing device 12 will be described. In the following description, the data processing device 12 will be referred to as the "server" and the smart device 14 as the "terminal".

[0032] This invention is designed as a system that users can use intuitively. Users access the system from their own terminals and request image generation based on specific conditions. The terminal reads the request entered by the user and sends it to the server as digital data. This data includes keywords, image style, color, and other detailed conditions.
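
As an illustration of the request data described above, the following is a minimal sketch in Python of how the conditions might be structured and serialized on the terminal and then restored on the server. The class name, field names, and default values are hypothetical and do not appear in the disclosure; JSON is used only because it is named later as one possible format.

```python
import json
from dataclasses import dataclass, asdict, field


@dataclass
class ImageRequest:
    """Conditions a user might enter on the terminal (field names are hypothetical)."""
    keywords: list[str]
    style: str = "photo"
    color: str = "natural"
    details: dict[str, str] = field(default_factory=dict)


def encode_request(req: ImageRequest) -> bytes:
    """Terminal side: serialize the conditions to UTF-8 JSON for transmission."""
    return json.dumps(asdict(req), ensure_ascii=False).encode("utf-8")


def decode_request(payload: bytes) -> ImageRequest:
    """Server side: rebuild the conditions, rejecting a request without keywords."""
    data = json.loads(payload.decode("utf-8"))
    if not data.get("keywords"):
        raise ValueError("at least one keyword is required")
    return ImageRequest(**data)
```

Keeping the conditions in a typed structure lets the server validate a request before any generation work starts.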

[0033] The server analyzes the received request and defines the criteria for the image to be generated. The generation mechanism operating within the server uses a generative artificial intelligence model to generate images based on these criteria. This process typically relies on machine learning models such as neural networks. The AI model recognizes patterns from a vast amount of training data and uses them to create new images.

[0034] The server then optimizes the generated images. This involves compressing the images and converting their format so that the file size is appropriate, which prepares them for smooth transfer and display. Finally, the delivery means sends the optimized images back to the terminal, where they are presented to the user.
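
The optimization described above can be sketched as follows, assuming the generated image is available as a raw 8-bit RGB buffer. The sketch downscales the image so that its longer side fits a target size and then converts it to PNG, which compresses the pixel rows with zlib. The function names and the choice of PNG and nearest-neighbour scaling are assumptions made for illustration; the disclosure does not name a specific format or algorithm.

```python
import struct
import zlib


def downscale(pixels: bytes, width: int, height: int, max_side: int):
    """Nearest-neighbour downscale of raw RGB bytes so the longer side fits max_side."""
    scale = max(width, height) / max_side
    if scale <= 1:
        return pixels, width, height
    new_w = max(1, round(width / scale))
    new_h = max(1, round(height / scale))
    out = bytearray()
    for y in range(new_h):
        src_y = min(height - 1, int(y * scale))
        for x in range(new_w):
            src_x = min(width - 1, int(x * scale))
            i = (src_y * width + src_x) * 3
            out += pixels[i:i + 3]
    return bytes(out), new_w, new_h


def _chunk(tag: bytes, data: bytes) -> bytes:
    """Build one PNG chunk: length, type, data, CRC over type and data."""
    crc = zlib.crc32(tag + data)
    return struct.pack(">I", len(data)) + tag + data + struct.pack(">I", crc)


def encode_png(pixels: bytes, width: int, height: int, level: int = 9) -> bytes:
    """Convert raw 8-bit RGB pixels to a PNG file (filter type 0 on every row)."""
    if len(pixels) != width * height * 3:
        raise ValueError("pixel buffer does not match dimensions")
    stride = width * 3
    rows = b"".join(b"\x00" + pixels[r * stride:(r + 1) * stride] for r in range(height))
    # IHDR: width, height, bit depth 8, colour type 2 (RGB), default methods, no interlace
    ihdr = struct.pack(">IIBBBBB", width, height, 8, 2, 0, 0, 0)
    return (b"\x89PNG\r\n\x1a\n" + _chunk(b"IHDR", ihdr)
            + _chunk(b"IDAT", zlib.compress(rows, level)) + _chunk(b"IEND", b""))
```

A production system would more likely rely on an imaging library and choose the output format according to what the terminal can display.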

[0035] The terminal displays the received images on its interface, allowing users to download them or use them directly. For example, if a user requests a "landscape rich in nature," the AI model generates a natural landscape image reflecting that request and provides it to the user. This process allows the user to quickly obtain original images suited to their purpose and use them in presentations or projects.

[0036] The following describes the processing flow.

[0037] Step 1:

[0038] The user enters the conditions for image generation on the terminal, including keywords, desired style, and color scheme. The user interface is designed to collect this information.

[0039] Step 2:

[0040] The terminal converts the user's input into an appropriate data format (e.g., JSON) and sends it to the server as a request, typically over HTTP.

[0041] Step 3:

[0042] The server analyzes the received request and extracts the conditions and context necessary for image generation. This may include text analysis using natural language processing.

[0043] Step 4:

[0044] Based on the analysis results, the server sets the image generation parameters and provides them to the AI model as input data.

[0045] Step 5:

[0046] The generation mechanism on the server uses a generative artificial intelligence model to generate original images based on these parameters. The AI model creates images by drawing on what it learned during pre-training.

[0047] Step 6:

[0048] The generated images are compressed and format-converted by the server to optimize file size. This process reduces the load during transfer and improves display speed on the device.

[0049] Step 7:

[0050] The optimized image data is sent from the server to the terminal, typically as an HTTP response.

[0051] Step 8:

[0052] The terminal displays the received image in the user interface. The user can then download the image or use it in their materials.
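
Steps 1 to 8 above can be sketched end to end as follows. This is a minimal illustration, not the disclosed implementation: the generative model is replaced by a stand-in that returns a flat color, the server is called in-process rather than over HTTP, and every function name, dictionary key, and palette entry is a hypothetical choice.

```python
import base64
import json
import zlib


def generate_image(params: dict) -> bytes:
    """Stand-in for the generative model: a flat RGB buffer in the requested colour.
    A real system would call an image-generation model here."""
    palette = {"green": (40, 160, 70), "blue": (50, 90, 200), "warm": (230, 150, 60)}
    r, g, b = palette.get(params["color"], (128, 128, 128))
    return bytes([r, g, b]) * (params["width"] * params["height"])


def handle_request(body: bytes) -> bytes:
    """Steps 3 to 7 on the server: analyse, parameterise, generate, optimise, respond."""
    conditions = json.loads(body)                      # Step 3: parse the request
    params = {                                         # Step 4: derive model parameters
        "prompt": ", ".join(conditions["keywords"]) + ", " + conditions.get("style", "photo"),
        "color": conditions.get("color", "natural"),
        "width": 64,
        "height": 64,
    }
    pixels = generate_image(params)                    # Step 5: generate
    packed = zlib.compress(pixels, 9)                  # Step 6: compress for transfer
    return json.dumps({                                # Step 7: build the response body
        "prompt": params["prompt"],
        "width": params["width"],
        "height": params["height"],
        "encoding": "zlib+base64/rgb8",
        "image": base64.b64encode(packed).decode("ascii"),
    }).encode("utf-8")


def terminal_roundtrip(keywords, style, color) -> dict:
    """Steps 1, 2 and 8 on the terminal: collect input, send it, unpack the reply."""
    body = json.dumps({"keywords": keywords, "style": style, "color": color}).encode("utf-8")
    reply = json.loads(handle_request(body))           # in-process call instead of HTTP
    reply["pixels"] = zlib.decompress(base64.b64decode(reply.pop("image")))
    return reply
```

For example, `terminal_roundtrip(["landscape", "nature"], "watercolor", "green")` returns a dictionary holding the derived prompt and the decompressed pixel buffer that the terminal would display.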

[0053] (Example 1)

[0054] Next, we will describe Example 1. In the following description, the data processing device 12 will be referred to as the "server," and the smart device 14 will be referred to as the "terminal."

[0055] The present invention addresses the need for an image generation system that is intuitive for users to use, capable of rapidly generating and efficiently providing high-quality images based on specified conditions. In particular, it requires optimal image generation to meet diverse user needs, as well as appropriate optimization for smooth image transfer and display.

[0056] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 1 is realized by the following means.

[0057] In this invention, the server includes means for receiving conditions from an input means and transmitting them as digital data, means for analyzing the conditions and generating an image using a generative artificial intelligence model, and means for optimizing the generated image to adjust its size and format and providing it to the user terminal. This makes it possible to respond quickly to various user requests and provide images in an optimal state.

[0058] "Input means" refers to the part that receives conditions and requests from the user.

[0059] "Conditions" refer to information that allows the user to specify the characteristics and style of the image they wish to generate.

[0060] "Digital data" refers to information that has been converted into a format that can be processed by a computer, such as conditions or requests.

[0061] The "analysis means" refers to the part that analyzes the conditions received from the user and has the function of preparing the criteria and context necessary for image generation.

[0062] A "generative artificial intelligence model" is a model that uses machine learning algorithms to learn from large amounts of data and generate new images.

[0063] "Image generation" is the process of creating a new image based on conditions received from the user.

[0064] "Optimization processing" is the process of compressing and converting the format of generated images to adjust them into a state that allows for efficient storage and transfer.

[0065] "Delivery means" refers to the part that has the function of sending optimized images to the user's device and making them available for the user to use.

[0066] This invention is a system for efficiently generating and providing high-quality images based on user input of image generation conditions. Users access the system using a personal computer or smartphone and input specific conditions for the image they wish to generate. This involves specifying information such as keywords, style, and color through prompt messages.

[0067] The terminal converts the input conditions into digital data and sends it to the server. The server receives this data, analyzes the user's request in detail using analytical tools, and generates the desired image using a generative AI model. The generative AI model generates new images in real time using patterns learned from a large amount of training data. The AI model used in this process is often built on machine learning frameworks such as the nn library or TensorFlow®.

[0068] The generated images undergo optimization processing on the server, including compression and format conversion, to ensure efficient transfer and display. Finally, the server sends the optimized image back to the terminal, which then displays it on the user interface. This allows users to immediately download the generated images and use them in their projects.

[0069] For example, if a user enters the prompt "a landscape rich in nature," the server's AI model generates a natural landscape image that reflects this prompt and provides it to the user. This allows the user to quickly obtain original images that are suitable for presentation materials or web content.

[0070] The flow of the specific processing in Example 1 will be explained using Figure 11.

[0071] Step 1:

[0072] The user accesses the system using a terminal and enters the conditions for image generation. The user specifies keywords, style, and color of the image to be generated as prompts. The entered information is processed as digital data within the terminal and sent to the server. This step accurately converts the user's specific requests into data.

[0073] Step 2:

[0074] The server receives digital data transmitted from the terminal. The received data is analyzed using analytical tools to establish the necessary criteria and context for image generation. Through this analysis, specific image generation conditions are set based on the input prompt text. In this step, data processing is performed to understand the user's requirements and create an appropriate configuration.

[0075] Step 3:

[0076] The server-side generation mechanism uses a generative AI model to generate images based on the analyzed conditions. The AI model utilizes a large amount of training data to create new images that match the input conditions. The generative AI model is implemented using a machine learning framework and leverages pattern recognition capabilities to generate images. This step involves the specific execution of complex image generation algorithms.

[0077] Step 4:

[0078] The generated images are optimized on the server. This process involves image compression and format conversion, adjusting them for efficient transfer and display. The optimized images are converted to the optimal file size and format for transfer. In this step, data calculations are performed to make the generated images usable.

[0079] Step 5:

[0080] The server sends the optimized image to the terminal. The terminal displays the received image on its user interface, making it easy for the user to download and use. This step involves providing information in a user-friendly format.
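
The analysis in Step 2 of Example 1, in which generation conditions are set from the prompt text, can be sketched as follows. This is a deliberately simple keyword-matching illustration. The vocabularies, the stop-word list, and the function name are hypothetical, and an actual analysis step would likely use natural language processing, which the disclosure mentions only in general terms.

```python
import re

# Hypothetical vocabularies used only for this illustration.
STYLES = {"watercolor", "photo", "illustration", "sketch"}
COLORS = {"green", "blue", "red", "warm", "monochrome"}
STOP_WORDS = {"a", "an", "the", "in", "of", "with", "and", "style", "tones"}


def analyze_prompt(prompt: str) -> dict:
    """Split a free-text prompt into style, colours, and remaining subject keywords."""
    tokens = re.findall(r"[a-z]+", prompt.lower())
    known = STYLES | COLORS | STOP_WORDS
    style = next((t for t in tokens if t in STYLES), "photo")  # default when unspecified
    colors = [t for t in tokens if t in COLORS]
    subject = [t for t in tokens if t not in known]
    return {"subject": subject, "style": style, "colors": colors}
```

The resulting dictionary is one possible form for the "specific image generation conditions" that are passed to the generation step.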

[0081] (Application Example 1)

[0082] Next, we will explain Application Example 1. In the following explanation, the data processing device 12 will be referred to as the "server," and the smart device 14 will be referred to as the "terminal."

[0083] In generating visual information, it is essential that users can intuitively set conditions, and that the generated images are high-quality and efficiently optimized. Furthermore, the generated visual information must be delivered smoothly to the user, enhancing the personalized content experience.

[0084] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 1 is realized by the following means.

[0085] In this invention, the server includes a receiving means for receiving conditions from a user, a generating means for generating visual data using a generative artificial intelligence algorithm based on the conditions, and an optimization means for optimizing the generated visual data and converting it into a variable form. This makes it possible to quickly and efficiently generate and provide the visual information requested by the user.

[0086] A "means of receiving conditions from users" refers to a component within a system that receives specific requests or wishes from users.

[0087] "Generative means for generating visual data using a generative artificial intelligence algorithm" refers to an element within a system that generates image data using artificial intelligence technology based on user conditions.

[0088] "Optimization means for optimizing generated visual data and converting it into a variable form" refers to a function within a system that improves the quality and format of the generated image, making it usable by the user effectively.

[0089] "Means of delivery" refers to components within a system that have the function of delivering the generated visual data to the user.

[0090] A "constituent" is a collection of multiple functional means that cooperate to form the entire system.

[0091] One embodiment of the present invention provides a system that allows users to intuitively generate and utilize visual information. This system functions by allowing users to input the desired image conditions using a smartphone or other device. The input conditions include keywords, image style, color, and other detailed instructions.

[0092] The server processes this input and uses generative artificial intelligence algorithms (e.g., models built on PyTorch or TensorFlow) to generate visual data based on specified conditions. This image generation process is achieved by recognizing patterns based on a vast amount of training data and generating new visual content.

[0093] The generated visual data is processed by optimization techniques, including compression and format conversion. This allows image files to be displayed and downloaded smoothly on the user's device. Optimization techniques are applied to reduce file size while maintaining image quality.

[0094] The optimized images are delivered to the user by the delivery means. The user can view them in a smartphone application and download or otherwise use them as needed.

[0095] For example, if a user specifies "spring cherry blossom scenery" within the application, the server will generate an image that conforms to that theme. The generative AI model may be given a prompt such as, "Create a vivid image of spring cherry blossoms under a clear blue sky." This allows users to easily obtain high-quality seasonal theme images and utilize them in content creation.

[0096] The flow of a specific process in Application Example 1 will be explained using Figure 12.

[0097] Step 1:

[0098] The user launches the application on their smartphone and specifies the details of the image they want to generate on the conditions input screen. This includes elements such as keywords, style, and color. The entered information is temporarily stored as data on the device.

[0099] Step 2:

[0100] The terminal sends temporarily stored condition data to the server. The server receives this data and begins analysis. During analysis, data processing is performed to form an appropriate prompt sentence based on the specified conditions. For example, if "spring cherry blossom scenery" is specified, the prompt sentence "Create a vivid image of spring cherry blossoms under a clear blue sky." will be generated.

[0101] Step 3:

[0102] The server invokes a generative artificial intelligence model based on the generated prompt text. The AI model (e.g., Stable Diffusion) receives the prompt text and generates relevant visual information. In this process, the model utilizes existing data patterns to output new image data.

[0103] Step 4:

[0104] The generated image data is passed to an optimization mechanism on the server. This mechanism compresses the image, converts it to an appropriate format, adjusts the image size, and processes it to optimize display on the user's device. This process improves transfer efficiency while maintaining quality.

[0105] Step 5:

[0106] The optimized visual data is sent back from the server to the terminal. The terminal receives this data and displays the image within the application. The user can review the displayed image and, if necessary, download it or integrate it with other content.
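
The prompt formation in Step 2 of Application Example 1 can be sketched as follows. The disclosure does not state how the prompt sentence is produced, so the fixed sentence template below, together with the function and parameter names, is an assumption made for illustration.

```python
def _article(phrase: str) -> str:
    """Pick 'a' or 'an' from the first letter of a non-empty phrase."""
    return "an" if phrase[0].lower() in "aeiou" else "a"


def build_prompt(keywords: list[str], style: str = "", color: str = "") -> str:
    """Assemble one prompt sentence from structured conditions (template is hypothetical)."""
    if not keywords:
        raise ValueError("keywords must not be empty")
    head = f"{style} image" if style else "image"
    sentence = f"Create {_article(head)} {head} of {' '.join(keywords)}"
    if color:
        sentence += f" in {color} tones"
    return sentence + "."
```

Calling `build_prompt(["spring", "cherry", "blossoms"], style="vivid")` yields a sentence in the same spirit as the example prompt in the text. A template keeps the output predictable, while a language model could be used instead when richer phrasing is wanted.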

[0107] Furthermore, an emotion engine that estimates the user's emotions may be incorporated. That is, the specific processing unit 290 may use the emotion identification model 59 to estimate the user's emotions and perform the specific processing using the estimated emotions.

[0108] This invention is designed as an image generation system that takes user emotions into account. The system begins with the user inputting image generation conditions via a terminal. When the user specifies keywords and content, an emotion engine built into the terminal recognizes the user's emotions through voice input and text analysis. This emotion analysis adds emotional nuances, such as "cheerful" or "calm," to the conditions.
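
The text-analysis part of the emotion engine described above can be sketched as follows. The disclosure refers to an emotion identification model without specifying it, so the small cue-word lexicon here is purely a stand-in; the labels, cue words, and function names are hypothetical, and voice-tone analysis is not covered.

```python
import re
from collections import Counter

# Toy lexicon; the cue words and labels are illustrative assumptions only.
LEXICON = {
    "cheerful": {"fun", "happy", "bright", "excited", "party"},
    "calm": {"relax", "relaxing", "quiet", "peaceful", "gentle"},
}


def estimate_emotion(text: str, default: str = "neutral") -> str:
    """Return the emotion whose cue words occur most often in the text."""
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter({label: sum(t in cues for t in tokens)
                      for label, cues in LEXICON.items()})
    label, hits = counts.most_common(1)[0]
    return label if hits > 0 else default


def add_emotion(conditions: dict, utterance: str) -> dict:
    """Attach the estimated emotion to a copy of the user's conditions."""
    return {**conditions, "emotion": estimate_emotion(utterance)}
```

The output of `add_emotion` corresponds to conditions to which an emotional nuance such as "cheerful" or "calm" has been added.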

[0109] The terminal structures this input data and sends it to the server. The server analyzes the received data and prepares the optimal parameters for image generation based on the conditions and emotions. Next, the generation mechanism within the server generates an original image using a pre-trained generative artificial intelligence model. In this process, parameters based on the user's emotions are also taken into consideration and efforts are made to reflect them in the image's color tone and composition.

[0110] The generated images are further emotionally adjusted by the server. For example, if a cheerful mood is detected, brighter colors and a more dynamic composition are selected. The images are also compressed, converted to different formats, and resized appropriately. The server then sends the optimized data back to the terminal.
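
The emotional adjustment of color tone described above can be sketched as follows, assuming the generated image is a raw 8-bit RGB buffer. The gain values per emotion and the function name are hypothetical. The disclosure says only that brighter colors are selected for a cheerful mood, and composition changes are outside the scope of this sketch.

```python
import colorsys

# Hypothetical mapping from detected emotion to (brightness gain, saturation gain).
EMOTION_GAINS = {"cheerful": (1.15, 1.25), "calm": (1.0, 0.8)}


def adjust_for_emotion(pixels: bytes, emotion: str) -> bytes:
    """Scale value and saturation of raw RGB pixels according to the emotion label."""
    v_gain, s_gain = EMOTION_GAINS.get(emotion, (1.0, 1.0))
    out = bytearray()
    for i in range(0, len(pixels), 3):
        r, g, b = (c / 255 for c in pixels[i:i + 3])
        h, s, v = colorsys.rgb_to_hsv(r, g, b)
        # Clamp to 1.0 so the adjusted channels stay within the 8-bit range
        r, g, b = colorsys.hsv_to_rgb(h, min(1.0, s * s_gain), min(1.0, v * v_gain))
        out += bytes(round(c * 255) for c in (r, g, b))
    return bytes(out)
```

Working in HSV keeps the hue unchanged, so the adjustment shifts the mood of the image without altering which colors appear in it.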

[0111] The terminal displays received images on the user interface, allowing users to view them directly and use or download them as needed. For example, if a user requests a "relaxing landscape" and the analysis determines their emotions are "calm and peaceful," the AI model generates and provides images reminiscent of a calm sea or a soft sunset. This allows users to easily obtain images appropriate to their psychological state.

[0112] The following describes the processing flow.

[0113] Step 1:

[0114] The user uses the input interface on the device to specify the conditions for the image they want to generate. This can include keywords or desired themes. Furthermore, when using voice or text input, the device's built-in emotion engine analyzes the user's voice tone and the words they use to detect their current emotional state.

[0115] Step 2:

[0116] The terminal forms a dataset containing the conditions entered by the user and the emotion information recognized by the emotion engine, and sends this dataset to the server.

[0117] Step 3:

[0118] The server analyzes the received data and adjusts the detailed parameters for image generation based on the conditions and emotional information specified by the user. The server uses the emotional information to determine how to reflect it in the image's color tone and composition.

[0119] Step 4:

[0120] The server uses a generation mechanism and a highly trained generative artificial intelligence model to create new images. In this process, image generation is performed based on prepared parameters to reflect the user's emotions in terms of color tone and composition.

[0121] Step 5:

[0122] The server then performs further optimizations on the generated images. In particular, after making fine adjustments based on emotional information, it compresses the images and converts the format to reduce data volume and improve transfer efficiency.

[0123] Step 6:

[0124] The optimized image is sent from the server to the terminal. The terminal receives this data and displays the image to the user through the user interface.

[0125] Step 7:

[0126] Users can view the displayed images and add them to their materials or download them to their devices as needed. This process allows users to intuitively and quickly obtain images that match their emotional state.
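
Step 3 of the flow above, in which the server derives generation parameters from the conditions and the emotional information, can be sketched as follows. The preset table, the dictionary keys, and the function name are hypothetical; the disclosure states only that emotional information is reflected in color tone and composition.

```python
# Hypothetical emotion presets: prompt modifiers plus a tone hint for later adjustment.
PRESETS = {
    "cheerful": {"modifiers": ["bright colors", "dynamic composition"], "brightness": 1.15},
    "calm": {"modifiers": ["soft light", "muted palette"], "brightness": 1.0},
}
NEUTRAL = {"modifiers": [], "brightness": 1.0}


def derive_parameters(dataset: dict) -> dict:
    """Merge user conditions and emotion information into generation parameters."""
    preset = PRESETS.get(dataset.get("emotion", ""), NEUTRAL)
    parts = list(dataset["keywords"]) + preset["modifiers"]
    return {"prompt": ", ".join(parts), "brightness": preset["brightness"]}
```

With this sketch, a dataset without an emotion label, or with a label that has no preset, falls back to the user's conditions unchanged.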

[0127] (Example 2)

[0128] Next, we will describe Example 2. In the following description, the data processing device 12 will be referred to as the "server" and the smart device 14 as the "terminal".

[0129] Conventional image generation systems have been unable to adequately consider the user's emotions, so the generated images often do not match the user's expectations or psychological state. As a result, the quality of the generated images and user satisfaction have been limited.

[0130] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 2 is realized by the following means.

[0131] In this invention, the server includes means for receiving image generation conditions and emotional information from the user via a terminal, means for analyzing the input conditions and emotional information and setting optimal parameters for generation, means for generating an image using a generative artificial intelligence model based on the parameters, and means for optimizing the generated image by applying emotional adjustments. This makes it possible to generate high-quality images that are in line with the user's emotional state.

[0132] "Conditions" refer to information that specifically indicates the attributes and characteristics that the user desires when generating an image.

[0133] "Emotional information" refers to emotional data extracted from input voice and text that reflects the user's psychological state.

[0134] A "generative artificial intelligence model" is a type of artificial intelligence structure that uses pre-trained algorithms to generate images.

[0135] "Optimal parameters" are settings adjusted to reflect conditions and emotional information in image generation, thereby improving the accuracy and quality of the generation process.

[0136] "Emotional adjustment" is a process that modifies the color tone and composition of a generated image, taking emotional information into consideration.

[0137] "Optimization" refers to processing generated images so that they meet user expectations and can be displayed and downloaded quickly and smoothly.

[0138] Embodiments for this invention are shown below.

[0139] First, the user uses the device to input the desired image criteria. This includes keywords and specific images, such as "calm landscape" or "dynamic urban scene." The device is equipped with an emotion recognition engine that analyzes the user's input through voice and text to extract the user's emotional information. From this information, emotional nuances such as "relaxed" or "lively" can be obtained.

[0140] Next, the device sends the generated dataset to the server. The server analyzes this data and sets the optimal parameters. Using these parameters, a pre-trained generative artificial intelligence model generates images based on the conditions and emotional information. The generated images are then emotionally adjusted in terms of color and composition to reflect the user's emotional state.

[0141] The generated images are compressed and format-converted by the server, optimized for the user's device, and then sent. The device displays the received optimized image in its user interface, allowing the user to review the provided image. This image can be downloaded if needed.

[0142] For example, if a user inputs "a relaxing landscape," the device analyzes the input and extracts emotional information such as "calm and peaceful." This data is sent to a server, where a generative AI model is used to generate images of a calm sea or a soft sunset. These images are then provided to the user.

[0143] An example of a prompt might be, "Generate a relaxing landscape. Sentiment analysis identifies it as 'calm and peaceful'." Based on this prompt, the system can provide an image suitable for the user.
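As a minimal sketch of how such a prompt could be assembled, the following Python function joins the user's condition with the recognized emotion. The function name `build_emotion_prompt` and the sentence template are illustrative assumptions modeled on the example prompt above; they are not defined in this disclosure.

```python
def build_emotion_prompt(condition: str, emotion: str) -> str:
    """Combine a user condition and a recognized emotion into one prompt.

    Hypothetical helper: the wording mirrors the example prompt in the text.
    """
    condition = condition.strip().rstrip(".")
    emotion = emotion.strip()
    if not condition:
        raise ValueError("condition must not be empty")
    prompt = f"Generate {condition}."
    # The emotion sentence is appended only when an emotion was recognized.
    if emotion:
        prompt += f" Sentiment analysis identifies it as '{emotion}'."
    return prompt


example_prompt = build_emotion_prompt("a relaxing landscape", "calm and peaceful")
```

When no emotion is recognized, the sketch falls back to a prompt containing the condition alone.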

[0144] The flow of the specific processing in Example 2 will be explained using Figure 13.

[0145] Step 1:

[0146] The user inputs image generation conditions into the device. This input includes keywords and desired images, such as "relaxing scenery." The device receives these conditions and uses an emotion recognition engine to extract the user's emotional information from the input data. The input is text or audio data, and the output is information about the analyzed emotional state.
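The input and output of this step can be illustrated with a deliberately simple stand-in for the emotion recognition engine. The sketch below assumes a small keyword lexicon (`EMOTION_LEXICON`) and a function `recognize_emotion`, both hypothetical; an actual engine would typically rely on a trained model and could also process audio, which this sketch does not attempt.

```python
import re
from collections import Counter

# Hypothetical lexicon mapping cue words to emotion labels of the kind named in the text.
EMOTION_LEXICON = {
    "relaxing": "relaxed", "calm": "relaxed", "peaceful": "relaxed", "quiet": "relaxed",
    "dynamic": "lively", "lively": "lively", "exciting": "lively", "energetic": "lively",
}


def recognize_emotion(text: str, default: str = "neutral") -> str:
    """Return the emotion label whose cue words occur most often in `text`."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(EMOTION_LEXICON[w] for w in words if w in EMOTION_LEXICON)
    if not counts:
        return default  # no cue word found
    return counts.most_common(1)[0][0]
```

Text input is mapped to a single label here; a richer output, such as a score per emotion, would fit the same interface.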

[0147] Step 2:

[0148] The terminal converts the input conditions and sentiment information into a structured dataset as a preprocessing step and sends it to the server. The dataset contains detailed conditions and sentiment information, which facilitates subsequent processing. The input for this step is the user's conditions and extracted sentiment information, and the output is a structured dataset for transmission to the server.
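One possible shape for the structured dataset is sketched below. The class name `GenerationDataset`, its field names, and the choice of JSON as the wire format are assumptions made for illustration; the disclosure does not fix a schema for this example.

```python
import json
from dataclasses import asdict, dataclass
from typing import List


@dataclass
class GenerationDataset:
    """Hypothetical dataset sent from the terminal to the server."""
    conditions: List[str]      # conditions entered by the user
    emotion: str               # emotional information from the emotion recognition engine
    input_type: str = "text"   # "text" or "audio", following Step 1


def to_payload(dataset: GenerationDataset) -> bytes:
    """Serialize the dataset as UTF-8 encoded JSON for transmission."""
    return json.dumps(asdict(dataset), ensure_ascii=False, sort_keys=True).encode("utf-8")


dataset = GenerationDataset(conditions=["relaxing scenery"], emotion="calm and peaceful")
payload = to_payload(dataset)
```

Keeping the conditions and the emotional information in separate fields lets the server weigh them independently in the following step.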

[0149] Step 3:

[0150] The server analyzes the received dataset and sets the optimal parameters for image generation. Specifically, it determines parameters such as color, style, and composition to be used based on the information. The input for this step is the dataset received from the terminal, and the output is the optimized parameters for image generation.
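The parameter-setting step might be sketched as a lookup followed by an override, as shown below. The preset table `EMOTION_PRESETS`, its values, and the rule that explicit user conditions take precedence over emotion-derived defaults are all assumptions of this sketch, not requirements stated in the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GenerationParams:
    color: str
    style: str
    composition: str


# Hypothetical presets; labels and values are illustrative only.
EMOTION_PRESETS = {
    "calm": GenerationParams("soft pastel tones", "gentle, low contrast", "wide, open horizon"),
    "lively": GenerationParams("bright saturated tones", "high contrast", "dynamic diagonal lines"),
}
DEFAULT_PARAMS = GenerationParams("natural tones", "balanced", "centered subject")


def set_parameters(conditions: dict, emotion: str) -> GenerationParams:
    """Start from the emotion preset, then let explicit user conditions override it."""
    base = EMOTION_PRESETS.get(emotion, DEFAULT_PARAMS)
    return GenerationParams(
        color=conditions.get("color", base.color),
        style=conditions.get("style", base.style),
        composition=conditions.get("composition", base.composition),
    )
```

For instance, `set_parameters({"color": "monochrome"}, "lively")` keeps the lively style and composition but uses the color the user asked for.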

[0151] Step 4:

[0152] The server-based generation mechanism generates images using a generative artificial intelligence model based on optimized parameters. This model is pre-trained and capable of generating a variety of images. The input is the optimized parameters, and the output is the generated original image.

[0153] Step 5:

[0154] The server applies emotional adjustments to the generated image. These adjustments modify the image's color tone, brightness, and composition based on the user's emotional information. For example, if a calm emotion is detected, a gentle color tone will be selected. The input is the generated image and emotional information, and the output is the adjusted image.
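A color tone and brightness adjustment of this kind can be sketched with the Python standard library alone. In the sketch below, the scale factors in `ADJUSTMENTS` and the emotion labels are assumed values chosen for illustration, and the image is represented as rows of 8-bit RGB tuples; composition changes are outside the scope of this sketch.

```python
import colorsys

# Hypothetical (saturation scale, brightness scale) per emotion label.
ADJUSTMENTS = {"calm": (0.7, 1.05), "cheerful": (1.2, 1.1)}


def adjust_pixel(rgb, emotion):
    """Scale saturation and brightness of one 8-bit RGB pixel in HSV space."""
    sat_scale, val_scale = ADJUSTMENTS.get(emotion, (1.0, 1.0))
    r, g, b = (c / 255.0 for c in rgb)
    h, s, v = colorsys.rgb_to_hsv(r, g, b)
    s = min(1.0, s * sat_scale)  # clamp so values stay in [0, 1]
    v = min(1.0, v * val_scale)
    r, g, b = colorsys.hsv_to_rgb(h, s, v)
    return tuple(round(c * 255) for c in (r, g, b))


def adjust_image(pixels, emotion):
    """Apply adjust_pixel to an image given as a list of rows of RGB tuples."""
    return [[adjust_pixel(p, emotion) for p in row] for row in pixels]
```

With these assumed factors, a calm emotion lowers saturation and slightly raises brightness, which yields a gentler tone; an unrecognized emotion leaves the image unchanged.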

[0155] Step 6:

[0156] The server compresses and converts the format of the adjusted image and sends the optimized data to the terminal. This process prepares the data for smooth image display and download. The input is the adjusted image, and the output is the compressed and format-converted image data.

[0157] Step 7:

[0158] The terminal displays the received image data to the user. The user can view the displayed image and, if necessary, download it or use it in other materials. The input is the image data received from the server, and the output is the image displayed on the user interface.

[0159] (Application Example 2)

[0160] Next, we will explain application example 2. In the following explanation, the data processing device 12 will be referred to as a "server" and the smart device 14 as a "terminal".

[0161] Conventional advertising image generation systems were limited to generating images based solely on conditions, making it difficult to create flexible images that took into account the emotional state of the user. In particular, providing visual content that resonates with the user's emotions is crucial in the advertising field, but this has not been adequately achieved. This has resulted in the challenge of not maximizing the effectiveness of advertising.

[0162] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 2 is realized by the following means.

[0163] In this invention, the server includes a receiving means for receiving conditions from the user, a generating means for analyzing the user's emotional state and generating an image using a computer model, and a providing means for presenting the generated image on the user's digital device. This makes it possible to efficiently generate and provide appropriate advertising images according to the user's emotional state.

[0164] A "user" is someone who provides the conditions for image generation through this system and receives the generated image.

[0165] "Conditions" refer to the specific requirements or desired elements that the user specifies regarding the images they generate.

[0166] "Emotional state" refers to the psychological state of a user, and is identified by analyzing information obtained from voice and text.

[0167] A "computer model" is a system that includes algorithms executed on a digital device and is used to incorporate conditions and emotional states in image generation.

[0168] "Digital devices" refer to electronic devices used to display generated images, such as smartphones, tablets, and personal computers.

[0169] "Generation method" refers to a series of processes for generating images by utilizing a computer model based on conditions and emotional states.

[0170] "Means of provision" refers to means including interfaces and digital devices for presenting the generated images to the user.

[0171] The system that realizes this invention aims to effectively provide visual advertising content by generating images that correspond to the user's emotional state and presenting them on a digital device. A specific example of this system is shown below.

[0172] The system's program runs on the user's smartphone or tablet device and provides an interface for the user to input criteria for advertising images. These criteria include information such as the type of product or atmosphere the user expects to see in the images. These criteria are then transmitted from the digital device to the server.

[0173] The server receives the conditions sent by the user and uses Google® Cloud Speech-to-Text or the Google Cloud Natural Language API to obtain the user's emotional state from speech or text. Based on this analysis, the server establishes image generation parameters for the Stable Diffusion generative AI model that combine the emotional state with the conditions.

[0174] If a user wants to generate an image that evokes a sense of peace, the emotion analysis will identify a state of "relaxation." Based on this, the server generates an advertising image with calming colors and composition.

[0175] The server then sends the generated image to the digital device, allowing the user to view it on their device. This image has been compressed, formatted, and resized as appropriate.

[0176] For example, if a user specifies a "scene that feels comfortable" from daily life, the generated ad visuals are likely to depict a relaxed beach or soft sunlight. Prompts such as "Please create an ad visual that evokes a sense of peace" can be used.

[0177] This format allows advertisers and content creators to create and distribute more engaging and effective advertisements and visual content that are tailored to the target audience's psychological state.

[0178] The flow of a specific process in Application Example 2 will be explained using Figure 14.

[0179] Step 1:

[0180] The user uses their device to input the desired conditions for the advertisement image into the interface. Specifically, they provide the product name and desired mood (e.g., calm, cheerful) in text or voice. This information is collected as input data on the device.

[0181] Step 2:

[0182] The terminal sends the conditions collected in Step 1 to the server. If there is voice input, the terminal performs preprocessing, such as converting the voice data to text, and formats it into a format that the server can easily handle before sending it. This data becomes the server input.

[0183] Step 3:

[0184] The server performs sentiment analysis on the received text data using the Google Cloud Speech-to-Text and Google Cloud Natural Language APIs. Through this analysis, the server extracts the user's emotional state in terms of "relaxed" or "excited," and outputs this as sentiment data.

[0185] Step 4:

[0186] The server combines emotion data with user-defined conditions to set optimal parameters for image generation. These settings include the prompt used by the AI image generation model (e.g., Stable Diffusion), which specifies elements such as color tone and composition.

[0187] Step 5:

[0188] The server runs a generative AI model using the configured parameters to generate an image. The generated image is a computer-generated image whose brightness and hue are adjusted according to the user's emotional state.

[0189] Step 6:

[0190] The generated images are compressed and formatted on the server and optimized for transmission to the terminal. This is to adjust the size and format so that they can be displayed more quickly on digital devices.
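The size adjustment in this step can be illustrated by computing target dimensions that fit the terminal's display while preserving the aspect ratio. The function `fit_within` and the rule of never upscaling are assumptions of this sketch; the actual resampling and encoding would be performed by an image library.

```python
from typing import Tuple


def fit_within(width: int, height: int, max_width: int, max_height: int) -> Tuple[int, int]:
    """Scale (width, height) down to fit the display, keeping the aspect ratio.

    Images that already fit are returned unchanged (no upscaling).
    """
    if min(width, height, max_width, max_height) <= 0:
        raise ValueError("all dimensions must be positive")
    scale = min(max_width / width, max_height / height, 1.0)
    return max(1, round(width * scale)), max(1, round(height * scale))


# A 2048x1024 image prepared for a hypothetical 1080x1920 portrait display.
target_size = fit_within(2048, 1024, 1080, 1920)
```

Sending an image no larger than the display can show reduces the data volume without a visible loss on that device.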

[0191] Step 7:

[0192] The terminal receives optimized image data from the server and displays the images through the user interface. The user can view these generated images and, if necessary, download or use them for further processing.

[0193] The specific processing unit 290 transmits the result of the specific processing to the smart device 14. In the smart device 14, the control unit 46A causes the output device 40 to output the result of the specific processing. The microphone 38B acquires audio indicating user input for the result of the specific processing. The control unit 46A transmits the audio data indicating user input acquired by the microphone 38B to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the audio data.

[0194] Data generation model 58 is a so-called generative AI (Artificial Intelligence). Examples of the data generation model 58 include generative AI such as ChatGPT (registered trademark) (Internet search <URL: https://openai.com/blog/chatgpt>) and Gemini (registered trademark) (Internet search <URL: https://gemini.google.com/?hl=ja>). The data generation model 58 is obtained by performing deep learning on a neural network. Prompts containing instructions and inference data, such as audio data representing speech, text data representing text, and image data representing images, are input to the data generation model 58. The data generation model 58 performs inference on the input inference data according to the instructions indicated by the prompts, and outputs the inference results in data formats such as audio data and text data. Here, inference refers to, for example, analysis, classification, prediction, and/or summarization.

[0195] In the above embodiment, an example was given in which specific processing is performed by the data processing device 12, but the technology of this disclosure is not limited thereto, and the specific processing may also be performed by the smart device 14.

[0196] [Second Embodiment]

[0197] Figure 3 shows an example of the configuration of the data processing system 210 according to the second embodiment.

[0198] As shown in Figure 3, the data processing system 210 includes a data processing device 12 and smart glasses 214. An example of the data processing device 12 is a server.

[0199] The data processing device 12 comprises a computer 22, a database 24, and a communication interface 26. The computer 22 is an example of a "computer" related to the technology of this disclosure. The computer 22 comprises a processor 28, RAM 30, and storage 32. The processor 28, RAM 30, and storage 32 are connected to a bus 34. The database 24 and the communication interface 26 are also connected to the bus 34. The communication interface 26 is connected to a network 54. An example of the network 54 is a WAN (Wide Area Network) and / or a LAN (Local Area Network).

[0200] The smart glasses 214 include a computer 36, a microphone 238, a speaker 240, a camera 42, and a communication interface 44. The computer 36 includes a processor 46, RAM 48, and storage 50. The processor 46, RAM 48, and storage 50 are connected to a bus 52. The microphone 238, speaker 240, and camera 42 are also connected to the bus 52.

[0201] The microphone 238 receives voice signals from the user 20 and receives instructions from the user 20. The microphone 238 captures the voice signals from the user 20, converts the captured voice into audio data, and outputs it to the processor 46. The speaker 240 outputs audio according to the instructions from the processor 46.

[0202] Camera 42 is a small digital camera equipped with an optical system including a lens, aperture, and shutter, and an image sensor such as a CMOS (Complementary Metal-Oxide-Semiconductor) image sensor or a CCD (Charge Coupled Device) image sensor, and captures images of the area around the user 20 (for example, an imaging range defined by a field of view equivalent to the width of a typical healthy person's field of vision).

[0203] Communication interface 44 is connected to network 54. Communication interfaces 44 and 26 are responsible for the exchange of various information between processor 46 and processor 28 via network 54. The exchange of various information between processor 46 and processor 28 using communication interfaces 44 and 26 is performed in a secure manner.

[0204] Figure 4 shows an example of the main functions of the data processing device 12 and the smart glasses 214. As shown in Figure 4, the data processing device 12 performs specific processing using the processor 28. The storage 32 stores the specific processing program 56.

[0205] The specific processing program 56 is an example of a "program" relating to the technology of this disclosure. The processor 28 reads the specific processing program 56 from the storage 32 and executes the read specific processing program 56 on the RAM 30. The specific processing is realized by the processor 28 operating as a specific processing unit 290 in accordance with the specific processing program 56 executed on the RAM 30.

[0206] The storage 32 stores the data generation model 58 and the emotion identification model 59. The data generation model 58 and the emotion identification model 59 are used by the specific processing unit 290.

[0207] In the smart glasses 214, the processor 46 performs the reception output processing. The storage 50 stores the reception output program 60. The processor 46 reads the reception output program 60 from the storage 50 and executes the read reception output program 60 on the RAM 48. The reception output processing is realized by the processor 46 operating as a control unit 46A according to the reception output program 60 executed on the RAM 48.

[0208] Next, the specific processing performed by the specific processing unit 290 of the data processing device 12 will be described. In the following description, the data processing device 12 will be referred to as the "server" and the smart glasses 214 will be referred to as the "terminal".

[0209] This invention is designed as a system that users can use intuitively. Users access the system using their own devices and request image generation based on specific conditions. The device reads the requests entered by the user and sends them to the server as digital data. This data includes keywords, image style, color, and other detailed conditions.

[0210] The server analyzes the received request and defines the criteria for the image to be generated. The generation mechanism operating within the server uses a generative artificial intelligence model to generate images based on these criteria. In this process, machine learning models and neural networks are commonly used. The AI model recognizes patterns from a vast amount of training data and uses them to create new images.

[0211] The generated images are optimized by the server. This involves image compression and format conversion, adjusting the file size appropriately. In this way, the images are prepared for smooth transfer and display. Finally, the optimized images are sent back to the terminal by the delivery method and presented to the user.

[0212] The device displays the received images on its interface, allowing users to easily download or use them directly. For example, if a user requests a "landscape rich in nature," the AI model generates a natural landscape image reflecting that request and provides it to the user. This process allows the user to instantly obtain original images suited to their purpose and use them in presentations or projects.

[0213] The following describes the processing flow.

[0214] Step 1:

[0215] The user uses their device to input the conditions for image generation. This includes keywords, desired style, color scheme, etc. The user interface is designed to retrieve this information.

[0216] Step 2:

[0217] The terminal converts the user's input data into an appropriate data format (e.g., JSON) and sends a request to the server. This often uses the HTTP protocol.
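The conversion to JSON and the construction of an HTTP request might look like the following sketch, which uses only the Python standard library. The endpoint URL and the field names `keywords`, `style`, and `color` are hypothetical; the request is only constructed here, not sent.

```python
import json
import urllib.request


def build_generation_request(conditions: dict, url: str) -> urllib.request.Request:
    """Serialize the user's conditions as JSON and wrap them in an HTTP POST request."""
    body = json.dumps(conditions, ensure_ascii=False).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json; charset=utf-8"},
        method="POST",
    )


# Hypothetical endpoint and conditions, for illustration only.
request = build_generation_request(
    {"keywords": ["landscape rich in nature"], "style": "photorealistic", "color": "green"},
    "https://example.com/api/generate",
)
```

Passing `request` to `urllib.request.urlopen` would transmit it; that call is omitted because the endpoint above does not exist.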

[0218] Step 3:

[0219] The server analyzes the received request and extracts the conditions and context necessary for image generation. This may include text analysis using natural language processing.
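A minimal sketch of the extraction part of this step is given below. It assumes the request body is a JSON object with the hypothetical fields `keywords`, `style`, and `color`, and it performs structural validation only; any natural language processing of the text would follow as a separate stage.

```python
import json


def extract_conditions(raw_body: bytes) -> dict:
    """Parse a JSON request body and return normalized generation conditions."""
    try:
        data = json.loads(raw_body.decode("utf-8"))
    except (UnicodeDecodeError, json.JSONDecodeError) as exc:
        raise ValueError(f"malformed request body: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("request body must be a JSON object")

    keywords = data.get("keywords", [])
    if isinstance(keywords, str):
        keywords = [keywords]          # accept a single keyword given as a string
    elif not isinstance(keywords, list):
        keywords = []
    keywords = [k.strip() for k in keywords if isinstance(k, str) and k.strip()]
    if not keywords:
        raise ValueError("at least one keyword is required")

    conditions = {"keywords": keywords}
    for name in ("style", "color"):    # optional fields; unknown fields are ignored
        value = data.get(name)
        if isinstance(value, str) and value.strip():
            conditions[name] = value.strip()
    return conditions
```

Rejecting malformed requests at this point keeps invalid input from reaching the generation mechanism.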

[0220] Step 4:

[0221] Based on the analysis results, the server adjusts the parameters for the generated images and provides them to the AI model as input data.

[0222] Step 5:

[0223] The generation mechanism on the server uses a generative artificial intelligence model to generate original images based on these parameters. The AI model creates images by drawing on patterns learned during pre-training.

[0224] Step 6:

[0225] The generated images are compressed and format-converted by the server to optimize file size. This process reduces the load during transfer and improves display speed on the device.

[0226] Step 7:

[0227] The optimized image data is sent from the server to the terminal. This is also done via an HTTP response.

[0228] Step 8:

[0229] The device displays the received image in the user interface. The user can then download this image or use it in their materials.

[0230] (Example 1)

[0231] Next, we will describe Example 1. In the following description, the data processing device 12 will be referred to as the "server," and the smart glasses 214 will be referred to as the "terminal."

[0232] The present invention addresses the need for an image generation system that is intuitive for users to use, capable of rapidly generating and efficiently providing high-quality images based on specified conditions. In particular, it requires optimal image generation to meet diverse user needs, as well as appropriate optimization for smooth image transfer and display.

[0233] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 1 is realized by the following means.

[0234] In this invention, the server includes means for receiving conditions from an input means and transmitting them as digital data, means for analyzing the conditions and generating an image using a generative artificial intelligence model, and means for optimizing the generated image to adjust its size and format and providing it to the user terminal. This makes it possible to respond quickly to various user requests and provide images in an optimal state.

[0235] The "input means" refers to the part that has the function of receiving conditions and requests from the user.

[0236] "Conditions" refer to information that allows the user to specify the characteristics and style of the image they wish to generate.

[0237] "Digital data" refers to information that has been converted into a format that can be processed by a computer, such as conditions or requests.

[0238] The "analysis means" refers to the part that analyzes the conditions received from the user and has the function of preparing the criteria and context necessary for image generation.

[0239] A "generative artificial intelligence model" is a model that uses machine learning algorithms to learn from large amounts of data and generate new images.

[0240] "Image generation" is the process of creating a new image based on conditions received from the user.

[0241] "Optimization processing" is the process of compressing and converting the format of generated images to adjust them into a state that allows for efficient storage and transfer.

[0242] "Delivery means" refers to the part that has the function of sending optimized images to the user's device and making them available for the user to use.

[0243] This invention is a system for efficiently generating and providing high-quality images based on user input of image generation conditions. Users access the system using a personal computer or smartphone and input specific conditions for the image they wish to generate. This involves specifying information such as keywords, style, and color through prompt messages.

[0244] The terminal converts the input conditions into digital data and sends it to the server. The server receives this data, analyzes the user's request in detail using analytical tools, and generates the desired image using a generative AI model. The generative AI model generates new images in real time using patterns learned from a large amount of training data. The AI model used in this process is often built on machine learning frameworks such as neural network libraries or TensorFlow.

[0245] The generated images undergo optimization processing on the server, including compression and format conversion, to ensure efficient transfer and display. Finally, the server sends this optimized image back to the terminal, which then displays it on the user interface. This allows users to immediately download the generated images and use them in their projects.

[0246] For example, if a user enters the prompt "a landscape rich in nature," the server's AI model generates a natural landscape image that reflects this prompt and provides it to the user. This allows the user to quickly obtain original images that are suitable for presentation materials or web content.

[0247] The flow of the specific processing in Example 1 will be explained using Figure 11.

[0248] Step 1:

[0249] The user accesses the system using a terminal and enters the conditions for image generation. The user specifies keywords, style, and color of the image to be generated as prompts. The entered information is processed as digital data within the terminal and sent to the server. This step accurately converts the user's specific requests into data.

[0250] Step 2:

[0251] The server receives digital data transmitted from the terminal. The received data is analyzed using analytical tools to establish the necessary criteria and context for image generation. Through this analysis, specific image generation conditions are set based on the input prompt text. In this step, data processing is performed to understand the user's requirements and create an appropriate configuration.

[0252] Step 3:

[0253] The server-side generation mechanism uses a generative AI model to generate images based on the analyzed conditions. The AI model utilizes a large amount of training data to create new images that match the input conditions. The generative AI model is implemented using a machine learning framework and leverages pattern recognition capabilities to generate images. This step involves the specific execution of complex image generation algorithms.

[0254] Step 4:

[0255] The generated images are optimized on the server. This process involves image compression and format conversion, adjusting them for efficient transfer and display. The optimized images are converted to the optimal file size and format for transfer. In this step, data calculations are performed to make the generated images usable.

[0256] Step 5:

[0257] The server sends the optimized image to the terminal. The terminal displays the received image on its user interface, making it easy for the user to download and use. This step involves providing information in a user-friendly format.
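The server-side portion of the flow in Steps 2 to 5 can be summarized in a short sketch. All names below are hypothetical, and `stub_generator` is a placeholder that returns a flat 4x4 image so that the sketch runs without a trained model; a real deployment would substitute a call to the generative AI model and a proper image encoder.

```python
import zlib
from typing import Callable, Dict, List, Tuple

Pixel = Tuple[int, int, int]
Image = List[List[Pixel]]


def analyze(request: Dict[str, str]) -> str:
    """Step 2: turn keywords, style, and color into one generation condition."""
    parts = [request.get(k, "").strip() for k in ("keywords", "style", "color")]
    return ", ".join(p for p in parts if p)


def stub_generator(condition: str) -> Image:
    """Step 3 placeholder: a flat 4x4 gray image instead of a real model call."""
    shade = sum(condition.encode("utf-8")) % 256
    return [[(shade, shade, shade)] * 4 for _ in range(4)]


def optimize(image: Image) -> bytes:
    """Step 4: flatten to raw RGB bytes and compress them losslessly."""
    raw = bytes(c for row in image for px in row for c in px)
    return zlib.compress(raw, 9)


def handle_request(request: Dict[str, str],
                   generator: Callable[[str], Image] = stub_generator) -> bytes:
    """Steps 2 to 5 on the server: analyze, generate, optimize, return the payload."""
    condition = analyze(request)
    if not condition:
        raise ValueError("no generation conditions supplied")
    return optimize(generator(condition))
```

Passing the generator as a parameter keeps the analysis and optimization stages independent of whichever model is used.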

[0258] (Application Example 1)

[0259] Next, we will explain Application Example 1. In the following explanation, the data processing device 12 will be referred to as the "server," and the smart glasses 214 will be referred to as the "terminal."

[0260] In generating visual information, it is essential that users can intuitively set conditions, and that the generated images are high-quality and efficiently optimized. Furthermore, the generated visual information must be delivered smoothly to the user, enhancing the personalized content experience.

[0261] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 1 is realized by the following means.

[0262] In this invention, the server includes a receiving means for receiving conditions from a user, a generating means for generating visual data using a generative artificial intelligence algorithm based on the conditions, and an optimization means for optimizing the generated visual data and converting it into a variable form. This makes it possible to quickly and efficiently generate and provide the visual information requested by the user.

[0263] A "means of receiving conditions from users" refers to a component within a system that receives specific requests or wishes from users.

[0264] "Generative means for generating visual data using a generative artificial intelligence algorithm" refers to an element within a system that generates image data using artificial intelligence technology based on user conditions.

[0265] "Optimization means for optimizing generated visual data and converting it into a variable form" refers to a function within a system that improves the quality and format of the generated image, making it usable by the user effectively.

[0266] "Means of delivery" refers to components within a system that have the function of delivering the generated visual data to the user.

[0267] A "constituent" is a collection of multiple functional means that cooperate to form the entire system.

[0268] One embodiment of the present invention provides a system that allows users to intuitively generate and utilize visual information. This system functions by allowing users to input the desired image conditions using a smartphone or other device. The input conditions include keywords, image style, color, and other detailed instructions.

[0269] The server processes this input and uses generative artificial intelligence algorithms (e.g., models built on PyTorch or TensorFlow) to generate visual data based on specified conditions. This image generation process is achieved by recognizing patterns based on a vast amount of training data and generating new visual content.

[0270] The generated visual data is processed by optimization techniques, including compression and format conversion. This allows image files to be displayed and downloaded smoothly on the user's device. Optimization techniques are applied to reduce file size while maintaining image quality.

[0271] The optimized images are delivered to the user through the distribution method, and the user can view them on a smartphone application and download or use them further as needed.

[0272] For example, if a user specifies "spring cherry blossom scenery" within the application, the server will generate an image that conforms to that theme. The generative AI model may use a prompt such as, "Create a vivid image of spring cherry blossoms under a clear blue sky." This allows users to easily obtain high-quality seasonal theme images and utilize them in content creation.

[0273] The flow of a specific process in Application Example 1 will be explained using Figure 12.

[0274] Step 1:

[0275] The user launches the application on their smartphone and specifies the details of the image they want to generate on the conditions input screen. This includes elements such as keywords, style, and color. The entered information is temporarily stored as data on the device.

[0276] Step 2:

[0277] The terminal sends the temporarily stored condition data to the server. The server receives this data and starts analysis. In the analysis, data processing is performed to form an appropriate prompt sentence based on the specified conditions. For example, when "the scenery of cherry blossoms in spring" is specified, a prompt sentence like "Create a vivid image of spring cherry blossoms under a clear blue sky." is generated.

[0278] Step 3:

[0279] The server calls the generative artificial intelligence model based on the generated prompt sentence. The AI model (e.g., Stable Diffusion) receives the prompt sentence and generates related visual information. In this process, the model utilizes existing data patterns and outputs new image data.

[0280] Step 4:

[0281] The generated image data is passed to the optimization means within the server. The optimization means adjusts the image size by compressing the image and converting it to an appropriate format, and processes it so that the display on the user's terminal is optimized. This processing improves the transfer efficiency and maintains the quality.
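As one concrete illustration of compression combined with format conversion, the sketch below encodes raw RGB pixel rows as a PNG file using only the Python standard library. The choice of PNG and the function name `encode_png` are assumptions made for illustration; the text does not name a target format, and a production system would normally use an image library and might prefer a lossy format.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def _chunk(kind: bytes, data: bytes) -> bytes:
    """One PNG chunk: length, type, data, CRC-32 over type and data."""
    crc = zlib.crc32(kind + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + kind + data + struct.pack(">I", crc)


def encode_png(rows, level: int = 9) -> bytes:
    """Encode rows of (r, g, b) tuples as an 8-bit truecolor PNG."""
    height = len(rows)
    width = len(rows[0]) if height else 0
    if width == 0 or any(len(row) != width for row in rows):
        raise ValueError("image must be a non-empty rectangle")
    # Each scanline starts with filter type 0 (no filtering).
    raw = b"".join(b"\x00" + bytes(c for px in row for c in px) for row in rows)
    # IHDR: width, height, bit depth 8, color type 2 (RGB), then three zero fields.
    ihdr = struct.pack(">IIBBBBB", width, height, 8, 2, 0, 0, 0)
    return (PNG_SIGNATURE
            + _chunk(b"IHDR", ihdr)
            + _chunk(b"IDAT", zlib.compress(raw, level))
            + _chunk(b"IEND", b""))


png_bytes = encode_png([[(255, 0, 0)] * 8 for _ in range(8)])
```

The `level` argument trades encoding time for file size; because the DEFLATE compression used by PNG is lossless, image quality is maintained.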

[0282] Step 5:

[0283] The optimized visual data is sent back from the server to the terminal. The terminal receives this data and displays the image within the application. The user can view the displayed image and, if necessary, download the image or integrate it with other content.
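By way of a hedged sketch, the size and format decision made by the optimization means in Step 4 could look like the following. The `DeliveryPlan` type, the `plan_delivery` function, and the WebP/JPEG choice are assumptions for illustration. The disclosure does not specify target formats or how terminal capabilities are reported.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DeliveryPlan:
    width: int
    height: int
    image_format: str


def plan_delivery(src_w: int, src_h: int, max_w: int, max_h: int, supports_webp: bool) -> DeliveryPlan:
    """Pick output dimensions and a format for a terminal (illustrative only).

    The image is scaled down to fit inside max_w x max_h while keeping its
    aspect ratio, and it is never enlarged.
    """
    if min(src_w, src_h, max_w, max_h) <= 0:
        raise ValueError("all dimensions must be positive")
    scale = min(max_w / src_w, max_h / src_h, 1.0)
    width = max(1, round(src_w * scale))
    height = max(1, round(src_h * scale))
    # Assumed policy: prefer WebP when the terminal reports support, else JPEG.
    image_format = "webp" if supports_webp else "jpeg"
    return DeliveryPlan(width, height, image_format)


plan = plan_delivery(2048, 2048, max_w=1080, max_h=1920, supports_webp=True)
print(plan)
```

The actual pixel resampling and encoding would be carried out by an image library on the server. This sketch covers only the planning decision.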

[0284] Furthermore, an emotion engine for estimating the user's emotion may be combined. That is, the specific processing unit 290 may estimate the user's emotion using the emotion identification model 59 and perform specific processing using the user's emotion.

[0285] The present invention is designed as an image generation system that takes the user's emotions into account. Processing starts when the user inputs the conditions for image generation via a terminal. The terminal incorporates an emotion engine that, when the user specifies keywords or content, recognizes the user's emotions by analyzing the voice or text input. Through this emotion analysis, emotional nuances such as "cheerful" or "calm" are added to the conditions.

[0286] The terminal structures these input data and sends them to the server. The server analyzes the received data and prepares optimal parameters for image generation based on the conditions and emotions. Next, the generation means in the server uses a pre-trained generative artificial intelligence model to generate an original image. At this time, parameters based on the user's emotions are also taken into account so that they are reflected in the color tone and composition of the image.

[0287] The generated image is further adjusted emotionally by the server. For example, when a cheerful mood is detected, bright color tones and dynamic compositions are selected. Also, the image is appropriately resized through compression processing and format conversion. Then, the server sends the optimized data back to the terminal.

[0288] The terminal displays the received image on the user interface, and the user can directly view it and use it for materials or download it as needed. As a specific example, when the user wishes for a "relaxing scenery" and the emotion is recognized as "calm and serene" as a result of the analysis, the AI model generates an image that evokes a calm sea and a gentle sunset and provides it to the user. In this form, the user can easily obtain appropriate images according to their psychological state.
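The relationship between a detected emotion and the image parameters described above can be illustrated with a small lookup table. This is a minimal sketch under stated assumptions. The "cheerful" entry follows the example in the text (bright color tones, dynamic compositions), while the composition for "calm", the default entry, and the names `EMOTION_STYLES` and `style_for_emotion` are hypothetical.

```python
from typing import Dict

# Emotion label -> generation hints. Only the "cheerful" row and the gentle
# color tone for "calm" are taken from the text; the rest is assumed.
EMOTION_STYLES: Dict[str, Dict[str, str]] = {
    "cheerful": {"color_tone": "bright", "composition": "dynamic"},
    "calm": {"color_tone": "gentle", "composition": "balanced"},
}
DEFAULT_STYLE: Dict[str, str] = {"color_tone": "neutral", "composition": "balanced"}


def style_for_emotion(emotion: str) -> Dict[str, str]:
    """Return a copy of the style hints for an emotion label."""
    key = emotion.strip().lower()
    return dict(EMOTION_STYLES.get(key, DEFAULT_STYLE))


print(style_for_emotion("Cheerful"))
```

Returning a copy keeps callers from accidentally mutating the shared table when they add request-specific settings.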

[0289] The processing flow will be described below.

[0290] Step 1:

[0291] The user uses the input interface on the device to specify the conditions for the image they want to generate. This can include keywords or desired themes. Furthermore, when using voice or text input, the device's built-in emotion engine analyzes the user's voice tone and the words they use to detect their current emotional state.

[0292] Step 2:

[0293] The terminal forms a dataset containing the conditions entered by the user and the emotion information recognized by the emotion engine, and sends this dataset to the server.

[0294] Step 3:

[0295] The server analyzes the received data and adjusts the detailed parameters for image generation based on the conditions and emotional information specified by the user. The server uses the emotional information to determine how to reflect it in the image's color tone and composition.

[0296] Step 4:

[0297] The generation means in the server uses a pre-trained generative artificial intelligence model to create new images. In this process, images are generated from the prepared parameters so that the user's emotions are reflected in the color tone and composition.

[0298] Step 5:

[0299] The server then performs further optimizations on the generated images. In particular, after making fine adjustments based on emotional information, it compresses the images and converts the format to reduce data volume and improve transfer efficiency.

[0300] Step 6:

[0301] The optimized image is sent from the server to the terminal. The terminal receives this data and displays the image to the user through the user interface.

[0302] Step 7:

[0303] The user can view the displayed image and, if necessary, add it to the materials or download it to the device. Through this process, the user can intuitively and quickly obtain images that match their emotional state.
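As one possible illustration of Step 2 of the flow above, the dataset that the terminal sends to the server could be serialized as JSON. The field names (`keywords`, `theme`, `emotion`, `input_mode`) and the `GenerationRequest` type are assumptions for this sketch. The disclosure states only that the dataset contains the conditions and the emotion information.

```python
import json
from dataclasses import asdict, dataclass
from typing import List


@dataclass
class GenerationRequest:
    """Hypothetical structure of the terminal-to-server dataset."""
    keywords: List[str]
    theme: str
    emotion: str
    input_mode: str = "text"  # assumed values: "text" or "voice"


def to_payload(request: GenerationRequest) -> str:
    """Serialize the request to a JSON string for transmission."""
    return json.dumps(asdict(request), ensure_ascii=False, sort_keys=True)


request = GenerationRequest(
    keywords=["sea", "sunset"],
    theme="relaxing scenery",
    emotion="calm and serene",
)
print(to_payload(request))
```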

[0304] (Example 2)

[0305] Next, Example 2 will be described. In the following description, the data processing device 12 is referred to as a "server", and the smart glasses 214 are referred to as a "terminal".

[0306] Conventional image generation systems cannot fully take the user's emotions into account, so the generated image may not match the user's expectations or psychological state. As a result, the quality of the generated images and the user's satisfaction with them are limited.

[0307] The specific processing by the specific processing unit 290 of the data processing device 12 in Example 2 is realized by the following means.

[0308] In this invention, the server includes means for receiving image generation conditions and emotional information from the user via the terminal, means for analyzing the input conditions and emotional information to set optimal parameters for generation, means for generating an image using a generative artificial intelligence model based on the parameters, and means for performing emotional adjustment on the generated image for optimization. Thereby, high-quality image generation according to the user's emotional state becomes possible.

[0309] The "conditions" are information that specifically indicates the attributes and features that the user desires when performing image generation.

[0310] The "emotional information" is emotional data that reflects the user's psychological state and is extracted from the input voice or text.

[0311] A "generative artificial intelligence model" is a type of artificial intelligence structure that uses pre-trained algorithms to generate images.

[0312] "Optimal parameters" are settings adjusted to reflect conditions and emotional information in image generation, thereby improving the accuracy and quality of the generation process.

[0313] "Emotional adjustment" is a process that modifies the color tone and composition of a generated image, taking emotional information into consideration.

[0314] "Optimization" refers to processing generated images so that they meet user expectations and can be displayed and downloaded quickly and smoothly.

[0315] Embodiments for this invention are shown below.

[0316] First, the user uses the device to input the desired image criteria. This includes keywords and specific images, such as "calm landscape" or "dynamic urban scene." The device is equipped with an emotion recognition engine that analyzes the user's input through voice and text to extract the user's emotional information. From this information, emotional nuances such as "relaxed" or "lively" can be obtained.

[0317] Next, the device sends the resulting dataset to the server. The server analyzes this data and sets the optimal parameters. Using these parameters, a pre-trained generative artificial intelligence model generates images based on the conditions and emotional information. The generated images are then emotionally adjusted in terms of color and composition to reflect the user's emotional state.

[0318] The generated images are compressed and format-converted by the server, optimized for the user's device, and then sent. The device displays the received optimized image in its user interface, allowing the user to review the provided image. This image can be downloaded if needed.

[0319] For example, if a user inputs "a relaxing landscape," the device analyzes the input and extracts emotional information such as "calm and peaceful." This data is sent to a server, where a generative AI model is used to generate images of a calm sea or a soft sunset. These images are then provided to the user.

[0320] An example of a prompt might be, "Generate a relaxing landscape. Sentiment analysis identifies it as 'calm and peaceful'." Based on this prompt, the system can provide an image suitable for the user.
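The example prompt above can be assembled from the condition and the extracted emotional information, for instance as in the following sketch. The function name `build_emotion_prompt` and the fallback to a condition-only prompt when no emotion is available are assumptions made for illustration.

```python
def build_emotion_prompt(condition: str, emotion: str = "") -> str:
    """Combine a condition and an optional emotion label into one prompt."""
    prompt = f"Generate {condition.strip()}."
    if emotion.strip():
        # The wording mirrors the example prompt given in the text.
        prompt += f" Sentiment analysis identifies it as '{emotion.strip()}'."
    return prompt


print(build_emotion_prompt("a relaxing landscape", "calm and peaceful"))
# prints: Generate a relaxing landscape. Sentiment analysis identifies it as 'calm and peaceful'.
```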

[0321] The flow of the specific processing in Example 2 will be explained using Figure 13.

[0322] Step 1:

[0323] The user inputs image generation conditions into the device. This input includes keywords and desired images, such as "relaxing scenery." The device receives these conditions and uses an emotion recognition engine to extract the user's emotional information from the input data. The input is text or audio data, and the output is information about the analyzed emotional state.

[0324] Step 2:

[0325] The terminal converts the input conditions and emotional information into a structured dataset as a preprocessing step and sends it to the server. The dataset contains detailed conditions and emotional information, which facilitates subsequent processing. The input for this step is the user's conditions and extracted emotional information, and the output is a structured dataset for transmission to the server.

[0326] Step 3:

[0327] The server analyzes the received dataset and sets the optimal parameters for image generation. Specifically, it determines parameters such as color, style, and composition to be used based on the information. The input for this step is the dataset received from the terminal, and the output is the optimized parameters for image generation.

[0328] Step 4:

[0329] The server-based generation mechanism generates images using a generative artificial intelligence model based on optimized parameters. This model is pre-trained and capable of generating a variety of images. The input is the optimized parameters, and the output is the generated original image.

[0330] Step 5:

[0331] The server applies emotional adjustments to the generated image. These adjustments modify the image's color tone, brightness, and composition based on the user's emotional information. For example, if a calm emotion is detected, a gentle color tone will be selected. The input is the generated image and emotional information, and the output is the adjusted image.

[0332] Step 6:

[0333] The server compresses and converts the format of the adjusted image and sends the optimized data to the terminal. This process prepares the data for smooth image display and download. The input is the adjusted image, and the output is the compressed and format-converted image data.

[0334] Step 7:

[0335] The terminal displays the received image data to the user. The user can view the displayed image and, if necessary, download it or use it in other materials. The input is the image data received from the server, and the output is the image displayed on the user interface.
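The emotional adjustment of Step 5 can be illustrated at the level of a single pixel. The following is a sketch only. The scaling factors in `ADJUSTMENTS`, the choice of the HLS color space, and the function name `adjust_pixel` are assumptions, since the disclosure does not state how color tone and brightness are modified.

```python
import colorsys
from typing import Dict, Tuple

# Emotion label -> (saturation factor, lightness factor). Values are assumed.
ADJUSTMENTS: Dict[str, Tuple[float, float]] = {
    "calm": (0.8, 1.0),      # softer, gentler colors
    "cheerful": (1.2, 1.1),  # more vivid and slightly brighter
}


def adjust_pixel(rgb: Tuple[int, int, int], emotion: str) -> Tuple[int, int, int]:
    """Adjust one 8-bit RGB pixel according to an emotion label."""
    sat_factor, light_factor = ADJUSTMENTS.get(emotion, (1.0, 1.0))
    r, g, b = (channel / 255.0 for channel in rgb)
    hue, lightness, saturation = colorsys.rgb_to_hls(r, g, b)
    lightness = min(1.0, lightness * light_factor)
    saturation = min(1.0, saturation * sat_factor)
    adjusted = colorsys.hls_to_rgb(hue, lightness, saturation)
    return tuple(round(channel * 255) for channel in adjusted)


print(adjust_pixel((200, 100, 50), "calm"))
```

In practice the same transform would be applied to every pixel, typically through an image library rather than a Python loop. Changes to composition are outside the scope of this sketch.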

[0336] (Application Example 2)

[0337] Next, we will explain application example 2. In the following explanation, the data processing device 12 will be referred to as the "server," and the smart glasses 214 will be referred to as the "terminal."

[0338] Conventional advertising image generation systems were limited to generating images based solely on conditions, making it difficult to create flexible images that took into account the emotional state of the user. In particular, providing visual content that resonates with the user's emotions is crucial in the advertising field, but this has not been adequately achieved. As a result, the effectiveness of advertising has not been maximized.

[0339] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 2 is realized by the following means.

[0340] In this invention, the server includes a receiving means for receiving conditions from the user, a generating means for analyzing the user's emotional state and generating an image using a computer model, and a providing means for presenting the generated image on the user's digital device. This makes it possible to efficiently generate and provide appropriate advertising images according to the user's emotional state.

[0341] A "user" is someone who provides the conditions for image generation through this system and receives the generated image.

[0342] "Conditions" refer to the specific requirements or desired elements that the user specifies for the images to be generated.

[0343] "Emotional state" refers to the psychological state of a user, and is identified by analyzing information obtained from voice and text.

[0344] A "computer model" is a system that includes algorithms executed on a digital device and is used to incorporate conditions and emotional states in image generation.

[0345] "Digital devices" refer to electronic devices used to display generated images, such as smartphones, tablets, and personal computers.

[0346] "Generating means" refers to a series of processes for generating images by utilizing a computer model based on conditions and emotional states.

[0347] "Providing means" refers to means including interfaces and digital devices for presenting the generated images to the user.

[0348] The system that realizes this invention aims to effectively provide visual advertising content by generating images that correspond to the user's emotional state and presenting them on a digital device. A specific example of this system is shown below.

[0349] The system's program runs on the user's smartphone or tablet device and provides an interface for the user to input criteria for advertising images. These criteria include information such as the type of product or atmosphere the user expects to see in the images. These criteria are then transmitted from the digital device to the server.

[0350] The server receives the conditions sent by the user and uses Google Cloud Speech-to-Text or the Google Cloud Natural Language API to obtain the user's emotional state from speech or text. Based on this analysis, the server establishes image generation parameters that combine the emotional state with the conditions, for use with the Stable Diffusion generative AI model.

[0351] If a user wants to generate an image that evokes a sense of peace, the emotion analysis will identify a state of "relaxation." Based on this, the server generates an advertising image with calming colors and composition.

[0352] The server then sends the generated image to the digital device, allowing the user to view it on their device. This image has been compressed, formatted, and resized as appropriate.

[0353] For example, if a user specifies a "scene that feels comfortable" from daily life, the generated ad visual is likely to depict a relaxing beach or soft sunlight. Prompts such as "Please create an ad visual that evokes a sense of peace" may be used.

[0354] This format allows advertisers and content creators to create and distribute more engaging and effective advertisements and visual content that are tailored to the target audience's psychological state.

[0355] The flow of a specific process in Application Example 2 will be explained using Figure 14.

[0356] Step 1:

[0357] The user uses their device to input the desired conditions for the advertisement image into the interface. Specifically, they provide the product name and desired mood (e.g., calm, cheerful) in text or voice. This information is collected as input data on the device.

[0358] Step 2:

[0359] The terminal sends the conditions collected in Step 1 to the server. If there is voice input, the terminal performs preprocessing, such as converting the voice data to text, and formats it into a format that the server can easily handle before sending it. This data becomes the server input.

[0360] Step 3:

[0361] The server performs sentiment analysis on the received data using the Google Cloud Speech-to-Text API (for any voice input) and the Google Cloud Natural Language API (for text). Through this analysis, the server identifies the user's emotional state, such as "relaxed" or "excited," and outputs it as emotion data.

[0362] Step 4:

[0363] The server combines the emotion data with the user-defined conditions to set optimal parameters for image generation. These settings include the prompt used by the AI image generation model (e.g., Stable Diffusion) and attributes such as color tone and composition.

[0364] Step 5:

[0365] The server runs a generative AI model using the configured parameters to generate an image. The generated image is a computer-generated image whose brightness and hue are adjusted according to the user's emotional state.

[0366] Step 6:

[0367] The generated images are compressed and formatted on the server and optimized for transmission to the terminal. This is to adjust the size and format so that they can be displayed more quickly on digital devices.

[0368] Step 7:

[0369] The terminal receives optimized image data from the server and displays the images through the user interface. The user can view these generated images and, if necessary, download or use them for further processing.
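As a hedged illustration of Step 3, a numeric sentiment result can be mapped to the emotional-state labels used above. The sketch assumes a sentiment score in the range -1.0 to 1.0 and a non-negative magnitude, which is the form in which the Google Cloud Natural Language API reports document sentiment. The thresholds, the extra labels "tense" and "neutral", and the function name are hypothetical, and no API call is made here.

```python
def classify_emotional_state(score: float, magnitude: float) -> str:
    """Map a sentiment score and magnitude to a coarse emotional-state label.

    Thresholds are illustrative assumptions, not values from the disclosure.
    """
    if not -1.0 <= score <= 1.0:
        raise ValueError("score must be within [-1.0, 1.0]")
    if magnitude < 0:
        raise ValueError("magnitude must be non-negative")
    if score >= 0.25:
        # Positive sentiment: strong emotion reads as excited, mild as relaxed.
        return "excited" if magnitude >= 1.0 else "relaxed"
    if score <= -0.25:
        return "tense"
    return "neutral"


print(classify_emotional_state(0.5, 0.3))
# prints: relaxed
```

The resulting label would then be combined with the user-defined conditions in Step 4.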

[0370] The specific processing unit 290 transmits the result of the specific processing to the smart glasses 214. In the smart glasses 214, the control unit 46A causes the speaker 240 to output the result of the specific processing. The microphone 238 acquires audio indicating user input for the result of the specific processing. The control unit 46A transmits the audio data indicating user input acquired by the microphone 238 to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the audio data.

[0371] Data generation model 58 is a type of so-called generative AI (Artificial Intelligence). Examples of the data generation model 58 include generative AI such as ChatGPT (Internet search <URL: https://openai.com/blog/chatgpt>) and Gemini (Internet search <URL: https://gemini.google.com/?hl=ja>). The data generation model 58 is obtained by performing deep learning on a neural network. Prompts containing instructions, and inference data such as audio data representing speech, text data representing text, and image data representing images, are input to the data generation model 58. The data generation model 58 performs inference on the input inference data according to the instructions indicated by the prompts, and outputs the inference results in data formats such as audio data and text data. Here, inference refers to, for example, analysis, classification, prediction, and/or summarization.

[0372] In the above embodiment, an example was given in which specific processing is performed by the data processing device 12, but the technology of this disclosure is not limited thereto, and the specific processing may also be performed by the smart glasses 214.

[0373] [Third Embodiment]

[0374] Figure 5 shows an example of the configuration of the data processing system 310 according to the third embodiment.

[0375] As shown in Figure 5, the data processing system 310 includes a data processing device 12 and a headset terminal 314. An example of the data processing device 12 is a server.

[0376] The data processing device 12 comprises a computer 22, a database 24, and a communication interface 26. The computer 22 is an example of a "computer" related to the technology of this disclosure. The computer 22 comprises a processor 28, RAM 30, and storage 32. The processor 28, RAM 30, and storage 32 are connected to a bus 34. The database 24 and the communication interface 26 are also connected to the bus 34. The communication interface 26 is connected to a network 54. An example of the network 54 is a WAN (Wide Area Network) and / or a LAN (Local Area Network).

[0377] The headset terminal 314 includes a computer 36, a microphone 238, a speaker 240, a camera 42, a communication interface 44, and a display 343. The computer 36 includes a processor 46, RAM 48, and storage 50. The processor 46, RAM 48, and storage 50 are connected to a bus 52. The microphone 238, speaker 240, camera 42, and display 343 are also connected to the bus 52.

[0378] The microphone 238 receives instructions from the user 20 by picking up the voice of the user 20. The microphone 238 captures the voice of the user 20, converts the captured voice into audio data, and outputs it to the processor 46. The speaker 240 outputs audio according to the instructions from the processor 46.

[0379] Camera 42 is a small digital camera equipped with an optical system including a lens, aperture, and shutter, and an image sensor such as a CMOS (Complementary Metal-Oxide-Semiconductor) image sensor or a CCD (Charge Coupled Device) image sensor, and captures images of the area around the user 20 (for example, an imaging range defined by a field of view equivalent to the width of a typical healthy person's field of vision).

[0380] Communication interface 44 is connected to network 54. Communication interfaces 44 and 26 are responsible for the exchange of various information between processor 46 and processor 28 via network 54. The exchange of various information between processor 46 and processor 28 using communication interfaces 44 and 26 is performed in a secure manner.

[0381] Figure 6 shows an example of the main functions of the data processing device 12 and the headset terminal 314. As shown in Figure 6, the data processing device 12 performs specific processing using the processor 28. The storage 32 stores the specific processing program 56.

[0382] The specific processing program 56 is an example of a "program" relating to the technology of this disclosure. The processor 28 reads the specific processing program 56 from the storage 32 and executes the read specific processing program 56 on the RAM 30. The specific processing is realized by the processor 28 operating as a specific processing unit 290 in accordance with the specific processing program 56 executed on the RAM 30.

[0383] The storage 32 stores the data generation model 58 and the emotion identification model 59. The data generation model 58 and the emotion identification model 59 are used by the specific processing unit 290.

[0384] In the headset terminal 314, the processor 46 performs the reception output processing. The storage 50 stores the reception output program 60. The processor 46 reads the reception output program 60 from the storage 50 and executes the read reception output program 60 on the RAM 48. The reception output processing is realized by the processor 46 operating as a control unit 46A according to the reception output program 60 executed on the RAM 48.

[0385] Next, the specific processing performed by the specific processing unit 290 of the data processing device 12 will be described. In the following description, the data processing device 12 will be referred to as the "server" and the headset terminal 314 will be referred to as the "terminal".

[0386] This invention is designed as a system that users can use intuitively. Users access the system using their own devices and request image generation based on specific conditions. The device reads the requests entered by the user and sends them to the server as digital data. This data includes keywords, image style, color, and other detailed conditions.

[0387] The server analyzes the received request and defines the criteria for the image to be generated. The generation mechanism operating within the server uses a generative artificial intelligence model to generate images based on these criteria. In this process, machine learning models and neural networks are commonly used. The AI model recognizes patterns from a vast amount of training data and uses them to create new images.

[0388] The generated images are optimized by the server. This involves image compression and format conversion, adjusting the file size appropriately. In this way, the images are prepared for smooth transfer and display. Finally, the optimized images are sent back to the terminal by the delivery means and presented to the user.

[0389] The device displays the received images on its interface, allowing users to easily download or use them directly. For example, if a user requests a "landscape rich in nature," the AI model generates a natural landscape image reflecting that request and provides it to the user. This process allows the user to instantly obtain original images suited to their purpose and use them in presentations or projects.

[0390] The following describes the processing flow.

[0391] Step 1:

[0392] The user uses their device to input the conditions for image generation. This includes keywords, desired style, color scheme, etc. The user interface is designed to retrieve this information.

[0393] Step 2:

[0394] The terminal converts the user's input data into an appropriate data format (e.g., JSON) and sends a request to the server. The request is typically sent over HTTP.

[0395] Step 3:

[0396] The server analyzes the received request and extracts the conditions and context necessary for image generation. This may include text analysis using natural language processing.

[0397] Step 4:

[0398] Based on the analysis results, the server adjusts the parameters for the generated images and provides them to the AI model as input data.

[0399] Step 5:

[0400] The generation mechanism on the server uses a generative artificial intelligence model to generate original images based on these parameters. The AI model creates images by drawing on what it learned during pre-training.

[0401] Step 6:

[0402] The generated images are compressed and format-converted by the server to optimize file size. This process reduces the load during transfer and improves display speed on the device.

[0403] Step 7:

[0404] The optimized image data is sent from the server to the terminal. This is also done via an HTTP response.

[0405] Step 8:

[0406] The device displays the received image in the user interface. The user can then download this image or use it in their materials.
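Steps 2 and 3 of the flow above (JSON over HTTP, followed by server-side extraction of the conditions) might be sketched as follows. The endpoint URL, the field names, and both function names are assumptions for illustration. The request object is only constructed here and is not sent.

```python
import json
import urllib.request
from typing import Any, Dict


def build_generation_request(url: str, conditions: Dict[str, Any]) -> urllib.request.Request:
    """Terminal side: wrap the conditions in a JSON POST request (not sent)."""
    body = json.dumps(conditions).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def parse_generation_request(body: bytes) -> Dict[str, Any]:
    """Server side: decode the JSON body and extract the expected fields."""
    data = json.loads(body.decode("utf-8"))
    keywords = data.get("keywords")
    if not isinstance(keywords, list) or not keywords:
        raise ValueError("'keywords' must be a non-empty list")
    return {
        "keywords": [str(keyword) for keyword in keywords],
        "style": data.get("style"),
        "color": data.get("color"),
    }


# Hypothetical endpoint, used only to construct the request object.
request = build_generation_request(
    "https://example.com/generate",
    {"keywords": ["landscape", "nature"], "style": "photo", "color": "green"},
)
print(request.get_method(), parse_generation_request(request.data))
```

Validating the body on the server before the model is called keeps malformed requests from reaching the more expensive generation step.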

[0407] (Example 1)

[0408] Next, we will describe Example 1. In the following description, the data processing device 12 will be referred to as the "server," and the headset-type terminal 314 will be referred to as the "terminal."

[0409] The present invention addresses the need for an image generation system that is intuitive for users to use, capable of rapidly generating and efficiently providing high-quality images based on specified conditions. In particular, it requires optimal image generation to meet diverse user needs, as well as appropriate optimization for smooth image transfer and display.

[0410] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 1 is realized by the following means.

[0411] In this invention, the server includes means for receiving conditions from an input means and transmitting them as digital data, means for analyzing the conditions and generating an image using a generative artificial intelligence model, and means for optimizing the generated image to adjust its size and format and providing it to the user terminal. This makes it possible to respond quickly to various user requests and provide images in an optimal state.

[0412] The "input means" refers to the part that has the function of receiving conditions and requests from the user.

[0413] "Conditions" refer to information that allows the user to specify the characteristics and style of the image they wish to generate.

[0414] "Digital data" refers to information that has been converted into a format that can be processed by a computer, such as conditions or requests.

[0415] The "analysis means" refers to the part that analyzes the conditions received from the user and has the function of preparing the criteria and context necessary for image generation.

[0416] A "generative artificial intelligence model" is a model that uses machine learning algorithms to learn from large amounts of data and generate new images.

[0417] "Image generation" is the process of creating a new image based on conditions received from the user.

[0418] "Optimization processing" is the process of compressing and converting the format of generated images to adjust them into a state that allows for efficient storage and transfer.

[0419] "Delivery means" refers to the part that has the function of sending optimized images to the user's device and making them available for the user to use.

[0420] This invention is a system for efficiently generating and providing high-quality images based on user input of image generation conditions. Users access the system using a personal computer or smartphone and input specific conditions for the image they wish to generate. This involves specifying information such as keywords, style, and color through prompt messages.

[0421] The terminal converts the input conditions into digital data and sends it to the server. The server receives this data, analyzes the user's request in detail using analytical tools, and generates the desired image using a generative AI model. The generative AI model generates new images in real time using patterns learned from a large amount of training data. The AI model used in this process is often built on machine learning frameworks such as nn libraries or TensorFlow.

[0422] The generated images undergo optimization processing on the server, including compression and format conversion, to ensure efficient transfer and display. Finally, the server sends this optimized image back to the terminal, which then displays it on the user interface. This allows users to immediately download the generated images for use in their projects.

[0423] For example, if a user enters the prompt "a landscape rich in nature," the server's AI model generates a natural landscape image that reflects this prompt and provides it to the user. This allows the user to quickly obtain original images that are suitable for presentation materials or web content.

[0424] The flow of the specific processing in Example 1 will be explained using Figure 11.

[0425] Step 1:

[0426] The user accesses the system using a terminal and enters the conditions for image generation. The user specifies keywords, style, and color of the image to be generated as prompts. The entered information is processed as digital data within the terminal and sent to the server. This step accurately converts the user's specific requests into data.

[0427] Step 2:

[0428] The server receives digital data transmitted from the terminal. The received data is analyzed using analytical tools to establish the necessary criteria and context for image generation. Through this analysis, specific image generation conditions are set based on the input prompt text. In this step, data processing is performed to understand the user's requirements and create an appropriate configuration.

[0429] Step 3:

[0430] The server-side generation mechanism uses a generative AI model to generate images based on the analyzed conditions. The AI model utilizes a large amount of training data to create new images that match the input conditions. The generative AI model is implemented using a machine learning framework and leverages pattern recognition capabilities to generate images. This step involves the specific execution of complex image generation algorithms.

[0431] Step 4:

[0432] The generated images are optimized on the server. This process involves image compression and format conversion, adjusting them for efficient transfer and display. The optimized images are converted to the optimal file size and format for transfer. In this step, data calculations are performed to make the generated images usable.

[0433] Step 5:

[0434] The server sends the optimized image to the terminal. The terminal displays the received image on its user interface, making it easy for the user to download and use. This step involves providing information in a user-friendly format.
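The server-side portion of the five steps above can be summarized in a short sketch. Several simplifications are assumed. The generative model is replaced by a caller-supplied function (`fake_generator` is a stand-in, not a real model), `zlib` compression stands in for image compression and format conversion, and all function names are hypothetical.

```python
import zlib
from typing import Callable, Dict

Generator = Callable[[str], bytes]


def analyze(conditions: Dict[str, str]) -> str:
    """Step 2: turn keywords, style, and color into a generation prompt."""
    ordered = [conditions[key] for key in ("keywords", "style", "color") if conditions.get(key)]
    return ", ".join(ordered)


def optimize(image: bytes) -> bytes:
    """Step 4: reduce the payload size (zlib used as a stand-in)."""
    return zlib.compress(image, 9)


def handle_request(conditions: Dict[str, str], generate: Generator) -> Dict[str, object]:
    """Steps 2 to 5 on the server: analyze, generate, optimize, return."""
    prompt = analyze(conditions)
    raw_image = generate(prompt)       # Step 3: call the generative model
    payload = optimize(raw_image)      # Step 4: optimization processing
    return {"prompt": prompt, "payload": payload, "original_size": len(raw_image)}


def fake_generator(prompt: str) -> bytes:
    """Stand-in for the generative AI model: returns repeatable dummy bytes."""
    return prompt.encode("utf-8") * 100


result = handle_request({"keywords": "a landscape rich in nature", "style": "photo"}, fake_generator)
print(result["prompt"], result["original_size"], len(result["payload"]))
```

Passing the generator in as a function keeps the flow independent of any particular model or framework, so the same sketch applies whichever generative AI model the server uses.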

[0435] (Application Example 1)

[0436] Next, we will explain Application Example 1. In the following explanation, the data processing device 12 will be referred to as the "server," and the headset-type terminal 314 will be referred to as the "terminal."

[0437] In generating visual information, it is essential that users can intuitively set conditions, and that the generated images are high-quality and efficiently optimized. Furthermore, the generated visual information must be delivered smoothly to the user, enhancing the personalized content experience.

[0438] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 1 is realized by the following means.

[0439] In this invention, the server includes a receiving means for receiving conditions from a user, a generating means for generating visual data using a generative artificial intelligence algorithm based on the conditions, and an optimization means for optimizing the generated visual data and converting it into a variable form. This makes it possible to quickly and efficiently generate and provide the visual information requested by the user.

[0440] A "means of receiving conditions from users" refers to a component within a system that receives specific requests or wishes from users.

[0441] "Generative means for generating visual data using a generative artificial intelligence algorithm" refers to an element within a system that generates image data using artificial intelligence technology based on user conditions.

[0442] "Optimization means for optimizing generated visual data and converting it into a variable form" refers to a function within a system that improves the quality and format of the generated image, making it usable by the user effectively.

[0443] "Means of delivery" refers to components within a system that have the function of delivering the generated visual data to the user.

[0444] A "constituent" is a collection of multiple functional means that cooperate to form the entire system.

[0445] One embodiment of the present invention provides a system that allows users to intuitively generate and utilize visual information. This system functions by allowing users to input the desired image conditions using a smartphone or other device. The input conditions include keywords, image style, color, and other detailed instructions.

[0446] The server processes this input and uses generative artificial intelligence algorithms (e.g., models built on PyTorch or TensorFlow) to generate visual data based on specified conditions. This image generation process is achieved by recognizing patterns based on a vast amount of training data and generating new visual content.

[0447] The generated visual data is processed by optimization techniques, including compression and format conversion. This allows image files to be displayed and downloaded smoothly on the user's device. Optimization techniques are applied to reduce file size while maintaining image quality.
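
One way to picture the compression and format conversion described here is a conversion from raw pixel data to PNG, a lossless format, so the file shrinks without changing any pixel value. The sketch below is an illustration only; the source does not name a target format, and the function names `rgb_to_png` and `_chunk` are hypothetical. It uses only the Python standard library.

```python
import struct
import zlib


def _chunk(kind: bytes, data: bytes) -> bytes:
    """Build one PNG chunk: length, type, data, CRC over type and data."""
    crc = zlib.crc32(kind + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + kind + data + struct.pack(">I", crc)


def rgb_to_png(pixels: bytes, width: int, height: int) -> bytes:
    """Convert raw 8-bit RGB pixels (row-major) into a PNG byte string."""
    if len(pixels) != width * height * 3:
        raise ValueError("pixel buffer does not match width * height * 3")
    stride = width * 3
    # Each scanline is prefixed with filter type 0 (no filtering).
    scanlines = b"".join(
        b"\x00" + pixels[row * stride:(row + 1) * stride] for row in range(height)
    )
    # IHDR: width, height, bit depth 8, color type 2 (RGB), then
    # compression, filter, and interlace methods, all 0.
    ihdr = struct.pack(">IIBBBBB", width, height, 8, 2, 0, 0, 0)
    return (
        b"\x89PNG\r\n\x1a\n"
        + _chunk(b"IHDR", ihdr)
        + _chunk(b"IDAT", zlib.compress(scanlines, 9))
        + _chunk(b"IEND", b"")
    )
```

A production system would more likely rely on an imaging library and might choose a lossy format when a small loss of quality is acceptable; the point here is only that format conversion and compression can happen in a single server-side step.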

[0448] The optimized images are delivered to the user through the delivery means, and the user can view them in a smartphone application and download or use them further as needed.

[0449] For example, if a user specifies "spring cherry blossom scenery" within the application, the server will generate an image that conforms to that theme. The generative AI model may use a prompt such as, "Create a vivid image of spring cherry blossoms under a clear blue sky." This allows users to easily obtain high-quality seasonal theme images and utilize them in content creation.
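
The conversion from user-specified conditions to a prompt sentence could be as simple as a template. The following is a minimal sketch under that assumption; `build_prompt` and its wording are hypothetical, and the source does not state how the prompt sentence is actually formed.

```python
def build_prompt(keywords: str, style: str = "", color: str = "") -> str:
    """Compose a generation prompt from user-specified conditions."""
    parts = [f"Create an image of {keywords.strip()}"]
    if style.strip():
        parts.append(f"in a {style.strip()} style")
    if color.strip():
        parts.append(f"with a {color.strip()} color palette")
    return " ".join(parts) + "."
```

Calling `build_prompt("spring cherry blossom scenery", style="vivid", color="clear blue")` yields a single sentence that can be passed to the generative AI model. Empty style or color fields are simply omitted from the prompt.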

[0450] The flow of a specific process in Application Example 1 will be explained using Figure 12.

[0451] Step 1:

[0452] The user launches the application on their smartphone and specifies the details of the image they want to generate on the conditions input screen. This includes elements such as keywords, style, and color. The entered information is temporarily stored as data on the device.

[0453] Step 2:

[0454] The terminal sends temporarily stored condition data to the server. The server receives this data and begins analysis. During analysis, data processing is performed to form an appropriate prompt sentence based on the specified conditions. For example, if "spring cherry blossom scenery" is specified, the prompt sentence "Create a vivid image of spring cherry blossoms under a clear blue sky." will be generated.

[0455] Step 3:

[0456] The server invokes a generative artificial intelligence model based on the generated prompt text. The AI model (e.g., Stable Diffusion) receives the prompt text and generates relevant visual information. In this process, the model utilizes existing data patterns to output new image data.

[0457] Step 4:

[0458] The generated image data is passed to an optimization mechanism on the server. This mechanism compresses the image, converts it to an appropriate format, adjusts the image size, and processes it to optimize display on the user's device. This process improves transfer efficiency while maintaining quality.

[0459] Step 5:

[0460] The optimized visual data is sent back from the server to the device. The device receives this data and displays the image within the application. The user can review the displayed image and, if necessary, download it or integrate it with other content.
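
The image size adjustment mentioned in Step 4 can be illustrated by computing target dimensions that fit the device display while preserving the aspect ratio. This is a sketch under assumptions: the function name `fit_within` is hypothetical, the display dimensions are taken to be known to the server, and the sketch never enlarges an image.

```python
def fit_within(width: int, height: int, max_width: int, max_height: int) -> tuple[int, int]:
    """Scale (width, height) down to fit the display, keeping the aspect ratio."""
    if min(width, height, max_width, max_height) <= 0:
        raise ValueError("all dimensions must be positive")
    # Use the tighter of the two constraints; a factor of 1.0 means no upscaling.
    scale = min(max_width / width, max_height / height, 1.0)
    return max(1, round(width * scale)), max(1, round(height * scale))
```

The resulting dimensions would then be handed to whatever resampling routine the optimization mechanism uses.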

[0461] Furthermore, an emotion engine that estimates the user's emotions may be incorporated. That is, the specific processing unit 290 may use the emotion identification model 59 to estimate the user's emotions and perform the specific processing using the estimated emotions.

[0462] This invention is designed as an image generation system that takes user emotions into account. The system begins with the user inputting image generation conditions via a terminal. When the user specifies keywords and content, an emotion engine built into the terminal recognizes the user's emotions through voice input and text analysis. This emotion analysis adds emotional nuances, such as "cheerful" or "calm," to the conditions.

[0463] The terminal structures this input data and sends it to the server. The server analyzes the received data and prepares the optimal parameters for image generation based on the conditions and emotions. Next, the generation mechanism within the server generates an original image using a pre-trained generative artificial intelligence model. In this process, parameters based on the user's emotions are also taken into consideration and efforts are made to reflect them in the image's color tone and composition.

[0464] The generated images are further emotionally adjusted by the server. For example, if a cheerful mood is detected, brighter colors and a more dynamic composition are selected. The images are also compressed, converted to different formats, and resized appropriately. The server then sends the optimized data back to the terminal.

[0465] The device displays received images on the user interface, allowing users to view them directly and use or download them as needed. For example, if a user requests a "relaxing landscape" and the analysis determines their emotions are "calm and peaceful," the AI model generates and provides images reminiscent of a calm sea or a soft sunset. This format allows users to easily obtain images appropriate to their psychological state.
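
The way an estimated emotion might be translated into generation parameters can be sketched as a lookup table. The emotion labels "cheerful" and "calm" come from the description above; the numeric values, the `ToneParams` fields, and the function `params_for` are assumptions chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToneParams:
    brightness: float   # multiplier; above 1.0 brightens
    saturation: float   # multiplier; above 1.0 gives more vivid colors
    composition: str    # hint appended to the generation prompt


# Hypothetical table: the labels appear in the text, the numbers do not.
EMOTION_PARAMS = {
    "cheerful": ToneParams(1.2, 1.2, "dynamic composition"),
    "calm": ToneParams(1.0, 0.8, "still, balanced composition"),
}
NEUTRAL = ToneParams(1.0, 1.0, "balanced composition")


def params_for(emotion: str) -> ToneParams:
    """Return tone parameters for an emotion label, or neutral defaults."""
    return EMOTION_PARAMS.get(emotion.strip().lower(), NEUTRAL)
```

Falling back to neutral parameters for an unrecognized label keeps image generation working even when the emotion engine returns nothing usable.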

[0466] The following describes the processing flow.

[0467] Step 1:

[0468] The user uses the input interface on the device to specify the conditions for the image they want to generate. This can include keywords or desired themes. Furthermore, when using voice or text input, the device's built-in emotion engine analyzes the user's voice tone and the words they use to detect their current emotional state.

[0469] Step 2:

[0470] The terminal forms a dataset containing the conditions entered by the user and the emotion information recognized by the emotion engine, and sends this dataset to the server. The dataset thus carries emotion-related information in addition to the conditions specified by the user.

[0471] Step 3:

[0472] The server analyzes the received data and adjusts the detailed parameters for image generation based on the conditions and emotional information specified by the user. The server uses the emotional information to determine how to reflect it in the image's color tone and composition.

[0473] Step 4:

[0474] The server uses a generation mechanism and a highly trained generative artificial intelligence model to create new images. In this process, image generation is performed based on prepared parameters to reflect the user's emotions in terms of color tone and composition.

[0475] Step 5:

[0476] The server then performs further optimizations on the generated images. In particular, after making fine adjustments based on emotional information, it compresses the images and converts the format to reduce data volume and improve transfer efficiency.

[0477] Step 6:

[0478] The optimized image is sent from the server to the terminal. The terminal receives this data and displays the image to the user through the user interface.

[0479] Step 7:

[0480] Users can view the displayed images and add them to their materials or download them to their devices as needed. This process allows users to intuitively and quickly obtain images that match their emotional state.
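
Steps 1 and 2 of this flow, in which the terminal estimates an emotion and bundles it with the conditions, can be sketched as follows. This is a toy illustration: the word lists in `LEXICON` are hypothetical, the sketch covers text input only (voice tone analysis is out of scope), and a real emotion engine would use a trained model rather than keyword counting.

```python
import json
import re

# Hypothetical word lists, for illustration only.
LEXICON = {
    "cheerful": {"fun", "happy", "bright", "exciting", "lively"},
    "calm": {"relaxing", "quiet", "peaceful", "gentle", "soft"},
}


def estimate_emotion(text: str) -> str:
    """Return the label whose words occur most often, or 'neutral'."""
    words = re.findall(r"[a-z']+", text.lower())
    scores = {label: sum(w in vocab for w in words) for label, vocab in LEXICON.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "neutral"


def build_dataset(conditions: str) -> str:
    """Bundle the conditions and the estimated emotion for sending to the server."""
    return json.dumps({"conditions": conditions, "emotion": estimate_emotion(conditions)})
```

The server can then read both fields from the same dataset when it adjusts the generation parameters in Step 3.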

[0481] (Example 2)

[0482] Next, we will describe Example 2. In the following description, the data processing device 12 will be referred to as the "server," and the headset-type terminal 314 will be referred to as the "terminal."

[0483] Conventional image generation systems have been unable to adequately consider the user's emotions, resulting in the problem that the generated images do not match the user's expectations or psychological state. Therefore, there were limitations in the quality and satisfaction level of the generated images.

[0484] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 2 is realized by the following means.

[0485] In this invention, the server includes means for receiving image generation conditions and emotional information from the user via a terminal, means for analyzing the input conditions and emotional information and setting optimal parameters for generation, means for generating an image using a generative artificial intelligence model based on the parameters, and means for optimizing the generated image by applying emotional adjustments. This makes it possible to generate high-quality images that are in line with the user's emotional state.

[0486] "Conditions" refer to information that specifically indicates the attributes and characteristics that the user desires when generating an image.

[0487] "Emotional information" refers to emotional data extracted from input voice and text that reflects the user's psychological state.

[0488] A "generative artificial intelligence model" is a type of artificial intelligence structure that uses pre-trained algorithms to generate images.

[0489] "Optimal parameters" are settings adjusted to reflect conditions and emotional information in image generation, thereby improving the accuracy and quality of the generation process.

[0490] "Emotional adjustment" is a process that modifies the color tone and composition of a generated image, taking emotional information into consideration.

[0491] "Optimization" refers to processing generated images so that they meet user expectations and can be displayed and downloaded quickly and smoothly.

[0492] Embodiments for this invention are shown below.

[0493] First, the user uses the device to input the desired image criteria. This includes keywords and specific images, such as "calm landscape" or "dynamic urban scene." The device is equipped with an emotion recognition engine that analyzes the user's input through voice and text to extract the user's emotional information. From this information, emotional nuances such as "relaxed" or "lively" can be obtained.

[0494] Next, the device sends a dataset containing the conditions and the emotional information to the server. The server analyzes this data and sets the optimal parameters. Using these parameters, a pre-trained generative artificial intelligence model generates images based on the conditions and emotional information. The generated images are then emotionally adjusted in terms of color and composition to reflect the user's emotional state.

[0495] The generated images are compressed and format-converted by the server, optimized for the user's device, and then sent. The device displays the received optimized image in its user interface, allowing the user to review the provided image. This image can be downloaded if needed.

[0496] For example, if a user inputs "a relaxing landscape," the device analyzes the input and extracts emotional information such as "calm and peaceful." This data is sent to a server, where a generative AI model is used to generate images of a calm sea or a soft sunset. These images are then provided to the user.

[0497] An example of a prompt might be, "Generate a relaxing landscape. Sentiment analysis identifies it as 'calm and peaceful'." Based on this prompt, the system can provide an image suitable for the user.

[0498] The flow of the specific processing in Example 2 will be explained using Figure 13.

[0499] Step 1:

[0500] The user inputs image generation conditions into the device. This input includes keywords and desired images, such as "relaxing scenery." The device receives these conditions and uses an emotion recognition engine to extract the user's emotional information from the input data. The input is text or audio data, and the output is information about the analyzed emotional state.

[0501] Step 2:

[0502] The terminal converts the input conditions and emotional information into a structured dataset as a preprocessing step and sends it to the server. The dataset contains detailed conditions and emotional information, which facilitates subsequent processing. The input for this step is the user's conditions and extracted emotional information, and the output is a structured dataset for transmission to the server.

[0503] Step 3:

[0504] The server analyzes the received dataset and sets the optimal parameters for image generation. Specifically, it determines parameters such as color, style, and composition to be used based on the information. The input for this step is the dataset received from the terminal, and the output is the optimized parameters for image generation.

[0505] Step 4:

[0506] The server-based generation mechanism generates images using a generative artificial intelligence model based on optimized parameters. This model is pre-trained and capable of generating a variety of images. The input is the optimized parameters, and the output is the generated original image.

[0507] Step 5:

[0508] The server applies emotional adjustments to the generated image. These adjustments modify the image's color tone, brightness, and composition based on the user's emotional information. For example, if a calm emotion is detected, a gentle color tone will be selected. The input is the generated image and emotional information, and the output is the adjusted image.

[0509] Step 6:

[0510] The server compresses and converts the format of the adjusted image and sends the optimized data to the terminal. This process prepares the data for smooth image display and download. The input is the adjusted image, and the output is the compressed and format-converted image data.

[0511] Step 7:

[0512] The terminal displays the received image data to the user. The user can view the displayed image and, if necessary, download it or use it in other materials. The input is the image data received from the server, and the output is the image displayed on the user interface.
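
The emotional adjustment of Step 5 can be illustrated at the pixel level with the standard `colorsys` module. The sketch assumes that a "gentle color tone" corresponds to lowering saturation and that a brighter mood corresponds to raising lightness; the source does not define the adjustment numerically, and the function names are hypothetical.

```python
import colorsys


def adjust_pixel(rgb, lightness_gain=1.0, saturation_gain=1.0):
    """Scale lightness and saturation of one 8-bit RGB pixel, keeping its hue."""
    r, g, b = (c / 255.0 for c in rgb)
    h, l, s = colorsys.rgb_to_hls(r, g, b)
    l = min(1.0, l * lightness_gain)   # clamp so the result stays a valid color
    s = min(1.0, s * saturation_gain)
    r, g, b = colorsys.hls_to_rgb(h, l, s)
    return tuple(round(c * 255) for c in (r, g, b))


def adjust_image(pixels, lightness_gain=1.0, saturation_gain=1.0):
    """Apply the same adjustment to every pixel in a list of RGB tuples."""
    return [adjust_pixel(p, lightness_gain, saturation_gain) for p in pixels]
```

For a calm emotion, the server might call `adjust_image(pixels, saturation_gain=0.8)`; composition changes cannot be made this way and would have to be handled at generation time.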

[0513] (Application Example 2)

[0514] Next, we will explain Application Example 2. In the following explanation, the data processing device 12 will be referred to as the "server," and the headset-type terminal 314 will be referred to as the "terminal."

[0515] Conventional advertising image generation systems were limited to generating images based solely on conditions, making it difficult to create flexible images that took into account the emotional state of the user. In particular, providing visual content that resonates with the user's emotions is crucial in the advertising field, but this has not been adequately achieved. This has resulted in the challenge of not maximizing the effectiveness of advertising.

[0516] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 2 is realized by the following means.

[0517] In this invention, the server includes a receiving means for receiving conditions from the user, a generating means for analyzing the user's emotional state and generating an image using a computer model, and a providing means for presenting the generated image on the user's digital device. This makes it possible to efficiently generate and provide appropriate advertising images according to the user's emotional state.

[0518] A "user" is someone who provides the conditions for image generation through this system and receives the generated image.

[0519] "Conditions" refer to the specific requirements or desired elements that the user specifies regarding the images they generate.

[0520] "Emotional state" refers to the psychological state of a user, and is identified by analyzing information obtained from voice and text.

[0521] A "computer model" is a system that includes algorithms executed on a digital device and is used to incorporate conditions and emotional states in image generation.

[0522] "Digital devices" refer to electronic devices used to display generated images, such as smartphones, tablets, and personal computers.

[0523] "Generating means" refers to a series of processes for generating images by utilizing a computer model based on conditions and emotional states.

[0524] "Means of provision" refers to means including interfaces and digital devices for presenting the generated images to the user.

[0525] The system that realizes this invention aims to effectively provide visual advertising content by generating images that correspond to the user's emotional state and presenting them on a digital device. A specific example of this system is shown below.

[0526] The system's program runs on the user's smartphone or tablet device and provides an interface for the user to input criteria for advertising images. These criteria include information such as the type of product or atmosphere the user expects to see in the images. These criteria are then transmitted from the digital device to the server.

[0527] The server receives the conditions sent by the user and uses Google Cloud Speech-to-Text or the Google Cloud Natural Language API to obtain the user's emotional state from speech or text. Based on this analysis, the server establishes image generation parameters for the Stable Diffusion generative AI model that combine the emotional state with the conditions.

[0528] If a user wants to generate an image that evokes a sense of peace, the emotion analysis will identify a state of "relaxation." Based on this, the server generates an advertising image with calming colors and composition.

[0529] The server then sends the generated image to the digital device, allowing the user to view it on their device. This image has been compressed, formatted, and resized as appropriate.

[0530] For example, if a user specifies "a scene from daily life that feels comfortable," the generated ad visuals are likely to depict a relaxed beach or soft sunlight. Prompts such as "Please create an ad visual that evokes a sense of peace" are supported.

[0531] This format allows advertisers and content creators to create and distribute more engaging and effective advertisements and visual content that are tailored to the target audience's psychological state.

[0532] The flow of a specific process in Application Example 2 will be explained using Figure 14.

[0533] Step 1:

[0534] The user uses their device to input the desired conditions for the advertisement image into the interface. Specifically, they provide the product name and desired mood (e.g., calm, cheerful) in text or voice. This information is collected as input data on the device.

[0535] Step 2:

[0536] The terminal sends the conditions collected in Step 1 to the server. If there is voice input, the terminal performs preprocessing, such as converting the voice data to text, and formats it into a format that the server can easily handle before sending it. This data becomes the server input.

[0537] Step 3:

[0538] The server performs sentiment analysis on the received text data using the Google Cloud Natural Language API, with Google Cloud Speech-to-Text used to transcribe any voice data that has not already been converted to text. Through this analysis, the server extracts the user's emotional state as a label such as "relaxed" or "excited," and outputs this as emotion data.

[0539] Step 4:

[0540] The server combines emotion data with user-defined conditions to set optimal parameters for image generation. These settings include the prompt passed to the AI image generation model (e.g., Stable Diffusion) and attributes such as color tone and composition.

[0541] Step 5:

[0542] The server runs a generative AI model using the configured parameters to generate an image. The generated image is a computer-generated image whose brightness and hue are adjusted according to the user's emotional state.

[0543] Step 6:

[0544] The generated images are compressed and format-converted on the server and optimized for transmission to the terminal. This adjusts the size and format so that the images can be displayed more quickly on digital devices.

[0545] Step 7:

[0546] The terminal receives optimized image data from the server and displays the images through the user interface. The user can view these generated images and, if necessary, download or use them for further processing.
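
The preprocessing in Step 2, where voice input is converted to text before being sent, can be sketched with a pluggable transcription function. Everything here is an assumption for illustration: `AdRequest`, `prepare_request`, and the `transcribe` callable are hypothetical names, and no real speech recognition API is called.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class AdRequest:
    product: str
    mood_text: str   # mood description as text, ready for emotion analysis
    source: str      # "text" or "voice"


def prepare_request(
    product: str,
    mood_text: Optional[str] = None,
    mood_audio: Optional[bytes] = None,
    transcribe: Optional[Callable[[bytes], str]] = None,
) -> AdRequest:
    """Normalize text or voice input into one text-based request."""
    if mood_text is not None:
        return AdRequest(product, mood_text.strip(), "text")
    if mood_audio is not None:
        if transcribe is None:
            raise ValueError("voice input needs a transcribe function")
        return AdRequest(product, transcribe(mood_audio).strip(), "voice")
    raise ValueError("either mood_text or mood_audio is required")
```

Passing the transcriber in as a parameter keeps the sketch independent of any particular speech service; in the flow above, that role would be filled by whatever speech-to-text component the terminal or server uses.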

[0547] The specific processing unit 290 transmits the result of the specific processing to the headset terminal 314. In the headset terminal 314, the control unit 46A causes the speaker 240 and display 343 to output the result of the specific processing. The microphone 238 acquires audio indicating user input for the result of the specific processing. The control unit 46A transmits the audio data indicating user input acquired by the microphone 238 to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the audio data.

[0548] Data generation model 58 is a type of so-called generative AI (Artificial Intelligence). Examples of data generation model 58 include generative AI such as ChatGPT (Internet search <URL: https://openai.com/blog/chatgpt>) and Gemini (Internet search <URL: https://gemini.google.com/?hl=ja>). The data generation model 58 is obtained by performing deep learning on a neural network. The data generation model 58 is input with prompts containing instructions, and with inference data such as audio data representing speech, text data representing text, and image data representing images. The data generation model 58 infers from the input inference data according to the instructions indicated by the prompts, and outputs the inference results in data formats such as audio data and text data. Here, inference refers to, for example, analysis, classification, prediction, and/or summarization.

[0549] In the above embodiment, an example was given in which specific processing is performed by the data processing device 12, but the technology of this disclosure is not limited thereto, and specific processing may also be performed by the headset terminal 314.

[0550] [Fourth Embodiment]

[0551] Figure 7 shows an example of the configuration of the data processing system 410 according to the fourth embodiment.

[0552] As shown in Figure 7, the data processing system 410 includes a data processing device 12 and a robot 414. An example of the data processing device 12 is a server.

[0553] The data processing device 12 comprises a computer 22, a database 24, and a communication interface 26. The computer 22 is an example of a "computer" related to the technology of this disclosure. The computer 22 comprises a processor 28, RAM 30, and storage 32. The processor 28, RAM 30, and storage 32 are connected to a bus 34. The database 24 and the communication interface 26 are also connected to the bus 34. The communication interface 26 is connected to a network 54. An example of the network 54 is a WAN (Wide Area Network) and / or a LAN (Local Area Network).

[0554] The robot 414 includes a computer 36, a microphone 238, a speaker 240, a camera 42, a communication interface 44, and a controlled object 443. The computer 36 includes a processor 46, RAM 48, and storage 50. The processor 46, RAM 48, and storage 50 are connected to a bus 52. The microphone 238, speaker 240, camera 42, and controlled object 443 are also connected to the bus 52.

[0555] The microphone 238 receives voice signals from the user 20 and receives instructions from the user 20. The microphone 238 captures the voice signals from the user 20, converts the captured voice into audio data, and outputs it to the processor 46. The speaker 240 outputs audio according to the instructions from the processor 46.

[0556] Camera 42 is a small digital camera equipped with an optical system including a lens, aperture, and shutter, and an image sensor such as a CMOS (Complementary Metal-Oxide-Semiconductor) image sensor or a CCD (Charge Coupled Device) image sensor, and captures images of the area around the user 20 (for example, an imaging range defined by a field of view equivalent to the width of a typical healthy person's field of vision).

[0557] Communication interface 44 is connected to network 54. Communication interfaces 44 and 26 are responsible for the exchange of various information between processor 46 and processor 28 via network 54. The exchange of various information between processor 46 and processor 28 using communication interfaces 44 and 26 is performed in a secure manner.

[0558] The controlled object 443 includes a display device, LEDs in the eyes, and motors that drive the arms, hands, and feet. The posture and gestures of the robot 414 are controlled by controlling the motors of the arms, hands, and feet. Some of the robot 414's emotions can be expressed by controlling these motors. Furthermore, the robot 414's facial expressions can also be expressed by controlling the illumination state of the LEDs in its eyes.

[0559] Figure 8 shows an example of the main functions of the data processing device 12 and the robot 414. As shown in Figure 8, the data processing device 12 performs specific processing using the processor 28. The storage 32 stores the specific processing program 56.

[0560] The specific processing program 56 is an example of a "program" relating to the technology of this disclosure. The processor 28 reads the specific processing program 56 from the storage 32 and executes the read specific processing program 56 on the RAM 30. The specific processing is realized by the processor 28 operating as a specific processing unit 290 in accordance with the specific processing program 56 executed on the RAM 30.

[0561] The storage 32 stores the data generation model 58 and the emotion identification model 59. The data generation model 58 and the emotion identification model 59 are used by the specific processing unit 290.

[0562] In robot 414, the processor 46 performs the reception output processing. The storage 50 stores the reception output program 60. The processor 46 reads the reception output program 60 from the storage 50 and executes the read reception output program 60 on the RAM 48. The reception output processing is realized by the processor 46 operating as a control unit 46A according to the reception output program 60 executed on the RAM 48.

[0563] Next, the specific processing performed by the specific processing unit 290 of the data processing device 12 will be described. In the following description, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".

[0564] This invention is designed as a system that users can use intuitively. Users access the system using their own devices and request image generation based on specific conditions. The device reads the requests entered by the user and sends them to the server as digital data. This data includes keywords, image style, color, and other detailed conditions.

[0565] The server analyzes the received request and defines the criteria for the image to be generated. The generation mechanism operating within the server uses a generative artificial intelligence model to generate images based on these criteria. In this process, machine learning models and neural networks are commonly used. The AI model recognizes patterns from a vast amount of training data and uses them to create new images.

[0566] The generated images are optimized by the server. This involves image compression and format conversion, adjusting the file size appropriately. In this way, the images are prepared for smooth transfer and display. Finally, the optimized images are sent back to the terminal by the delivery means and presented to the user.

[0567] The device displays the received images on its interface, allowing users to easily download or use them directly. For example, if a user requests a "landscape rich in nature," the AI model generates a natural landscape image reflecting that request and provides it to the user. This process allows the user to instantly obtain original images suited to their purpose and use them in presentations or projects.

[0568] The following describes the processing flow.

[0569] Step 1:

[0570] The user uses their device to input the conditions for image generation. This includes keywords, desired style, color scheme, etc. The user interface is designed to retrieve this information.

[0571] Step 2:

[0572] The terminal converts the user's input data into an appropriate data format (e.g., JSON) and sends a request to the server. This often uses the HTTP protocol.

[0573] Step 3:

[0574] The server analyzes the received request and extracts the conditions and context necessary for image generation. This may include text analysis using natural language processing.

[0575] Step 4:

[0576] Based on the analysis results, the server adjusts the parameters for image generation and provides them to the AI model as input data.

[0577] Step 5:

[0578] The generation mechanism on the server uses a generative artificial intelligence model to generate original images based on these parameters. The AI model, which has been trained in advance, creates the images.

[0579] Step 6:

[0580] The generated images are compressed and format-converted by the server to optimize file size. This process reduces the load during transfer and improves display speed on the device.

[0581] Step 7:

[0582] The optimized image data is sent from the server to the terminal. This is also done via an HTTP response.

[0583] Step 8:

[0584] The device displays the received image in the user interface. The user can then download this image or use it in their materials.
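
The request and response bodies exchanged in Steps 2 and 7 can be sketched as JSON documents. The field names (`keywords`, `style`, `color`, `media_type`, `image_base64`) and the use of Base64 for the image payload are assumptions for illustration; the source states only that a format such as JSON and the HTTP protocol are used.

```python
import base64
import json


def encode_request(keywords: str, style: str, color: str) -> bytes:
    """Terminal side: serialize the conditions as a JSON request body."""
    body = {"keywords": keywords, "style": style, "color": color}
    return json.dumps(body, ensure_ascii=False).encode("utf-8")


def decode_request(body: bytes) -> dict:
    """Server side: parse the request body and check the required fields."""
    data = json.loads(body.decode("utf-8"))
    missing = [k for k in ("keywords", "style", "color") if k not in data]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    return data


def encode_response(image_bytes: bytes, media_type: str) -> bytes:
    """Server side: wrap the optimized image in a JSON response body."""
    payload = {
        "media_type": media_type,
        "image_base64": base64.b64encode(image_bytes).decode("ascii"),
    }
    return json.dumps(payload).encode("utf-8")


def decode_response(body: bytes) -> bytes:
    """Terminal side: recover the image bytes for display."""
    return base64.b64decode(json.loads(body.decode("utf-8"))["image_base64"])
```

Base64 increases the payload size by roughly one third, so a server could instead return the image bytes directly as the HTTP response body with an appropriate content type; the JSON wrapper is shown only because it mirrors the request format.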

[0585] (Example 1)

[0586] Next, we will describe Example 1. In the following description, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".

[0587] The present invention addresses the need for an image generation system that is intuitive for users to use, capable of rapidly generating and efficiently providing high-quality images based on specified conditions. In particular, it requires optimal image generation to meet diverse user needs, as well as appropriate optimization for smooth image transfer and display.

[0588] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 1 is realized by the following means.

[0589] In this invention, the server includes means for receiving conditions from an input means and transmitting them as digital data, means for analyzing the conditions and generating an image using a generative artificial intelligence model, and means for optimizing the generated image to adjust its size and format and providing it to the user terminal. This makes it possible to respond quickly to various user requests and provide images in an optimal state.

[0590] The "input means" refers to the part that has the function of receiving conditions and requests from the user.

[0591] "Conditions" refer to information that allows the user to specify the characteristics and style of the image they wish to generate.

[0592] "Digital data" refers to information that has been converted into a format that can be processed by a computer, such as conditions or requests.

[0593] The "analysis means" refers to the part that analyzes the conditions received from the user and has the function of preparing the criteria and context necessary for image generation.

[0594] A "generative artificial intelligence model" is a model that uses machine learning algorithms to learn from large amounts of data and generate new images.

[0595] "Image generation" is the process of creating a new image based on conditions received from the user.

[0596] "Optimization processing" is the process of compressing and converting the format of generated images to adjust them into a state that allows for efficient storage and transfer.

[0597] "Delivery means" refers to the part that has the function of sending optimized images to the user's device and making them available for the user to use.

[0598] This invention is a system for efficiently generating and providing high-quality images based on user input of image generation conditions. Users access the system using a personal computer or smartphone and input specific conditions for the image they wish to generate. This involves specifying information such as keywords, style, and color through prompt messages.

[0599] The terminal converts the input conditions into digital data and sends it to the server. The server receives this data, analyzes the user's request in detail using analytical tools, and generates the desired image using a generative AI model. The generative AI model generates new images in real time using patterns learned from a large amount of training data. The AI model used in this process is often built on machine learning frameworks such as nn libraries or TensorFlow.

[0600] The generated images undergo optimization processing on the server, including compression and format conversion, to ensure efficient transfer and display. Finally, the server sends the optimized image back to the terminal, which then displays it on the user interface. This allows users to immediately download the generated images and use them in their own projects.

[0601] For example, if a user enters the prompt "a landscape rich in nature," the server's AI model generates a natural landscape image that reflects this prompt and provides it to the user. This allows the user to quickly obtain original images that are suitable for presentation materials or web content.

[0602] The flow of the specific processing in Example 1 will be explained using Figure 11.

[0603] Step 1:

[0604] The user accesses the system using a terminal and enters the conditions for image generation. The user specifies keywords, style, and color of the image to be generated as prompts. The entered information is processed as digital data within the terminal and sent to the server. This step accurately converts the user's specific requests into data.

[0605] Step 2:

[0606] The server receives digital data transmitted from the terminal. The received data is analyzed using analytical tools to establish the necessary criteria and context for image generation. Through this analysis, specific image generation conditions are set based on the input prompt text. In this step, data processing is performed to understand the user's requirements and create an appropriate configuration.

[0607] Step 3:

[0608] The server-side generation mechanism uses a generative AI model to generate images based on the analyzed conditions. The AI model utilizes a large amount of training data to create new images that match the input conditions. The generative AI model is implemented using a machine learning framework and leverages pattern recognition capabilities to generate images. This step involves the specific execution of complex image generation algorithms.

[0609] Step 4:

[0610] The generated images are optimized on the server. This process involves image compression and format conversion, adjusting them for efficient transfer and display. The optimized images are converted to the optimal file size and format for transfer. In this step, data calculations are performed to make the generated images usable.

[0611] Step 5:

[0612] The server sends the optimized image to the terminal. The terminal displays the received image on its user interface, making it easy for the user to download and use. This step involves providing information in a user-friendly format.
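The optimization in Step 4 above (size adjustment, format conversion, compression) can be sketched with the standard library alone. This is an illustrative stand-in, not the method of the disclosure: it assumes the image is a row-major list of RGB tuples, uses nearest-neighbour downscaling, serialises to binary PPM as a simple example format, and compresses with zlib. All function names are hypothetical.

```python
import zlib


def downscale(pixels, width, height, factor):
    """Nearest-neighbour downscale of a row-major list of (r, g, b) tuples."""
    new_w, new_h = max(1, width // factor), max(1, height // factor)
    out = [
        pixels[(y * factor) * width + (x * factor)]
        for y in range(new_h)
        for x in range(new_w)
    ]
    return out, new_w, new_h


def to_ppm(pixels, width, height):
    """Serialise pixels as a binary PPM (P6) file, a simple stand-in format."""
    header = f"P6\n{width} {height}\n255\n".encode("ascii")
    return header + bytes(channel for pixel in pixels for channel in pixel)


def optimize(pixels, width, height, factor=2):
    """Resize, convert the format, then compress for transfer."""
    small, w, h = downscale(pixels, width, height, factor)
    return zlib.compress(to_ppm(small, w, h), level=9)


# A 4x4 test image whose red and green channels encode the x and y position.
image = [(x * 10, y * 10, 0) for y in range(4) for x in range(4)]
optimized = optimize(image, 4, 4)
```

A production system would more likely use an image library and a lossy format chosen for the target device; the sketch only shows the order of operations.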

[0613] (Application Example 1)

[0614] Next, we will explain Application Example 1. In the following explanation, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".

[0615] In generating visual information, it is essential that users can intuitively set conditions, and that the generated images are high-quality and efficiently optimized. Furthermore, the generated visual information must be delivered smoothly to the user, enhancing the personalized content experience.

[0616] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 1 is realized by the following means.

[0617] In this invention, the server includes a receiving means for receiving conditions from a user, a generating means for generating visual data using a generative artificial intelligence algorithm based on the conditions, and an optimization means for optimizing the generated visual data and converting it into a usable form. This makes it possible to quickly and efficiently generate and provide the visual information requested by the user.

[0618] A "means of receiving conditions from users" refers to a component within a system that receives specific requests or wishes from users.

[0619] "Generative means for generating visual data using a generative artificial intelligence algorithm" refers to an element within a system that generates image data using artificial intelligence technology based on user conditions.

[0620] "Optimization means for optimizing generated visual data and converting it into a usable form" refers to a function within a system that improves the quality and format of the generated image so that the user can use it effectively.

[0621] "Means of delivery" refers to components within a system that have the function of delivering the generated visual data to the user.

[0622] A "constituent" is a collection of multiple functional means that cooperate to form the entire system.

[0623] One embodiment of the present invention provides a system that allows users to intuitively generate and utilize visual information. This system functions by allowing users to input the desired image conditions using a smartphone or other device. The input conditions include keywords, image style, color, and other detailed instructions.

[0624] The server processes this input and uses generative artificial intelligence algorithms (e.g., models built on PyTorch or TensorFlow) to generate visual data based on specified conditions. This image generation process is achieved by recognizing patterns based on a vast amount of training data and generating new visual content.

[0625] The generated visual data is processed by optimization techniques, including compression and format conversion. This allows image files to be displayed and downloaded smoothly on the user's device. Optimization techniques are applied to reduce file size while maintaining image quality.

[0626] The optimized images are delivered to the user through the distribution method, and the user can view them on a smartphone application and download or use them further as needed.

[0627] For example, if a user specifies "spring cherry blossom scenery" within the application, the server will generate an image that conforms to that theme. The generative AI model may be given a prompt such as, "Create a vivid image of spring cherry blossoms under a clear blue sky." This allows users to easily obtain high-quality seasonal images and utilize them in content creation.

[0628] The flow of a specific process in Application Example 1 will be explained using Figure 12.

[0629] Step 1:

[0630] The user launches the application on their smartphone and specifies the details of the image they want to generate on the conditions input screen. This includes elements such as keywords, style, and color. The entered information is temporarily stored as data on the device.

[0631] Step 2:

[0632] The terminal sends temporarily stored condition data to the server. The server receives this data and begins analysis. During analysis, data processing is performed to form an appropriate prompt sentence based on the specified conditions. For example, if "spring cherry blossom scenery" is specified, the prompt sentence "Create a vivid image of spring cherry blossoms under a clear blue sky." will be generated.

[0633] Step 3:

[0634] The server invokes a generative artificial intelligence model based on the generated prompt text. The AI model (e.g., Stable Diffusion) receives the prompt text and generates relevant visual information. In this process, the model utilizes existing data patterns to output new image data.

[0635] Step 4:

[0636] The generated image data is passed to an optimization mechanism on the server. This mechanism compresses the image, converts it to an appropriate format, adjusts the image size, and processes it to optimize display on the user's device. This process improves transfer efficiency while maintaining quality.

[0637] Step 5:

[0638] The optimized visual data is sent back from the server to the device. The device receives this data and displays the image within the application. The user can review the displayed image and, if necessary, download it or integrate it with other content.
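The prompt formation in Step 2 above can be sketched as a simple template. The template wording, the function names, and the split of the conditions into `subject`, `style`, and `colors` are assumptions for illustration; the source does not describe how the prompt sentence is derived from the conditions.

```python
def article_for(phrase):
    """Pick 'a' or 'an' from the first letter (a rough heuristic)."""
    return "an" if phrase[:1].lower() in "aeiou" else "a"


def build_prompt(subject, style="", colors=()):
    """Form a prompt sentence from user conditions (hypothetical template)."""
    noun_phrase = f"{style} image" if style else "image"
    sentence = f"Create {article_for(noun_phrase)} {noun_phrase} of {subject}"
    if colors:
        sentence += " using " + ", ".join(colors) + " tones"
    return sentence + "."


prompt = build_prompt(
    "spring cherry blossoms under a clear blue sky", style="vivid"
)
```

With these inputs the template reproduces the example sentence quoted in Step 2; expanding a short keyword such as "spring cherry blossom scenery" into that fuller subject is left outside the sketch.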

[0639] Furthermore, an emotion engine that estimates the user's emotions may be incorporated. That is, the specific processing unit 290 may use the emotion identification model 59 to estimate the user's emotions and perform the specific processing using the user's emotions.

[0640] This invention is designed as an image generation system that takes user emotions into account. The system begins with the user inputting image generation conditions via a terminal. The terminal has a built-in emotion engine; when the user specifies keywords and content, the engine recognizes the user's emotions by analyzing the voice input and text. This emotion analysis adds emotional nuances, such as "cheerful" or "calm," to the conditions.

[0641] The terminal structures this input data and sends it to the server. The server analyzes the received data and prepares the optimal parameters for image generation based on the conditions and emotions. Next, the generation mechanism within the server generates an original image using a pre-trained generative artificial intelligence model. In this process, parameters based on the user's emotions are also taken into consideration and efforts are made to reflect them in the image's color tone and composition.

[0642] The generated images are further emotionally adjusted by the server. For example, if a cheerful mood is detected, brighter colors and a more dynamic composition are selected. The images are also compressed, converted to different formats, and resized appropriately. The server then sends the optimized data back to the terminal.

[0643] The device displays received images on the user interface, allowing users to view them directly and use or download them as needed. For example, if a user requests a "relaxing landscape" and the analysis determines their emotions are "calm and peaceful," the AI model generates and provides images reminiscent of a calm sea or a soft sunset. This format allows users to easily obtain images appropriate to their psychological state.

[0644] The following describes the processing flow.

[0645] Step 1:

[0646] The user uses the input interface on the device to specify the conditions for the image they want to generate. This can include keywords or desired themes. Furthermore, when using voice or text input, the device's built-in emotion engine analyzes the user's voice tone and the words they use to detect their current emotional state.

[0647] Step 2:

[0648] The terminal forms a dataset containing the conditions entered by the user and the emotional information recognized by the emotion engine, and sends this dataset to the server. This data thus includes emotion-related information in addition to the conditions specified by the user.

[0649] Step 3:

[0650] The server analyzes the received data and adjusts the detailed parameters for image generation based on the conditions and emotional information specified by the user. The server uses the emotional information to determine how to reflect it in the image's color tone and composition.

[0651] Step 4:

[0652] The server uses a generation mechanism and a highly trained generative artificial intelligence model to create new images. In this process, image generation is performed based on prepared parameters to reflect the user's emotions in terms of color tone and composition.

[0653] Step 5:

[0654] The server then performs further optimizations on the generated images. In particular, after making fine adjustments based on emotional information, it compresses the images and converts the format to reduce data volume and improve transfer efficiency.

[0655] Step 6:

[0656] The optimized image is sent from the server to the terminal. The terminal receives this data and displays the image to the user through the user interface.

[0657] Step 7:

[0658] Users can view the displayed images and add them to their materials or download them to their devices as needed. This process allows users to intuitively and quickly obtain images that match their emotional state.
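Step 3 of this flow, in which the server turns the conditions and emotional information into generation parameters, might look like the sketch below. The lookup table, the dataset keys (`keywords`, `emotion`), and the `GenerationParams` type are hypothetical; only the pairing of a cheerful mood with brighter colors and a more dynamic composition comes from the text above.

```python
from dataclasses import dataclass

# Hypothetical emotion-to-style table. The "cheerful" row follows the example
# in the text; the other values are illustrative assumptions.
EMOTION_STYLE = {
    "cheerful": {"color_tone": "bright", "composition": "dynamic"},
    "calm": {"color_tone": "soft", "composition": "balanced"},
}
DEFAULT_STYLE = {"color_tone": "neutral", "composition": "balanced"}


@dataclass
class GenerationParams:
    keywords: list
    color_tone: str
    composition: str


def derive_params(dataset):
    """Merge user conditions with style hints chosen from the emotion label."""
    style = EMOTION_STYLE.get(dataset.get("emotion"), DEFAULT_STYLE)
    return GenerationParams(keywords=list(dataset["keywords"]), **style)


params = derive_params({"keywords": ["harbor", "morning"], "emotion": "cheerful"})
```

An unrecognized or missing emotion falls back to `DEFAULT_STYLE`, so the request can still be served from the conditions alone.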

[0659] (Example 2)

[0660] Next, we will describe Example 2. In the following description, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".

[0661] Conventional image generation systems have been unable to adequately consider the user's emotions, resulting in the problem that the generated images do not match the user's expectations or psychological state. Therefore, there were limitations in the quality and satisfaction level of the generated images.

[0662] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 2 is realized by the following means.

[0663] In this invention, the server includes means for receiving image generation conditions and emotional information from the user via a terminal, means for analyzing the input conditions and emotional information and setting optimal parameters for generation, means for generating an image using a generative artificial intelligence model based on the parameters, and means for optimizing the generated image by applying emotional adjustments. This makes it possible to generate high-quality images that are in line with the user's emotional state.

[0664] "Conditions" refer to information that specifically indicates the attributes and characteristics that the user desires when generating an image.

[0665] "Emotional information" refers to emotional data extracted from input voice and text that reflects the user's psychological state.

[0666] A "generative artificial intelligence model" is a type of artificial intelligence structure that uses pre-trained algorithms to generate images.

[0667] "Optimal parameters" are settings adjusted to reflect conditions and emotional information in image generation, thereby improving the accuracy and quality of the generation process.

[0668] "Emotional adjustment" is a process that modifies the color tone and composition of a generated image, taking emotional information into consideration.

[0669] "Optimization" refers to processing generated images so that they meet user expectations and can be displayed and downloaded quickly and smoothly.

[0670] Embodiments for this invention are shown below.

[0671] First, the user uses the device to input the desired image criteria. This includes keywords and specific images, such as "calm landscape" or "dynamic urban scene." The device is equipped with an emotion recognition engine that analyzes the user's input through voice and text to extract the user's emotional information. From this information, emotional nuances such as "relaxed" or "lively" can be obtained.

[0672] Next, the device sends the resulting dataset to the server. The server analyzes this data and sets the optimal parameters. Using these parameters, a pre-trained generative artificial intelligence model generates images based on the conditions and emotional information. The generated images are then emotionally adjusted in terms of color and composition to reflect the user's emotional state.

[0673] The generated images are compressed and format-converted by the server, optimized for the user's device, and then sent. The device displays the received optimized image in its user interface, allowing the user to review the provided image. This image can be downloaded if needed.

[0674] For example, if a user inputs "a relaxing landscape," the device analyzes it based on emotional information such as "calm and peaceful." This data is sent to a server, where a generative AI model is used to generate images of a calm sea or a soft sunset. These images are then provided to the user.

[0675] An example of a prompt might be, "Generate a relaxing landscape. Sentiment analysis identifies it as 'calm and peaceful'." Based on this prompt, the system can provide an image suitable for the user.

[0676] The flow of the specific processing in Example 2 will be explained using Figure 13.

[0677] Step 1:

[0678] The user inputs image generation conditions into the device. This input includes keywords and desired images, such as "relaxing scenery." The device receives these conditions and uses an emotion recognition engine to extract the user's emotional information from the input data. The input is text or audio data, and the output is information about the analyzed emotional state.

[0679] Step 2:

[0680] As a preprocessing step, the terminal converts the input conditions and emotional information into a structured dataset and sends it to the server. The dataset contains the detailed conditions and emotional information, which facilitates subsequent processing. The input for this step is the user's conditions and the extracted emotional information, and the output is a structured dataset for transmission to the server.

[0681] Step 3:

[0682] The server analyzes the received dataset and sets the optimal parameters for image generation. Specifically, it determines parameters such as color, style, and composition to be used based on the information. The input for this step is the dataset received from the terminal, and the output is the optimized parameters for image generation.

[0683] Step 4:

[0684] The server-based generation mechanism generates images using a generative artificial intelligence model based on optimized parameters. This model is pre-trained and capable of generating a variety of images. The input is the optimized parameters, and the output is the generated original image.

[0685] Step 5:

[0686] The server applies emotional adjustments to the generated image. These adjustments modify the image's color tone, brightness, and composition based on the user's emotional information. For example, if a calm emotion is detected, a gentle color tone will be selected. The input is the generated image and emotional information, and the output is the adjusted image.

[0687] Step 6:

[0688] The server compresses and converts the format of the adjusted image and sends the optimized data to the terminal. This process prepares the data for smooth image display and download. The input is the adjusted image, and the output is the compressed and format-converted image data.

[0689] Step 7:

[0690] The terminal displays the received image data to the user. The user can view the displayed image and, if necessary, download it or use it in other materials. The input is the image data received from the server, and the output is the image displayed on the user interface.
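The emotional adjustment in Step 5 above can be illustrated on raw RGB pixels. This is a sketch under stated assumptions: the percentage table is hypothetical (the source gives no numeric values), "gentle color tone" is approximated by reducing saturation, and integer arithmetic is used so the results are exact.

```python
# Hypothetical adjustment table (percentages); the source gives no numbers.
ADJUSTMENTS = {
    "cheerful": {"brightness": 120, "saturation": 100},
    "calm": {"brightness": 100, "saturation": 70},
}
NO_CHANGE = {"brightness": 100, "saturation": 100}


def clamp(value):
    """Keep a channel value inside the 8-bit range."""
    return max(0, min(255, value))


def adjust_pixel(pixel, brightness, saturation):
    """Pull channels toward the pixel's gray level, then scale brightness."""
    gray = sum(pixel) // 3
    adjusted = []
    for channel in pixel:
        toned = gray + (channel - gray) * saturation // 100
        adjusted.append(clamp(toned * brightness // 100))
    return tuple(adjusted)


def apply_emotional_adjustment(pixels, emotion):
    """Apply the adjustment selected by the emotion label to every pixel."""
    settings = ADJUSTMENTS.get(emotion, NO_CHANGE)
    return [adjust_pixel(p, **settings) for p in pixels]


calm_result = apply_emotional_adjustment([(100, 200, 250)], "calm")
cheerful_result = apply_emotional_adjustment([(100, 200, 250)], "cheerful")
```

Composition changes, which the text also mentions, cannot be expressed as a per-pixel operation and are outside this sketch.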

[0691] (Application Example 2)

[0692] Next, we will explain Application Example 2. In the following explanation, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".

[0693] Conventional advertising image generation systems were limited to generating images based solely on conditions, making it difficult to create flexible images that took into account the emotional state of the user. In particular, providing visual content that resonates with the user's emotions is crucial in the advertising field, but this has not been adequately achieved. This has resulted in the challenge of not maximizing the effectiveness of advertising.

[0694] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 2 is realized by the following means.

[0695] In this invention, the server includes a receiving means for receiving conditions from the user, a generating means for analyzing the user's emotional state and generating an image using a computer model, and a providing means for presenting the generated image on the user's digital device. This makes it possible to efficiently generate and provide appropriate advertising images according to the user's emotional state.

[0696] A "user" is someone who provides the conditions for image generation through this system and receives the generated image.

[0697] "Conditions" refer to the specific requirements or desired elements that the user specifies regarding the images they generate.

[0698] "Emotional state" refers to the psychological state of a user, and is identified by analyzing information obtained from voice and text.

[0699] A "computer model" is a system that includes algorithms executed on a digital device and is used to incorporate conditions and emotional states in image generation.

[0700] "Digital devices" refer to electronic devices used to display generated images, such as smartphones, tablets, and personal computers.

[0701] "Generation method" refers to a series of processes for generating images by utilizing a computer model based on conditions and emotional states.

[0702] "Means of provision" refers to means including interfaces and digital devices for presenting the generated images to the user.

[0703] The system that realizes this invention aims to effectively provide visual advertising content by generating images that correspond to the user's emotional state and presenting them on a digital device. A specific example of this system is shown below.

[0704] The system's program runs on the user's smartphone or tablet device and provides an interface for the user to input criteria for advertising images. These criteria include information such as the type of product or atmosphere the user expects to see in the images. These criteria are then transmitted from the digital device to the server.

[0705] The server receives the conditions sent by the user and uses Google Cloud Speech-to-Text or the Google Cloud Natural Language API to obtain the user's emotional state from speech or text. Based on this analysis, the server establishes image generation parameters for the Stable Diffusion generative AI model that fuse the emotional state with the conditions.

[0706] If a user wants to generate an image that evokes a sense of peace, the emotion analysis will identify a state of "relaxation." Based on this, the server generates an advertising image with calming colors and composition.

[0707] The server then sends the generated image to the digital device, allowing the user to view it on their device. Before transmission, the image is compressed, format-converted, and resized as appropriate.

[0708] For example, if a user specifies "a scene that feels comfortable in daily life," the generated ad visuals are likely to depict a relaxed beach or soft sunlight. A prompt such as "Please create an ad visual that evokes a sense of peace" may be used.

[0709] This format allows advertisers and content creators to create and distribute more engaging and effective advertisements and visual content that are tailored to the target audience's psychological state.

[0710] The flow of a specific process in Application Example 2 will be explained using Figure 14.

[0711] Step 1:

[0712] The user uses their device to input the desired conditions for the advertisement image into the interface. Specifically, they provide the product name and desired mood (e.g., calm, cheerful) in text or voice. This information is collected as input data on the device.

[0713] Step 2:

[0714] The terminal sends the conditions collected in Step 1 to the server. If there is voice input, the terminal performs preprocessing, such as converting the voice data to text, and formats it into a format that the server can easily handle before sending it. This data becomes the server input.

[0715] Step 3:

[0716] The server performs sentiment analysis on the received text data using the Google Cloud Speech-to-Text and Google Cloud Natural Language APIs. Through this analysis, the server extracts the user's emotional state, such as "relaxed" or "excited," and outputs it as emotion data.

[0717] Step 4:

[0718] The server combines the emotion data with the user-defined conditions to set optimal parameters for image generation. These settings include the prompt used by the AI image generation model (e.g., Stable Diffusion) and attributes such as color tone and composition.

[0719] Step 5:

[0720] The server runs a generative AI model using the configured parameters to generate an image. The generated image is a computer-generated image whose brightness and hue are adjusted according to the user's emotional state.

[0721] Step 6:

[0722] The generated images are compressed and format-converted on the server and optimized for transmission to the terminal. This adjusts their size and format so that they can be displayed more quickly on digital devices.

[0723] Step 7:

[0724] The terminal receives optimized image data from the server and displays the images through the user interface. The user can view these generated images and, if necessary, download or use them for further processing.
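Steps 3 and 4 above can be sketched end to end with a local stand-in for the emotion analysis. The keyword lexicon below is purely illustrative and is not the Google Cloud Natural Language API; the emotion labels "relaxed" and "excited" and the pairing of "relaxed" with calming colors and composition follow the text, while the other table entries, the function names, and the product name are hypothetical.

```python
import re

# Illustrative lexicon standing in for an external sentiment-analysis service.
EMOTION_KEYWORDS = {
    "relaxed": {"peace", "peaceful", "calm", "comfortable", "relaxing"},
    "excited": {"exciting", "energetic", "lively", "thrilling"},
}
# Hypothetical style hints per emotion label.
MOOD_HINTS = {
    "relaxed": "calming colors and composition",
    "excited": "vivid colors and a dynamic composition",
    "neutral": "balanced colors and composition",
}


def estimate_emotion(text):
    """Return the emotion whose keywords appear most often, else 'neutral'."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    scores = {e: len(words & vocab) for e, vocab in EMOTION_KEYWORDS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "neutral"


def build_ad_settings(product, mood_text):
    """Combine the product condition with the estimated emotion into a prompt."""
    emotion = estimate_emotion(mood_text)
    prompt = f"Advertising image of {product} with {MOOD_HINTS[emotion]}"
    return {"emotion": emotion, "prompt": prompt}


settings = build_ad_settings(
    "herbal tea", "Please create an ad visual that evokes a sense of peace"
)
```

In a deployment along the lines of the text, `estimate_emotion` would be replaced by a call to the external analysis service, and `settings["prompt"]` would be passed to the image generation model.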

[0725] The specific processing unit 290 transmits the result of the specific processing to the robot 414. In the robot 414, the control unit 46A causes the speaker 240 and the controlled object 443 to output the result of the specific processing. The microphone 238 acquires audio indicating user input for the result of the specific processing. The control unit 46A transmits the audio data indicating user input acquired by the microphone 238 to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the audio data.

[0726] Data generation model 58 is a type of so-called generative AI (Artificial Intelligence). Examples of data generation model 58 include generative AI such as ChatGPT (Internet search <URL: https://openai.com/blog/chatgpt>) and Gemini (Internet search <URL: https://gemini.google.com/?hl=ja>). The data generation model 58 is obtained by performing deep learning on a neural network. The data generation model 58 receives prompts containing instructions, together with inference data such as audio data representing speech, text data representing text, and image data representing images. The data generation model 58 performs inference on the input inference data according to the instructions indicated by the prompts, and outputs the inference results in data formats such as audio data and text data. Here, inference refers to, for example, analysis, classification, prediction, and/or summarization.

[0727] In the above embodiment, an example was given in which the specific processing is performed by the data processing device 12, but the technology of this disclosure is not limited thereto, and the specific processing may also be performed by the robot 414.

[0728] Furthermore, the emotion identification model 59, acting as an emotion engine, may determine the user's emotion according to a specific mapping. Specifically, this mapping may be an emotion map (see Figure 9). Similarly, the emotion identification model 59 may also determine the robot's emotion, and the specific processing unit 290 may perform the specific processing using the robot's emotion.

[0729] Figure 9 shows an emotion map 400 in which multiple emotions are mapped. In the emotion map 400, emotions are arranged in concentric circles radiating from the center. The closer to the center of the concentric circles, the more primitive the emotions are located. Further out of the concentric circles, emotions representing states and actions arising from mental states are located. Emotion is a concept that includes feelings and mental states. On the left side of the concentric circles, emotions that are generally generated from reactions occurring in the brain are located. On the right side of the concentric circles, emotions that are generally induced by situational judgment are located. Above and below the concentric circles, emotions that are generally generated from reactions occurring in the brain and induced by situational judgment are located. In addition, the emotion of "pleasure" is located on the upper side of the concentric circles, and the emotion of "displeasure" is located on the lower side. Thus, in the emotion map 400, multiple emotions are mapped based on the structure in which emotions arise, and emotions that are likely to occur simultaneously are mapped close together.

[0730] These emotions are distributed at the 3 o'clock position on the emotion map 400, and usually fluctuate between feelings of security and anxiety. In the right half of the emotion map 400, situational awareness takes precedence over internal feelings, resulting in a calm impression.

[0731] The inside of the emotion map 400 represents what is in the mind, while the outside represents behavior. Therefore, the further toward the outside of the emotion map 400 an emotion lies, the more visible it becomes (the more it manifests in behavior).
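The layout of the emotion map 400 described above can be captured in a small data structure. The coordinate convention (x for left/right, y for up/down, distance from the center for how outwardly an emotion is expressed) and the names `MapPoint`, `origin`, and `valence` are assumptions made for this sketch; the source describes the layout only qualitatively.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class MapPoint:
    """A position on the emotion map; the coordinate convention is assumed."""

    x: float  # < 0: left half (reaction), > 0: right half (situation)
    y: float  # > 0: upper half (pleasure), < 0: lower half (displeasure)

    @property
    def radius(self) -> float:
        # Near the center: more primitive; further out: shown in behavior.
        return math.hypot(self.x, self.y)

    @property
    def origin(self) -> str:
        if self.x < 0:
            return "reaction"
        if self.x > 0:
            return "situation"
        return "both"  # on the vertical axis, both sources apply

    @property
    def valence(self) -> str:
        if self.y > 0:
            return "pleasure"
        if self.y < 0:
            return "displeasure"
        return "neutral"


point = MapPoint(x=3.0, y=4.0)
```

Here `point` lies in the upper right of the map, so it is classified as a situation-side, pleasant emotion at distance 5.0 from the center.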

[0732] Here, human emotions are based on various balances, such as posture and blood sugar levels. When these balances deviate from the ideal, it results in discomfort, and when they approach the ideal, it results in pleasure. Similarly, in robots, cars, motorcycles, etc., emotions can be created based on various balances, such as posture and battery level. When these balances deviate from the ideal, it results in discomfort, and when they approach the ideal, it results in pleasure. The emotion map can be generated, for example, based on Dr. Mitsuyoshi's emotion map (Research on a system for analyzing brain physiological signals of speech emotion recognition and emotion, Tokushima University, doctoral dissertation: https://ci.nii.ac.jp/naid/500000375379). The left half of the emotion map contains emotions belonging to a region called "response," where sensation is dominant. The right half of the emotion map contains emotions belonging to a region called "situation," where situational awareness is dominant.

[0733] The emotion map defines two emotions that promote learning. One is the emotion around the middle of the negative "repentance" and "reflection" on the situation side. In other words, it is when the robot experiences negative emotions such as "I never want to feel this way again" or "I don't want to be scolded again." The other is the emotion around the positive "desire" on the reaction side. In other words, it is when the robot has positive feelings such as "I want more" or "I want to know more."

[0734] The emotion identification model 59 inputs user input into a pre-trained neural network, obtains emotion values representing each emotion shown in the emotion map 400, and determines the user's emotion. This neural network is pre-trained based on multiple training data sets, which are combinations of user input and emotion values representing each emotion shown in the emotion map 400. Furthermore, this neural network is trained so that emotions located close together have similar values, as shown in the emotion map 900 in Figure 10. Figure 10 shows an example where multiple emotions such as "reassured," "calm," and "confident" have similar emotion values.
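The following is a minimal sketch of two steps implied by paragraph [0734]: selecting the user's emotion from the emotion values returned by the network, and measuring how similar the values of neighboring emotions are. The neural network itself is not shown. The function names, the sample values, the neighbor pairs, and the use of a squared-difference penalty are assumptions; the disclosure does not state how the similarity constraint is enforced during training.

```python
def dominant_emotion(emotion_values: dict[str, float]) -> str:
    """Return the emotion with the highest value (assumed decision rule)."""
    if not emotion_values:
        raise ValueError("emotion_values must not be empty")
    return max(emotion_values, key=lambda name: emotion_values[name])


def neighbor_penalty(
    emotion_values: dict[str, float],
    neighbor_pairs: list[tuple[str, str]],
) -> float:
    """Sum of squared differences between emotions that are adjacent on the map.

    A small result means neighboring emotions have similar values. A term of
    this kind could be added to a training loss, which is an assumption here.
    """
    return sum(
        (emotion_values[a] - emotion_values[b]) ** 2 for a, b in neighbor_pairs
    )


# Assumed output of the emotion identification model for one user input.
values = {"reassured": 0.80, "calm": 0.70, "confident": 0.75, "anxious": 0.10}
pairs = [("reassured", "calm"), ("calm", "confident")]

user_emotion = dominant_emotion(values)
penalty = neighbor_penalty(values, pairs)
```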

[0735] The above description primarily focuses on the functions of the data processing device 12 in relation to this disclosure. However, the system related to this disclosure is not necessarily implemented on a server. The system related to this disclosure may be implemented as a general information processing system. This disclosure may be implemented, for example, as a software program that runs on a personal computer or as an application that runs on a smartphone. The method related to this disclosure may be provided to users in SaaS (Software as a Service) format.

[0736] In the above embodiment, an example was given in which a specific process is performed by a single computer 22. However, the technology of this disclosure is not limited thereto, and a distributed processing of the specific process may be performed by multiple computers, including computer 22. For example, a data generation model 58 may be provided in an external device of the data processing device 12, and the external device may generate data according to the input data.

[0737] In the above embodiment, an example was given in which the specific processing program 56 is stored in the storage 32, but the technology of this disclosure is not limited thereto. For example, the specific processing program 56 may be stored in a portable, computer-readable, non-transitory storage medium such as a USB (Universal Serial Bus) memory. The specific processing program 56 stored in the non-transitory storage medium is installed in the computer 22 of the data processing device 12. The processor 28 executes specific processing according to the specific processing program 56.

[0738] Alternatively, the specific processing program 56 may be stored in a storage device such as a server connected to the data processing device 12 via the network 54, and the specific processing program 56 may be downloaded and installed on the computer 22 in response to a request from the data processing device 12.

[0739] Furthermore, it is not necessary to store the entirety of the specific processing program 56 in a storage device such as a server connected to the data processing device 12 via the network 54, or to store the entirety of the specific processing program 56 in the storage 32; it is acceptable to store only a portion of the specific processing program 56.

[0740] The following types of processors can be used as hardware resources to perform specific processing. One example is a CPU, which is a general-purpose processor that functions as a hardware resource performing specific processing by executing software, i.e., a program. Other examples include dedicated electrical circuits, such as FPGAs (Field-Programmable Gate Arrays), PLDs (Programmable Logic Devices), or ASICs (Application Specific Integrated Circuits), which are processors with circuit configurations specifically designed to perform specific processing. Each of these processors has built-in or connected memory and performs specific processing by using that memory.

[0741] The hardware resource that performs a specific process may consist of one of these various processors, or it may consist of a combination of two or more processors of the same or different types (for example, a combination of multiple FPGAs, or a combination of a CPU and an FPGA). Alternatively, the hardware resource that performs a specific process may consist of a single processor.

[0742] Examples of configurations using a single processor include, firstly, a configuration in which one or more CPUs and software are combined to form a single processor, and this processor functions as a hardware resource that performs a specific process. Secondly, there is a configuration using a processor that realizes the functions of the entire system, including multiple hardware resources that perform a specific process, on a single IC chip, as exemplified by SoCs (System-on-a-chip). In this way, a specific process is realized using one or more of the above types of processors as hardware resources.

[0743] Furthermore, the hardware structure of these various processors can more specifically utilize electrical circuits that combine circuit elements such as semiconductor devices. Also, the specific processing described above is merely an example. Therefore, it goes without saying that unnecessary steps can be deleted, new steps added, or the processing order rearranged, as long as it does not deviate from the main purpose.

[0744] The descriptions and illustrations presented above are detailed explanations of the technical aspects of this disclosure and are merely examples of those aspects. For example, the above descriptions of the structure, function, operation, and effect are examples of the structure, function, operation, and effect of the technical aspects of this disclosure. Therefore, it goes without saying that unnecessary parts may be deleted, new elements added, or elements replaced in the descriptions and illustrations presented above, as long as this does not deviate from the essence of the technical aspects of this disclosure. Furthermore, in order to avoid confusion and facilitate understanding of the technical aspects of this disclosure, explanations of common technical knowledge and the like that do not require special explanation to enable implementation have been omitted from the descriptions and illustrations presented above.

[0745] All documents, patent applications, and technical standards described herein are incorporated by reference to the same extent as if each individual document, patent application, and technical standard were specifically and individually noted to be incorporated by reference.

[0746] The following is further disclosed regarding the embodiments described above.

[0747] (Claim 1)

[0748] A means of receiving conditions from users,

[0749] A generation means that generates an image using a generative artificial intelligence model based on the above conditions,

[0750] A means for providing the generated image to the user,

[0751] A system that includes this.

[0752] (Claim 2)

[0753] The system according to claim 1, wherein the generation means analyzes the conditions and further shapes the context for generation.

[0754] (Claim 3)

[0755] The system according to claim 1, wherein the providing means visualizes the generated image and compresses or converts its format.
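The three means listed above (receiving conditions, generating an image, and providing it with compression or format conversion) can be sketched as a small pipeline. This is a sketch under stated assumptions, not the disclosed implementation: `stub_generator` is a hypothetical stand-in for a generative AI model that simply fills the image with a color derived from the text, and the choice of PPM as the output format and zlib as the compressor is arbitrary.

```python
import zlib
from dataclasses import dataclass
from typing import Callable


@dataclass
class Image:
    width: int
    height: int
    pixels: bytes  # raw RGB, 3 bytes per pixel, row-major


def stub_generator(conditions: str, width: int = 64, height: int = 64) -> Image:
    """Hypothetical placeholder for a generative AI model."""
    seed = sum(conditions.encode("utf-8")) % 256
    color = bytes((seed, (seed * 3) % 256, (seed * 7) % 256))
    return Image(width, height, color * (width * height))


def to_ppm(image: Image) -> bytes:
    """Format conversion: encode raw RGB pixels as a binary PPM (P6) file."""
    header = f"P6\n{image.width} {image.height}\n255\n".encode("ascii")
    return header + image.pixels


def provide(image: Image, compress: bool = True) -> bytes:
    """Providing means: convert the format and optionally compress the payload."""
    payload = to_ppm(image)
    return zlib.compress(payload) if compress else payload


def handle_request(
    conditions: str,
    generate: Callable[[str], Image] = stub_generator,
) -> bytes:
    """Receiving means, generation means, and providing means in sequence."""
    if not conditions.strip():
        raise ValueError("conditions must not be empty")
    return provide(generate(conditions))


delivered = handle_request("a red bicycle on a beach")
```

Passing the generator as a parameter keeps the pipeline independent of any particular model, so a real model client could replace the stub without changing the receiving or providing steps.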

[0756] "Example 1"

[0757] (Claim 1)

[0758] A means of receiving conditions from an input means and transmitting them as digital data,

[0759] A means for analyzing based on the above conditions and generating an image using a generative artificial intelligence model,

[0760] A means for optimizing the generated image by adjusting its size and format and providing it to the user's terminal,

[0761] A system that includes this.

[0762] (Claim 2)

[0763] The system according to claim 1, wherein the analysis means identifies criteria based on conditions and shapes the context for generation.

[0764] (Claim 3)

[0765] The system according to claim 1, wherein the providing means visualizes the generated image and performs compression or format conversion.

[0766] "Application Example 1"

[0767] (Claim 1)

[0768] A means of receiving conditions from users,

[0769] A generation means that generates visual data using a generative artificial intelligence algorithm based on the above conditions,

[0770] A means for optimizing the generated visual data and converting it into a variable format,

[0771] A means for providing the optimized visual data to the user,

[0772] A configuration that includes these means.

[0773] (Claim 2)

[0774] The configuration according to claim 1, wherein the generating means analyzes the conditions and further shapes the context for generation.

[0775] (Claim 3)

[0776] The configuration according to claim 1, wherein the providing means displays the generated visual data and compresses or converts its format.

[0777] "Example 2 of combining an emotion engine"

[0778] (Claim 1)

[0779] A means of receiving image generation conditions and emotional information from the user via a terminal,

[0780] A means for analyzing input conditions and emotional information to set optimal parameters for generation,

[0781] A means for generating an image using a generative artificial intelligence model based on the aforementioned parameters,

[0782] A means for optimizing the generated image by applying emotional adjustments,

[0783] A means of providing optimized images to users,

[0784] A system that includes this.

[0785] (Claim 2)

[0786] The system according to claim 1, comprising an emotion recognition engine for extracting the aforementioned emotion information, wherein the receiving means performs emotion recognition.

[0787] (Claim 3)

[0788] The system according to claim 1, which compresses or converts the format of the generated image, visualizes it, and displays it to the user.
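As one possible reading of the parameter-setting step in the claim set above, the emotional information could shift a few numeric generation parameters. Everything in this sketch is hypothetical: the `GenerationParams` fields, the `EMOTION_ADJUSTMENTS` table, the emotion labels, and the baseline of 0.5 are illustrative assumptions, since the disclosure does not say which parameters are adjusted or by how much.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GenerationParams:
    prompt: str
    brightness: float  # assumed range 0.0 to 1.0
    saturation: float  # assumed range 0.0 to 1.0


# Assumed (brightness, saturation) offsets per recognized emotion.
EMOTION_ADJUSTMENTS: dict[str, tuple[float, float]] = {
    "joy": (0.2, 0.2),
    "calm": (0.0, -0.1),
    "sadness": (-0.2, -0.2),
}


def clamp(value: float, low: float = 0.0, high: float = 1.0) -> float:
    return max(low, min(high, value))


def set_parameters(conditions: str, emotion: str) -> GenerationParams:
    """Combine the user's conditions with emotion-based offsets.

    Unknown emotions leave the baseline parameters unchanged.
    """
    d_brightness, d_saturation = EMOTION_ADJUSTMENTS.get(emotion, (0.0, 0.0))
    return GenerationParams(
        prompt=conditions,
        brightness=clamp(0.5 + d_brightness),
        saturation=clamp(0.5 + d_saturation),
    )


params = set_parameters("a beach at dawn", "joy")
```

The resulting parameters would then be handed to the generation means; a real system might instead fold the emotion into the prompt text itself.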

[0789] "Application example 2 when combining with an emotional engine"

[0790] (Claim 1)

[0791] A means of receiving conditions from users,

[0792] A generation means that analyzes the user's emotional state based on the aforementioned conditions and further generates an image using a computer model,

[0793] A means for presenting the generated image on the user's digital device,

[0794] A system that includes this.

[0795] (Claim 2)

[0796] The system according to claim 1, wherein the generation means analyzes the conditions and the user's emotions and further shapes the context for generation.

[0797] (Claim 3)

[0798] The system according to claim 1, wherein the providing means visualizes the generated image and compresses or converts the data format.

[Explanation of symbols]

[0799]
10, 210, 310, 410 Data processing system
12 Data processing device
14 Smart device
214 Smart glasses
314 Headset-type terminal
414 Robot

Claims

1. A means of receiving conditions from users, a generation means that generates visual data using a generative artificial intelligence algorithm based on the above conditions, a means for optimizing the generated visual data and converting it into a variable format, a means for providing the optimized visual data to the user, and a configuration that includes these means.

2. The configuration according to claim 1, wherein the generation means analyzes the conditions and further shapes the context for generation.

3. The configuration according to claim 1, wherein the providing means displays the generated visual data and compresses or converts its format.