system

The system allows users to create and interact with a personalized character through image and natural language processing, enhancing conversational experiences by learning from user interactions and emotions.

JP2026105466APending Publication Date: 2026-06-26SOFTBANK GROUP CORP

Patent Information

Authority / Receiving Office
JP · JP
Patent Type
Applications
Current Assignee / Owner
SOFTBANK GROUP CORP
Filing Date
2024-12-16
Publication Date
2026-06-26

Smart Images

  • Figure 2026105466000001_ABST
    Figure 2026105466000001_ABST
Patent Text Reader

Abstract

We provide the system. [Solution] A means of providing a terminal for users to input the appearance and personality of their ideal character, A means for generating a visual representation of a character using an AI model based on the input information, A means for displaying the generated character and activating a natural language processing engine for engaging in natural language dialogue with the user, Means for recording the dialogue between the user and the robot companion and transmitting it to an information processing device, A means for analyzing the recorded dialogue content and using a natural language processing algorithm to generate a corresponding response, Means for sending and displaying the generated response to the user, A system that includes means for saving dialogue history and improving the dialogue model through optimization.
Need to check novelty before this filing date? Find Prior Art

Description

Technical Field

[0001] The technology of the present disclosure relates to a system.

Background Art

[0002] Patent Document 1 discloses a persona chatbot control method performed by at least one processor, including steps of receiving a user utterance, adding the user utterance to a prompt including an instruction sentence related to an explanation of a chatbot character, encoding the prompt, and inputting the encoded prompt into a language model to generate a chatbot utterance in response to the user utterance.

Prior Art Documents

Patent Documents

[0003]

Patent Document 1

Summary of the Invention

Problems to be Solved by the Invention

[0004] In modern times, simulating communication with an ideal partner is attractive to many people. However, with existing technologies, it has been difficult to generate a character that reflects the detailed wishes of users and have a natural conversation with that character. Also, conventional systems have limitations in their ability to dynamically provide conversations according to the preferences of the users themselves and have not been able to provide a realistic experience. As a result, there has been a problem that it is difficult for users to obtain a satisfactory simulation experience.

Means for Solving the Problems

[0005] This invention proposes a system that provides an interface for users to input the appearance and personality of a desired object, and generates a virtual character using an image generation algorithm based on that information. Furthermore, it utilizes an advanced natural language processing engine to enable natural dialogue between the generated character and the user. The system records the conversation and sends it to a server, allowing it to deeply understand the context of the conversation and provide a response tailored to the user. Through this approach, users can experience interacting with an ideal partner in their daily lives.

[0006] A "user" is an individual who uses the system to create their ideal character and engage in dialogue with it.

[0007] An "interface" is a window or means designed for users to input necessary information, and it serves as a link between the user and the system.

[0008] An "image generation algorithm" is a computational method for digitally generating visual character images based on feature information provided by the user.

[0009] A "character image" is a visual representation of a character generated to reflect the characteristics specified by the user.

[0010] A "natural language processing engine" is a combination of programs and algorithms used to analyze, understand, and generate natural language that humans typically use.

[0011] A "dialogue record" is digital data that saves the content of conversations exchanged between the user and the character.

[0012] A "natural language processing algorithm" is a computational method that analyzes user input and generates an appropriate linguistic response.

[0013] "Conversation history" refers to recorded data that retains and utilizes the content of past conversations between the user and the system.

[0014] A "dialogue model" is a program structure designed to simulate conversations with users and to respond dynamically to user input. [Brief explanation of the drawing]

[0015] [Figure 1] This is a conceptual diagram showing an example of the configuration of a data processing system according to the first embodiment. [Figure 2] This is a conceptual diagram showing an example of the essential functions of a data processing device and a smart device according to the first embodiment. [Figure 3] This is a conceptual diagram showing an example of the configuration of a data processing system according to the second embodiment. [Figure 4] This is a conceptual diagram showing an example of the main functions of a data processing device and smart glasses according to the second embodiment. [Figure 5] This is a conceptual diagram showing an example of the configuration of a data processing system according to the third embodiment. [Figure 6] This is a conceptual diagram showing an example of the main functions of a data processing device and a headset-type terminal according to the third embodiment. [Figure 7] This is a conceptual diagram showing an example of the configuration of a data processing system according to the fourth embodiment. [Figure 8] This is a conceptual diagram showing an example of the main functions of a data processing device and a robot according to the fourth embodiment. [Figure 9] This shows an emotion map where multiple emotions are mapped. [Figure 10] This shows an emotion map where multiple emotions are mapped. [Figure 11] This is a sequence diagram showing the processing flow of the data processing system in Example 1. [Figure 12] This is a sequence diagram showing the processing flow of the data processing system in Application Example 1. [Figure 13]It is a sequence diagram showing the processing flow of the data processing system in Example 2 when the emotion engine is combined. [Figure 14] It is a sequence diagram showing the processing flow of the data processing system in Application Example 2 when the emotion engine is combined.

Embodiments for Carrying Out the Invention

[0016] Hereinafter, an example of an embodiment of the system according to the technology of the present disclosure will be described with reference to the accompanying drawings.

[0017] First, the terms used in the following description will be explained.

[0018] In the following embodiments, the numbered processor (hereinafter simply referred to as "processor") may be one arithmetic unit or a combination of multiple arithmetic units. Also, the processor may be one type of arithmetic unit or a combination of multiple types of arithmetic units. Examples of arithmetic units include CPU (Central Processing Unit), GPU (Graphics Processing Unit), GPGPU (General-Purpose computing on Graphics Processing Units), APU (Accelerated Processing Unit), etc.

[0019] In the following embodiments, the numbered RAM (Random Access Memory) is a memory where information is temporarily stored and is used as a work memory by the processor.

[0020] In the following embodiments, the numbered storage is one or more non-volatile storage devices that store various programs and various parameters, etc. Examples of non-volatile storage devices include flash memory (SSD (Solid State Drive)), magnetic disk (e.g., hard disk), or magnetic tape, etc.

[0021] In the following embodiments, the signed communication interface (I / F) is an interface that includes a communication processor and an antenna, etc. The communication interface manages communication between multiple computers. Examples of communication standards applicable to the communication interface include wireless communication standards such as 5G (5th Generation Mobile Communication System), Wi-Fi (registered trademark), or Bluetooth (registered trademark).

[0022] In the following embodiments, "A and / or B" is synonymous with "at least one of A and B." That is, "A and / or B" means that it may be A alone, or B alone, or a combination of A and B. Furthermore, in this specification, the same concept as "A and / or B" applies when expressing three or more things linked by "and / or."

[0023] [First Embodiment]

[0024] Figure 1 shows an example of the configuration of the data processing system 10 according to the first embodiment.

[0025] As shown in Figure 1, the data processing system 10 includes a data processing device 12 and a smart device 14. An example of the data processing device 12 is a server.

[0026] The data processing device 12 comprises a computer 22, a database 24, and a communication interface 26. The computer 22 is an example of a "computer" related to the technology of this disclosure. The computer 22 comprises a processor 28, RAM 30, and storage 32. The processor 28, RAM 30, and storage 32 are connected to a bus 34. The database 24 and the communication interface 26 are also connected to the bus 34. The communication interface 26 is connected to a network 54. An example of the network 54 is a WAN (Wide Area Network) and / or a LAN (Local Area Network).

[0027] The smart device 14 comprises a computer 36, a reception device 38, an output device 40, a camera 42, and a communication interface 44. The computer 36 comprises a processor 46, RAM 48, and storage 50. The processor 46, RAM 48, and storage 50 are connected to a bus 52. The reception device 38, output device 40, and camera 42 are also connected to the bus 52.

[0028] The reception device 38 is equipped with a touch panel 38A and a microphone 38B, etc., and receives user input. The touch panel 38A receives user input by detecting contact with an object (e.g., a pen or finger). The microphone 38B receives user input by detecting the user's voice. The control unit 46A transmits data indicating the user input received by the touch panel 38A and microphone 38B to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the data indicating the user input.

[0029] The output device 40 includes a display 40A and a speaker 40B, and presents data to the user 20 by outputting the data in a form perceptible to the user 20 (e.g., audio and / or text). The display 40A displays visible information such as text and images according to instructions from the processor 46. The speaker 40B outputs audio according to instructions from the processor 46. The camera 42 is a small digital camera equipped with an optical system such as a lens, aperture, and shutter, and an image sensor such as a CMOS (Complementary Metal-Oxide-Semiconductor) image sensor or a CCD (Charge Coupled Device) image sensor.

[0030] Communication interface 44 is connected to network 54. Communication interfaces 44 and 26 are responsible for the exchange of various types of information between processor 46 and processor 28 via network 54.

[0031] Figure 2 shows an example of the main functions of the data processing device 12 and the smart device 14.

[0032] As shown in Figure 2, in the data processing device 12, a specific processing is performed by the processor 28. A specific processing program 56 is stored in the storage 32. The specific processing program 56 is an example of a "program" related to the technology of this disclosure. The processor 28 reads the specific processing program 56 from the storage 32 and executes the read specific processing program 56 on the RAM 30. The specific processing is realized by the processor 28 operating as a specific processing unit 290 according to the specific processing program 56 executed on the RAM 30.

[0033] The storage 32 stores the data generation model 58 and the emotion identification model 59. The data generation model 58 and the emotion identification model 59 are used by the identification processing unit 290.

[0034] In the smart device 14, the processor 46 performs the reception output processing. The storage 50 stores the reception output program 60. The reception output program 60 is used in conjunction with a specific processing program 56 by the data processing system 10. The processor 46 reads the reception output program 60 from the storage 50 and executes the read reception output program 60 on the RAM 48. The reception output processing is realized by the processor 46 operating as a control unit 46A according to the reception output program 60 executed on the RAM 48.

[0035] Next, the specific processing performed by the specific processing unit 290 of the data processing device 12 will be described. In the following description, the data processing device 12 will be referred to as the "server" and the smart device 14 as the "terminal".

[0036] To implement this invention, the overall system architecture must first be constructed. The user uses an interface via a terminal to input the appearance and personality traits of their ideal character. The terminal then transmits this information to the server.

[0037] The server activates an image generation algorithm based on the received attribute information, generating a visual representation of the character according to the specified features. This image generation is achieved using generative modeling technology, creating a realistic character that matches the settings.

[0038] The generated character is displayed on the device, and simultaneously, a natural language processing (NLP) engine is activated, enabling interaction with the user. The user can then ask the character questions or initiate a conversation. The device records the user's input in real time and sends it to the server.

[0039] The server analyzes the user's input and generates an appropriate response using a natural language processing algorithm. This algorithm also refers to past conversation history to understand the context and construct a response that aligns with the user's expectations. The generated response is then sent back to the terminal and displayed to the user, continuing this cyclical process.

[0040] Through this interaction, the server continuously records conversation logs and uses them to learn. The user-specific dialogue model improves over time, enabling conversations with a deeper understanding. For example, if a user asks, "Can you recommend a movie that will help me unwind after a long day?", the server can refer to the user's hobbies and preferences recorded in previous conversations and suggest movies of an appropriate genre.

[0041] This process allows the present invention to provide users with a natural and intimate conversational experience, enabling them to simulate an ideal partner.

[0042] The following describes the processing flow.

[0043] Step 1:

[0044] Users input information about the appearance and personality of their ideal character through the device's interface. This includes specific characteristics and choices.

[0045] Step 2:

[0046] The terminal sends the information entered by the user to the server. This data becomes the raw material used in the character generation process.

[0047] Step 3:

[0048] The server analyzes the received data and generates an image of the character using an image generation algorithm. During this process, the character's features are reflected in the depiction.

[0049] Step 4:

[0050] The device displays the generated image to the user, simultaneously activating its natural language processing engine and preparing for interaction with the character. The user can then begin a conversation with the displayed character.

[0051] Step 5:

[0052] When a user enters a question or comment for a character, that input is sent to the server in real time via the device.

[0053] Step 6:

[0054] The server uses natural language processing algorithms to analyze the received user input and generate an appropriate response. During this process, it also refers to past conversation logs to consider the context and user characteristics.

[0055] Step 7:

[0056] The generated response is sent from the server to the terminal and displayed to the user. The user can then continue the conversation based on this response.

[0057] Step 8:

[0058] The server continuously records conversation history and uses this data to improve the dialogue model. The model evolves to match the user's tendencies and preferences, providing more natural and personalized conversations.

[0059] (Example 1)

[0060] Next, we will describe Example 1. In the following description, the data processing device 12 will be referred to as the "server," and the smart device 14 will be referred to as the "terminal."

[0061] With the advancements in information processing technology today, there is a growing demand to accurately reproduce the visual and emotional characteristics of a target desired by the user, and to build natural dialogues with that target. Therefore, the challenge lies in realizing a dialogue system that is intuitive for users to operate and provides personalized responses.

[0062] The identification process performed by the identification processing unit 290 of the data processing device 12 in Example 1 is realized by the following means.

[0063] In this invention, the server includes means for providing an information processing device for the user to input the appearance and personality of an ideal object; means for generating a visual representation of the object using a visual representation generation method based on the input information; and means for activating a language processing engine that displays the generated visual representation and enables conversation with the user. This makes it possible to provide the user with an intuitive and personalized conversational experience and to efficiently link the entire process of visual representation and conversation.

[0064] An "information processing device" is a device that has the function of providing an interface for users to input the characteristics of a target they desire.

[0065] A "visual representation generation method" is a mathematical or algorithmic technique for generating a visual representation of an object based on input information.

[0066] A "language processing engine" is a software configuration that analyzes input language data and generates appropriate responses in order to enable natural conversation with the user.

[0067] A "storage device" is a device that stores digital data and retrieves or transmits that data as needed.

[0068] A "language processing method" is a process or technique for analyzing natural language and generating an appropriate response to the input.

[0069] A "conversation model" is a digital construct that manages interactions with users and improves the accuracy and adaptability of responses through learning.

[0070] "Opinions" are responses based on user feedback and expectations, and are used as information for system learning and improvement.

[0071] One embodiment of the present invention begins with a user accessing an interface via an information processing device and inputting the appearance and characteristics of an ideal target. The user then inputs attribute information for processing in a generative AI model. This information is received by the terminal and transmitted to the server.

[0072] The server generates a visual representation of the target based on the received data using a visual representation generation method. Existing image generation software such as Stable Diffusion and DALL-E can be applied to this generation. Specifically, the server's algorithm feeds data into the generation AI model using prompt statements. An example of a prompt statement would be, "Please generate a character with blue hair and a cheerful personality."

[0073] The generated visual representation is sent from the server to the terminal and displayed on the user's screen. This allows the user to confirm the visual elements. Subsequently, the language processing engine is automatically activated on the server, enabling the user to continue a natural conversation with the generated subject.

[0074] Users input questions and comments in text format through their devices, which are sent to the server in real time. The server analyzes this user input using natural language processing methods and generates appropriate responses. The analysis utilizes natural language processing models such as OpenAI's GPT-3, taking into account past conversation history to construct responses tailored to the individual user's context.

[0075] Furthermore, the server saves the entire conversation history and uses it as training data for the language model being used. This process allows the system to improve its dialogue model over time, providing a more appropriate and sophisticated conversational experience. For example, if a user says, "I'm feeling down, can you recommend a movie that will cheer me up?", the system can use past data to suggest a movie of a suitable genre that matches the user's preferences.

[0076] The flow of the specific processing in Example 1 will be explained using Figure 11.

[0077] Step 1:

[0078] Users access the interface through their device and input information about the appearance and personality of their ideal partner. This input includes specific attributes such as hair color, eye shape, and personality type. The input data is structured by the device and sent to the server in JSON format.

[0079] Step 2:

[0080] The server analyzes the user input received from the terminal and initiates the image generation process using a visual representation generation method. A generation AI model is used for this process. The server generates prompt statements and inputs data into the image generation software (e.g., the image generation model) based on these statements. The resulting output is a visual representation of a character based on the user's specified attributes.

[0081] Step 3:

[0082] After the visual representation of the character is generated, the image data is sent from the server to the terminal. The terminal then displays this image on the user's screen. As a result, the user can review the visual representation and decide whether or not they are satisfied.

[0083] Step 4:

[0084] The server starts the language processing engine, and the user can begin a conversation with the generated character. The user enters questions and comments in text format through the terminal. This input is sent to the server in real time.

[0085] Step 5:

[0086] The server receives input text from the user and analyzes it using natural language processing methods. The natural language processing model used evaluates the meaning of the input text and generates an appropriate response that is relevant to the context. In doing so, it also refers to past conversation history to construct a more contextually relevant response.

[0087] Step 6:

[0088] The generated response is sent from the server to the terminal and displayed on the user's screen. This allows the user to continue the conversation with the character. After the response is displayed, the terminal records this conversation history, and the server uses this to improve the conversation model. In this way, the system learns over time, enabling more accurate conversations.

[0089] (Application Example 1)

[0090] Next, we will explain Application Example 1. In the following explanation, the data processing device 12 will be referred to as the "server," and the smart device 14 will be referred to as the "terminal."

[0091] In today's world, artificial intelligence that supports users' lives is becoming increasingly important. However, existing systems struggle to generate the ideal character that users desire and provide support in various aspects of daily life through natural dialogue. Therefore, there is a need for technology that allows users to intuitively operate an interface and realize a character that closely resembles their ideal partner.

[0092] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 1 is realized by the following means.

[0093] In this invention, the server includes means for providing a terminal for the user to input the appearance and personality of their ideal character; means for generating a visual representation of the character using an AI model based on the input information; and means for displaying the generated character and activating a natural language processing engine for natural language interaction with the user. This enables the user to intuitively create their ideal character and receive support in their daily life.

[0094] A "user" is someone who inputs their ideal character via a device and enjoys interacting with it.

[0095] A "terminal" is a device used by users to set the appearance and personality of a character and to interact with them.

[0096] An "AI model" is an artificial intelligence technology used to generate a visual representation of a character based on input information.

[0097] A "character" is a personal being with a visual representation that is generated based on the user's ideals.

[0098] "Visual representation" refers to images or animations that show the appearance of the generated character.

[0099] A "natural language processing engine" is a program designed to enable natural language interaction with users.

[0100] A "dialogue record" is data that records the content of conversations between the user and the character.

[0101] An "information processing device" is a computer system that receives and analyzes dialogue records.

[0102] A "natural language processing algorithm" is a method for analyzing input language data and generating an appropriate response.

[0103] "Optimization" is the process of improving a dialogue model using saved dialogue history.

[0104] To implement this invention, the user uses a dedicated terminal to input information about the appearance and personality of their ideal character through an interface. The terminal then transmits this user input information to a server.

[0105] The server generates a visual representation of the character using an AI model based on the received information. Specifically, a generative AI model is used to create realistic character images according to the attributes specified by the user. This entire process utilizes natural language processing algorithms and image processing software.

[0106] The generated character is displayed on the terminal, and the natural language processing engine is activated. This allows the user to begin a natural and interactive conversation with the character. The server records the conversation in real time and performs detailed analysis on an information processing device.

[0107] As a concrete example, if a user asks a character, "Teach me some relaxing yoga poses," the character will consider the user's past requests and preferences to suggest yoga poses and related actions. This suggestion utilizes past conversation logs to generate the most suitable response for the user.

[0108] An example of a prompt to input into the generating AI model is: "The user has input 'Please teach me some relaxing yoga poses.' Based on the user's preferred relaxation methods from past conversation history, please suggest appropriate yoga poses." This allows users to receive support in their daily lives from an ideal character.

[0109] The flow of a specific process in Application Example 1 will be explained using Figure 12.

[0110] Step 1:

[0111] The user inputs information about the ideal character's appearance and personality into the interface via their device. The entered information is temporarily stored on the device as the character's attribute data.

[0112] Step 2:

[0113] The terminal sends temporarily stored character attribute data to the server. This data transmission allows the server to prepare for character generation based on the user's request.

[0114] Step 3:

[0115] The server receives the character attribute data as input, calls a generation AI model, and begins data processing. This model analyzes the attribute data and generates a visual representation of the character according to the specified features. The generated character image is output to the server.

[0116] Step 4:

[0117] The server sends the generated character image to the terminal. The terminal receives it and displays the image on its screen. The user can then see the visualized character.

[0118] Step 5:

[0119] The server starts up its natural language processing engine and prepares to interact with the user. When the terminal receives text input from the user, that text data is sent to the server.

[0120] Step 6:

[0121] The server analyzes the received text data using a natural language processing algorithm. It also refers to past dialogue history to understand the context. Based on the analysis, it generates an appropriate response and outputs it in text format.

[0122] Step 7:

[0123] The server sends the generated response to the terminal. The terminal receives this response and displays it back to the user. The user can then review the character's response and continue the conversation.

[0124] Step 8:

[0125] The server continuously records conversation logs and stores the conversation history in an information processing device. This creates a data foundation that allows for a deeper understanding of user preferences and context in future conversations.

[0126] Furthermore, an emotion engine that estimates the user's emotions may be incorporated. That is, the identification processing unit 290 may use the emotion identification model 59 to estimate the user's emotions and perform identification processing using the user's emotions.

[0127] This invention is a system that allows users to generate their ideal character and engage in natural and intimate conversations with that character. Furthermore, it incorporates an emotion engine that understands the user's emotions and generates corresponding responses, thereby enhancing engagement.

[0128] To implement this system, the terminal first provides an interface to the user, where the user inputs detailed information about the character's appearance and personality. The input information is immediately sent from the terminal to the server, which uses an image generation algorithm to generate a character image. The generated character is then displayed to the user on the terminal.

[0129] Next, a conversation with the user is initiated using a natural language processing engine, but this is where the emotion engine comes in. The emotion engine analyzes the user's input text and data from voice tone and keystrokes during the conversation to estimate the user's emotional state. Based on this emotional state, the server generates and provides the optimal response to the user, thus designing the conversation to have greater depth.

[0130] For example, if a user tells a character, "I'm very tired today," the emotion engine will determine that the user is tired based on words like "tired" and "very." Based on this information, the server will generate a gentle response such as, "Take it easy today and do something you enjoy," and convey it to the user via the device. In this way, a more personalized experience is realized through dialogue that is tailored to the user's emotions.

[0131] The server saves dialogue content and emotional data as conversation history, and uses this to improve the dialogue model. This approach allows the system to provide more user-friendly dialogue the more it is used, continuously offering users a simulation experience with the most suitable partner.

[0132] The following describes the processing flow.

[0133] Step 1:

[0134] Users use the device's interface to input details about the ideal character's appearance and personality. This information includes specific looks and personality traits.

[0135] Step 2:

[0136] The terminal sends the information entered by the user to the server. The server receives this information, activates an image generation algorithm, and generates an image of a character based on the specified features.

[0137] Step 3:

[0138] The generated character image is sent to the device and displayed on the user's screen. The device simultaneously activates its natural language processing engine, preparing to interact with the user.

[0139] Step 4:

[0140] Users send messages to characters to initiate conversations. This input is processed in real time and sent from the terminal to the server.

[0141] Step 5:

[0142] The server receives user input and performs sentiment analysis using an emotion engine. For example, it analyzes keywords in the context and input methods (speed, frequency) to recognize the user's emotions.

[0143] Step 6:

[0144] The server uses emotion data recognized by the emotion engine and natural language processing algorithms to generate appropriate responses that match the user's emotions. In doing so, it also refers to conversation history and past emotion logs to construct a dialogue optimized for the user.

[0145] Step 7:

[0146] The generated response is sent to the device and displayed to the user. The user can then use this response to continue the conversation.

[0147] Step 8:

[0148] The server records all dialogue and emotional data, continuously improving the dialogue model. This allows for more relatable and emotionally nuanced dialogues for the user over time.

[0149] (Example 2)

[0150] Next, we will describe Example 2. In the following description, the data processing device 12 will be referred to as the "server" and the smart device 14 as the "terminal".

[0151] Current conversational systems have limited ability to fully understand individual user emotions and characteristics and generate personalized responses. This results in a uniform user experience and a failure to provide sufficiently personalized dialogue. Furthermore, they are inadequate in generating responses that reflect the user's emotional state, lacking depth and intimacy in conversations.

[0152] The identification process performed by the identification processing unit 290 of the data processing device 12 in Example 2 is realized by the following means.

[0153] In this invention, the server includes means for providing an interface for the user to input the appearance and characteristics of an ideal object; means for generating an image of the object using image generation technology based on the input information; and means for using emotion analysis technology to analyze the user's input, voice tone, and operation data to estimate their emotional state. This enables personalized, emotion-sensitive dialogue, resulting in a rich and intimate experience for the user.

[0154] An "interface" is a means for a user to interact with a system and information in a two-way manner, and is part of the user experience that enables visual or physical interaction.

[0155] "Image generation technology" refers to algorithms and processes that create visual information based on input data, and is a technology for producing computer-generated images.

[0156] "Language processing technology" refers to techniques for understanding, generating, and analyzing human language on a computer, and is a method that enables advanced dialogue using natural language.

[0157] "Emotion analysis technology" is an analytical method for estimating a user's emotional state from their input data, and it is a technology that quantifies or categorizes the user's emotions and reflects them in the response.

[0158] A "dialogue model" is a collection of data structures and algorithms for managing interactions with users and generating optimal responses; it is the foundation for controlling the flow and content of dialogue.

[0159] "Feedback" refers to information about user responses and evaluations of the system, and is part of the information used to improve the system and enhance its accuracy.

[0160] This invention is a system that generates a character desired by the user and allows for intimate interaction with that character. Specifically, this system is implemented according to the following procedure.

[0161] The terminal provides the user with an interface for character creation. This interface allows the user to input detailed information about the character's appearance and attributes. The terminal sends this data to a server. The server uses image generation technology based on the input information to generate an image of the character. The image generation technology used here typically involves generative AI models such as Stable Diffusion or DALL-E.

[0162] The server sends the generated character image to the terminal, which then displays the image to the user. The user can then review the generated character and make further adjustments through the interface if necessary.

[0163] Furthermore, the device utilizes natural language processing technology to initiate a dialogue between the character and the user. During this process, the server analyzes the user's input text, voice tone, and interaction data through sentiment analysis technology. This allows the server to estimate the user's emotional state and provide a personalized response based on that estimate.

[0164] For example, when a user tells a character, "I'm very tired today," the server's emotion analysis technology determines that the user is tired based on words like "tired." Based on this information, the server generates a gentle response such as, "Take it easy today and do something you enjoy," and conveys it to the user via the device. This provides the user with a more travel-like and relatable dialogue.

[0165] An example of a prompt is, "The user has told the character that they are very tired. How would you respond?" This is used in the response generation process to understand and reflect the user's intent.

[0166] This system is implemented using a combination of computer hardware and associated software. Specific hardware requirements include a terminal with a standard internet connection, display, and input devices. This enables efficient character generation using generative AI models and allows for deep, natural dialogue.

[0167] The flow of the specific processing in Example 2 will be explained using Figure 13.

[0168] Step 1:

[0169] The device provides the user with a character creation interface. The user uses this interface to input information such as the character's appearance and personality. This input data is divided into multiple fields, allowing for detailed specification using sliders and dropdown menus. The user's input information is formatted as JSON data.

[0170] Step 2:

[0171] The terminal sends the information entered by the user to the server. The server analyzes this JSON-formatted input data to extract the parameters necessary for the image generation algorithm. The input includes the character's appearance elements and characteristics. Using this data, the server starts the generation AI model.

[0172] Step 3:

[0173] The server generates character images using image generation technology. Generative AI models such as Stable Diffusion and DALL-E are used here. This process involves detailed data processing and calculations based on input parameters to synthesize an image that closely matches the specified result. The output is the generated character image data.

[0174] Step 4:

[0175] The server sends the generated character image data to the terminal. The terminal displays the received image data to the user. The user can review this image and make additional modifications as needed. At this stage, it is possible to view the image in high resolution using an image viewer.

[0176] Step 5:

[0177] The terminal uses natural language processing to initiate a dialogue between the user and the character. Text input from the user is received through the interface, triggering the dialogue. The entered text data is then sent to the server.

[0178] Step 6:

[0179] The server activates its sentiment analysis engine based on the received text data to estimate the user's emotional state. The server analyzes the sentiment indicators within the text and generates numerical or categorized sentiment data. This data is used as a reference for generating responses.

[0180] Step 7:

[0181] The server generates responses that reflect the user's emotional state. Using existing language processing models, it generates appropriate replies based on the user's emotions and a generative AI model. This response data is refined to aim for natural and personalized dialogue.

[0182] Step 8:

[0183] The generated response is sent from the server to the terminal. The terminal displays the response to the user and prompts for further interaction. The interaction history and sentiment data are stored on the server and used later to improve the system.

[0184] (Application Example 2)

[0185] Next, we will explain application example 2. In the following explanation, the data processing device 12 will be referred to as a "server" and the smart device 14 as a "terminal".

[0186] Conventional character generation systems have suffered from a decline in the quality of dialogue because they do not take into account the emotional elements in user interactions. Furthermore, their inability to immediately respond to changes in the user's emotional state has limited their potential for use in family communication and entertainment. This invention aims to solve the technical challenges of understanding user emotions and realizing more intimate and personal communication.

[0187] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 2 is realized by the following means.

[0188] This invention includes a server that uses an emotion analysis engine to analyze data on the tone of the user's voice and input actions to estimate their emotional state; a server that uses a real-time generated character assistant to provide added value for deepening home interactions; and a server that stores conversation history and improves the dialogue model through learning. This makes it possible to make the interaction between the user and the character assistant more natural and intimate, and to enhance in-home entertainment and communication.

[0189] An "interface" is a means for a user to exchange information with a system, and it is possible to input information such as appearance and personality.

[0190] An "image generation algorithm" is a computational method for automatically generating visual characters based on input information.

[0191] A "natural language processing engine" is software that has the ability to understand user input and generate appropriate responses.

[0192] A "sentiment analysis engine" is a program that determines the user's emotional state based on their tone of voice and input actions.

[0193] A "character assistant" is a virtual entity that functions as an aid in conversations and communicates with the user.

[0194] A "household robot" is a mechanical device that can perform various tasks in a home environment and interact with the inhabitants.

[0195] "Conversation history" refers to data that the system has recorded and saved, containing all past interactions with the user.

[0196] A "dialogue model" is a collection of algorithms used to design the flow of interactions with users and provide optimal communication.

[0197] To implement this invention, the user must first input details of the desired character's appearance and personality using a dedicated interface. The terminal receives this information and sends it to the server. The server uses an image generation algorithm to generate a visual representation of the character based on the input information. Image generation libraries such as Stable Diffusion and DALL-E can be utilized in this process.

[0198] After the character is generated, it is displayed to the user on the device, and at this time, the natural language processing engine is activated. This engine receives text and voice input from the user, analyzes the content, and generates an appropriate response. For natural language processing, libraries such as NLTK and spaCy can be used, and IBM Watson® Tone Analyzer can be used as the emotion engine. This makes it possible to estimate the user's emotional state based on voice tone and input characteristics.

[0199] Furthermore, the character assistant operates as a home robot, providing enjoyment through interaction with the user. This robot uses hardware such as smart speakers, microphones, and speakers to engage in voice interaction, recording the user's responses for the next interaction and sending them to a server. The server improves the dialogue model based on the accumulated conversation history, providing more personalized responses to the user.

[0200] For example, if a child tells a home robot, "I couldn't play with my friends today," the system will determine from the child's words and tone of voice that the child is feeling a little lonely. It will then suggest something like, "How about we do some drawing together today to refresh ourselves?" This allows the child to find something to enjoy without feeling lonely. An example of a prompt to the generative AI model in this case might be, "The user may be feeling lonely. What would be an appropriate response?"

[0201] The flow of a specific process in Application Example 2 will be explained using Figure 14.

[0202] Step 1:

[0203] The user inputs the appearance and personality of their ideal character through the interface. The entered data is stored on the device as detailed information about the character's appearance and personality. The device then sends this input information to the server.

[0204] Step 2:

[0205] The server generates a visual representation of the character using an image generation algorithm (e.g., Stable Diffusion or DALL-E) based on the received information. The input is the character features specified by the user, and the output is the generated character image. Here, data processing involves calculations to convert the input text information into an image.

[0206] Step 3:

[0207] The terminal receives the generated character image sent from the server and displays it to the user. The user can then begin a visual interaction with the generated character on the terminal. Visual feedback is provided to the user during this step.

[0208] Step 4:

[0209] The user initiates a conversation with the character using text or voice. The terminal collects this input data and sends it to a natural language processing engine. The input is the user's voice or text data, and the output is data converted into natural language.

[0210] Step 5:

[0211] The server analyzes the received language data using a natural language processing engine. In conjunction with this, an emotion analysis engine is also used to estimate the user's emotional state. Specifically, voice tone and keystroke patterns are input, and data calculations are performed to output the user's emotion (e.g., joy, sadness).

[0212] Step 6:

[0213] Based on the analysis results, the server generates a corresponding response using an AI model. Here, the response content is determined using a prompt. The input is the analyzed emotion and dialogue content, and the output is an appropriate response to the user.

[0214] Step 7:

[0215] The generated response is sent to the user's device and provided to the user through the device by being displayed or spoken aloud. This allows the user to continue interacting with the character.

[0216] Step 8:

[0217] The terminal sends the entire conversation history with the user to the server, which uses this data to improve the conversation model. The input is the past conversation history, and by analyzing this data, the output is an improvement in the overall conversation quality of the system.

[0218] The specific processing unit 290 transmits the result of the specific processing to the smart device 14. In the smart device 14, the control unit 46A causes the output device 40 to output the result of the specific processing. The microphone 38B acquires audio indicating user input for the result of the specific processing. The control unit 46A transmits the audio data indicating user input acquired by the microphone 38B to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the audio data.

[0219] Data generation model 58 is a type of so-called generative AI (Artificial Intelligence). An example of data generation model 58 is ChatGPT (registered trademark) (Internet search).<URL: https: / / openai.com / blog / chatgpt> ), Gemini (registered trademark) (Internet search) <url: https: gemini.google.com ?hl="ja">Examples of generative AI include the following. The data generation model 58 is obtained by performing deep learning on a neural network. The data generation model 58 is input with prompts containing instructions, and with inference data such as audio data representing speech, text data representing text, and image data representing images. The data generation model 58 infers from the input inference data according to the instructions indicated by the prompts, and outputs the inference results in data formats such as audio data and text data. Here, inference refers to, for example, analysis, classification, prediction, and / or summarization.

[0220] In the above embodiment, an example was given in which specific processing is performed by the data processing device 12, but the technology of this disclosure is not limited thereto, and the specific processing may also be performed by the smart device 14.

[0221] [Second Embodiment]

[0222] Figure 3 shows an example of the configuration of the data processing system 210 according to the second embodiment.

[0223] As shown in Figure 3, the data processing system 210 includes a data processing device 12 and smart glasses 214. An example of the data processing device 12 is a server.

[0224] The data processing device 12 comprises a computer 22, a database 24, and a communication interface 26. The computer 22 is an example of a "computer" related to the technology of this disclosure. The computer 22 comprises a processor 28, RAM 30, and storage 32. The processor 28, RAM 30, and storage 32 are connected to a bus 34. The database 24 and the communication interface 26 are also connected to the bus 34. The communication interface 26 is connected to a network 54. An example of the network 54 is a WAN (Wide Area Network) and / or a LAN (Local Area Network).

[0225] The smart glasses 214 include a computer 36, a microphone 238, a speaker 240, a camera 42, and a communication interface 44. The computer 36 includes a processor 46, RAM 48, and storage 50. The processor 46, RAM 48, and storage 50 are connected to a bus 52. The microphone 238, speaker 240, and camera 42 are also connected to the bus 52.

[0226] The microphone 238 receives voice signals from the user 20 and receives instructions from the user 20. The microphone 238 captures the voice signals from the user 20, converts the captured voice into audio data, and outputs it to the processor 46. The speaker 240 outputs audio according to the instructions from the processor 46.

[0227] Camera 42 is a small digital camera equipped with an optical system including a lens, aperture, and shutter, and an image sensor such as a CMOS (Complementary Metal-Oxide-Semiconductor) image sensor or a CCD (Charge Coupled Device) image sensor, and captures images of the area around the user 20 (for example, an imaging range defined by a field of view equivalent to the width of a typical healthy person's field of vision).

[0228] Communication interface 44 is connected to network 54. Communication interfaces 44 and 26 are responsible for the exchange of various information between processor 46 and processor 28 via network 54. The exchange of various information between processor 46 and processor 28 using communication interfaces 44 and 26 is performed in a secure manner.

[0229] Figure 4 shows an example of the main functions of the data processing device 12 and the smart glasses 214. As shown in Figure 4, the data processing device 12 performs specific processing using the processor 28. The storage 32 stores the specific processing program 56.

[0230] The specific processing program 56 is an example of a "program" relating to the technology of this disclosure. The processor 28 reads the specific processing program 56 from the storage 32 and executes the read specific processing program 56 on the RAM 30. The specific processing is realized by the processor 28 operating as a specific processing unit 290 in accordance with the specific processing program 56 executed on the RAM 30.

[0231] The storage 32 stores the data generation model 58 and the emotion identification model 59. The data generation model 58 and the emotion identification model 59 are used by the identification processing unit 290.

[0232] In the smart glasses 214, the processor 46 performs the reception output processing. The storage 50 stores the reception output program 60. The processor 46 reads the reception output program 60 from the storage 50 and executes the read reception output program 60 on the RAM 48. The reception output processing is realized by the processor 46 operating as a control unit 46A according to the reception output program 60 executed on the RAM 48.

[0233] Next, the identification processing performed by the identification processing unit 290 of the data processing device 12 will be described. In the following description, the data processing device 12 will be referred to as the "server" and the smart glasses 214 will be referred to as the "terminal".

[0234] To implement this invention, the overall system architecture must first be constructed. The user uses an interface via a terminal to input the appearance and personality traits of their ideal character. The terminal then transmits this information to the server.

[0235] The server activates an image generation algorithm based on the received attribute information, generating a visual representation of the character according to the specified features. This image generation is achieved using generative modeling technology, creating a realistic character that matches the settings.

[0236] The generated character is displayed on the device, and simultaneously, a natural language processing (NLP) engine is activated, enabling interaction with the user. The user can then ask the character questions or initiate a conversation. The device records the user's input in real time and sends it to the server.

[0237] The server analyzes the user's input and generates an appropriate response using a natural language processing algorithm. This algorithm also refers to past conversation history to understand the context and construct a response that aligns with the user's expectations. The generated response is then sent back to the terminal and displayed to the user, continuing this cyclical process.

[0238] Through this interaction, the server continuously records conversation logs and uses them to learn. The user-specific dialogue model improves over time, enabling conversations with a deeper understanding. For example, if a user asks, "Can you recommend a movie that will help me unwind after a long day?", the server can refer to the user's hobbies and preferences recorded in previous conversations and suggest movies of an appropriate genre.

[0239] This process allows the present invention to provide users with a natural and intimate conversational experience, enabling them to simulate an ideal partner.

[0240] The following describes the processing flow.

[0241] Step 1:

[0242] Users input information about the appearance and personality of their ideal character through the device's interface. This includes specific characteristics and choices.

[0243] Step 2:

[0244] The terminal sends the information entered by the user to the server. This data becomes the raw material used in the character generation process.

[0245] Step 3:

[0246] The server analyzes the received data and generates an image of the character using an image generation algorithm. During this process, the character's features are reflected in the depiction.

[0247] Step 4:

[0248] The device displays the generated image to the user, simultaneously activating its natural language processing engine and preparing for interaction with the character. The user can then begin a conversation with the displayed character.

[0249] Step 5:

[0250] When a user enters a question or comment for a character, that input is sent to the server in real time via the device.

[0251] Step 6:

[0252] The server uses natural language processing algorithms to analyze the received user input and generate an appropriate response. During this process, it also refers to past conversation logs to consider the context and user characteristics.

[0253] Step 7:

[0254] The generated response is sent from the server to the terminal and displayed to the user. The user can then continue the conversation based on this response.

[0255] Step 8:

[0256] The server continuously records conversation history and uses this data to improve the dialogue model. The model evolves to match the user's tendencies and preferences, providing more natural and personalized conversations.

[0257] (Example 1)

[0258] Next, we will describe Example 1. In the following description, the data processing device 12 will be referred to as the "server," and the smart glasses 214 will be referred to as the "terminal."

[0259] With the advancements in information processing technology today, there is a growing demand to accurately reproduce the visual and emotional characteristics of a target desired by the user, and to build natural dialogues with that target. Therefore, the challenge lies in realizing a dialogue system that is intuitive for users to operate and provides personalized responses.

[0260] The identification process performed by the identification processing unit 290 of the data processing device 12 in Example 1 is realized by the following means.

[0261] In this invention, the server includes means for providing an information processing device for the user to input the appearance and personality of an ideal object; means for generating a visual representation of the object using a visual representation generation method based on the input information; and means for activating a language processing engine that displays the generated visual representation and enables conversation with the user. This makes it possible to provide the user with an intuitive and personalized conversational experience and to efficiently link the entire process of visual representation and conversation.

[0262] An "information processing device" is a device that has the function of providing an interface for users to input the characteristics of a target they desire.

[0263] A "visual representation generation method" is a mathematical or algorithmic technique for generating a visual representation of an object based on input information.

[0264] A "language processing engine" is a software configuration that analyzes input language data and generates appropriate responses in order to enable natural conversation with the user.

[0265] A "storage device" is a device that stores digital data and retrieves or transmits that data as needed.

[0266] A "language processing method" is a process or technique for analyzing natural language and generating an appropriate response to the input.

[0267] A "conversation model" is a digital construct that manages interactions with users and improves the accuracy and adaptability of responses through learning.

[0268] "Opinions" are responses based on user feedback and expectations, and are used as information for system learning and improvement.

[0269] One embodiment of the present invention begins with a user accessing an interface via an information processing device and inputting the appearance and characteristics of an ideal target. The user then inputs attribute information for processing in a generative AI model. This information is received by the terminal and transmitted to the server.

[0270] The server generates a visual representation of the target based on the received data using a visual representation generation method. Existing image generation software such as Stable Diffusion and DALL-E can be applied to this generation. Specifically, the server's algorithm feeds data into the generation AI model using prompt statements. An example of a prompt statement would be, "Please generate a character with blue hair and a cheerful personality."

[0271] The generated visual representation is sent from the server to the terminal and displayed on the user's screen. This allows the user to confirm the visual elements. Subsequently, the language processing engine is automatically activated on the server, enabling the user to continue a natural conversation with the generated subject.

[0272] Users input questions and comments in text format through their devices, which are sent to the server in real time. The server analyzes this user input using natural language processing methods and generates appropriate responses. The analysis utilizes natural language processing models such as OpenAI's GPT-3, taking into account past conversation history to construct responses tailored to the individual user's context.

[0273] Furthermore, the server saves the entire conversation history and uses it as training data for the language model being used. This process allows the system to improve its dialogue model over time, providing a more appropriate and sophisticated conversational experience. For example, if a user says, "I'm feeling down, can you recommend a movie that will cheer me up?", the system can use past data to suggest a movie of a suitable genre that matches the user's preferences.

[0274] The flow of the specific processing in Example 1 will be explained using Figure 11.

[0275] Step 1:

[0276] Users access the interface through their device and input information about the appearance and personality of their ideal partner. This input includes specific attributes such as hair color, eye shape, and personality type. The input data is structured by the device and sent to the server in JSON format.

[0277] Step 2:

[0278] The server analyzes the user input received from the terminal and initiates the image generation process using a visual representation generation method. A generation AI model is used for this process. The server generates prompt statements and inputs data into the image generation software (e.g., the image generation model) based on these statements. The resulting output is a visual representation of a character based on the user's specified attributes.

[0279] Step 3:

[0280] After the visual representation of the character is generated, the image data is sent from the server to the terminal. The terminal displays this image on the user's screen. As a result, the user can view the visual representation and determine whether they are satisfied.

[0281] Step 4:

[0282] The server activates the language processing engine, enabling the user to start a conversation with the generated character. The user inputs questions and comments in text form through the terminal. This input is sent to the server in real time.

[0283] Step 5:

[0284] The server obtains the input text from the user and analyzes it using a language processing method. The natural language processing model used here evaluates the meaning of the input text and generates an appropriate response according to the context. At this time, the past conversation history is also referred to to construct a more context-appropriate response.

[0285] Step 6:

[0286] The generated response is sent from the server to the terminal and displayed on the user's screen. The user can thereby continue the conversation with the character. After the response is displayed, the terminal records this conversation history, and the server improves the conversation model based on it. As a result, the system learns over time and enables more accurate conversations.

[0287] (Application Example 1)

[0288] Next, Application Example 1 will be described. In the following description, the data processing device 12 is referred to as the "server", and the smart glasses 214 are referred to as the "terminal".

[0289] In today's world, artificial intelligence that supports users' lives is becoming increasingly important. However, existing systems struggle to generate the ideal character that users desire and provide support in various aspects of daily life through natural dialogue. Therefore, there is a need for technology that allows users to intuitively operate an interface and realize a character that closely resembles their ideal partner.

[0290] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 1 is realized by the following means.

[0291] In this invention, the server includes means for providing a terminal for the user to input the appearance and personality of their ideal character; means for generating a visual representation of the character using an AI model based on the input information; and means for displaying the generated character and activating a natural language processing engine for natural language interaction with the user. This enables the user to intuitively create their ideal character and receive support in their daily life.

[0292] A "user" is someone who inputs their ideal character via a device and enjoys interacting with it.

[0293] A "terminal" is a device used by users to set the appearance and personality of a character and to interact with them.

[0294] An "AI model" is an artificial intelligence technology used to generate a visual representation of a character based on input information.

[0295] A "character" is a personal being with a visual representation that is generated based on the user's ideals.

[0296] "Visual representation" refers to images or animations that show the appearance of the generated character.

[0297] A "natural language processing engine" is a program designed to enable natural language interaction with users.

[0298] A "dialogue record" is data that records the content of conversations between the user and the character.

[0299] An "information processing device" is a computer system that receives and analyzes dialogue records.

[0300] A "natural language processing algorithm" is a method for analyzing input language data and generating an appropriate response.

[0301] "Optimization" is the process of improving a dialogue model using saved dialogue history.

[0302] To implement this invention, the user uses a dedicated terminal to input information about the appearance and personality of their ideal character through an interface. The terminal then transmits this user input information to a server.

[0303] The server generates a visual representation of the character using an AI model based on the received information. Specifically, a generative AI model is used to create realistic character images according to the attributes specified by the user. This entire process utilizes natural language processing algorithms and image processing software.

[0304] The generated character is displayed on the terminal, and the natural language processing engine is activated. This allows the user to begin a natural and interactive conversation with the character. The server records the conversation in real time and performs detailed analysis on an information processing device.

[0305] As a specific example, when a user says to the character, "Please teach me relaxing yoga poses," the character proposes actions related to yoga poses considering the user's past requests and preferences. In this proposal, the past dialogue logs are utilized to generate an optimal response for the user.

[0306] As an example of the prompt text input to the generation AI model, "There was an input from the user saying 'Please teach me relaxing yoga poses'. Please propose appropriate yoga poses referring to the relaxation methods preferred by the user based on the past conversation history." can be cited. Thus, the user can receive support in daily life with an ideal character.

[0307] The flow of the specific process in Application Example 1 will be described using FIG. 12.

[0308] Step 1:

[0309] The user inputs, via the terminal, the appearance and personality information of the ideal character into the interface. The input information is temporarily stored in the terminal as the character's attribute data.

[0310] Step 2:

[0311] The terminal transmits the temporarily stored character attribute data to the server. By this data transmission, the server prepares for character generation based on the user's request.

[0312] Step 3:

[0313] The server calls the generation AI model with the received character attribute data as input and starts data processing. This model analyzes the attribute data and generates a visual representation of the character according to the specified features. The generated character image is output to the server.

[0314] Step 4:

[0315] The server sends the generated character image to the terminal. The terminal receives it and displays the image on its screen. The user can then see the visualized character.

[0316] Step 5:

[0317] The server starts up its natural language processing engine and prepares to interact with the user. When the terminal receives text input from the user, that text data is sent to the server.

[0318] Step 6:

[0319] The server analyzes the received text data using a natural language processing algorithm. It also refers to past dialogue history to understand the context. Based on the analysis, it generates an appropriate response and outputs it in text format.

[0320] Step 7:

[0321] The server sends the generated response to the terminal. The terminal receives this response and displays it back to the user. The user can then review the character's response and continue the conversation.

[0322] Step 8:

[0323] The server continuously records conversation logs and stores the conversation history in an information processing device. This creates a data foundation that allows for a deeper understanding of user preferences and context in future conversations.

[0324] Furthermore, an emotion engine that estimates the user's emotions may be incorporated. That is, the identification processing unit 290 may use the emotion identification model 59 to estimate the user's emotions and perform identification processing using the user's emotions.

[0325] This invention is a system that allows users to generate their ideal character and engage in natural and intimate conversations with that character. Furthermore, it incorporates an emotion engine that understands the user's emotions and generates corresponding responses, thereby enhancing engagement.

[0326] To implement this system, the terminal first provides an interface to the user, where the user inputs detailed information about the character's appearance and personality. The input information is immediately sent from the terminal to the server, which uses an image generation algorithm to generate a character image. The generated character is then displayed to the user on the terminal.

[0327] Next, a conversation with the user is initiated using a natural language processing engine, but this is where the emotion engine comes in. The emotion engine analyzes the user's input text and data from voice tone and keystrokes during the conversation to estimate the user's emotional state. Based on this emotional state, the server generates and provides the optimal response to the user, thus designing the conversation to have greater depth.

[0328] For example, if a user tells a character, "I'm very tired today," the emotion engine will determine that the user is tired based on words like "tired" and "very." Based on this information, the server will generate a gentle response such as, "Take it easy today and do something you enjoy," and convey it to the user via the device. In this way, a more personalized experience is realized through dialogue that is tailored to the user's emotions.

[0329] The server saves dialogue content and emotional data as conversation history, and uses this to improve the dialogue model. This approach allows the system to provide more user-friendly dialogue the more it is used, continuously offering users a simulation experience with the most suitable partner.

[0330] The following describes the processing flow.

[0331] Step 1:

[0332] Users use the device's interface to input details about the ideal character's appearance and personality. This information includes specific looks and personality traits.

[0333] Step 2:

[0334] The terminal sends the information entered by the user to the server. The server receives this information, activates an image generation algorithm, and generates an image of a character based on the specified features.

[0335] Step 3:

[0336] The generated character image is sent to the device and displayed on the user's screen. The device simultaneously activates its natural language processing engine, preparing to interact with the user.

[0337] Step 4:

[0338] Users send messages to characters to initiate conversations. This input is processed in real time and sent from the terminal to the server.

[0339] Step 5:

[0340] The server receives user input and performs sentiment analysis using an emotion engine. For example, it analyzes keywords in the context and input methods (speed, frequency) to recognize the user's emotions.

[0341] Step 6:

[0342] The server uses emotion data recognized by the emotion engine and natural language processing algorithms to generate appropriate responses that match the user's emotions. In doing so, it also refers to conversation history and past emotion logs to construct a dialogue optimized for the user.

[0343] Step 7:

[0344] The generated response is sent to the device and displayed to the user. The user can then use this response to continue the conversation.

[0345] Step 8:

[0346] The server records all dialogue and emotional data, continuously improving the dialogue model. This allows for more relatable and emotionally nuanced dialogues for the user over time.

[0347] (Example 2)

[0348] Next, we will describe Example 2. In the following description, the data processing device 12 will be referred to as the "server" and the smart glasses 214 will be referred to as the "terminal".

[0349] Current conversational systems have limited ability to fully understand individual user emotions and characteristics and generate personalized responses. This results in a uniform user experience and a failure to provide sufficiently personalized dialogue. Furthermore, they are inadequate in generating responses that reflect the user's emotional state, lacking depth and intimacy in conversations.

[0350] The identification process performed by the identification processing unit 290 of the data processing device 12 in Example 2 is realized by the following means.

[0351] In this invention, the server includes means for providing an interface for the user to input the appearance and characteristics of an ideal object; means for generating an image of the object using image generation technology based on the input information; and means for using emotion analysis technology to analyze the user's input, voice tone, and operation data to estimate their emotional state. This enables personalized, emotion-sensitive dialogue, resulting in a rich and intimate experience for the user.

[0352] An "interface" is a means for a user to interact with a system and information in a two-way manner, and is part of the user experience that enables visual or physical interaction.

[0353] "Image generation technology" refers to algorithms and processes that create visual information based on input data, and is a technology for producing computer-generated images.

[0354] "Language processing technology" refers to techniques for understanding, generating, and analyzing human language on a computer, and is a method that enables advanced dialogue using natural language.

[0355] "Emotion analysis technology" is an analytical method for estimating a user's emotional state from their input data, and it is a technology that quantifies or categorizes the user's emotions and reflects them in the response.

[0356] A "dialogue model" is a collection of data structures and algorithms for managing interactions with users and generating optimal responses; it is the foundation for controlling the flow and content of dialogue.

[0357] "Feedback" refers to information about user responses and evaluations of the system, and is part of the information used to improve the system and enhance its accuracy.

[0358] This invention is a system that generates a character desired by the user and allows for intimate interaction with that character. Specifically, this system is implemented according to the following procedure.

[0359] The terminal provides the user with an interface for character creation. This interface allows the user to input detailed information about the character's appearance and attributes. The terminal sends this data to a server. The server uses image generation technology based on the input information to generate an image of the character. The image generation technology used here typically involves generative AI models such as Stable Diffusion or DALL-E.

[0360] The server sends the generated character image to the terminal, which then displays the image to the user. The user can then review the generated character and make further adjustments through the interface if necessary.

[0361] Furthermore, the device utilizes natural language processing technology to initiate a dialogue between the character and the user. During this process, the server analyzes the user's input text, voice tone, and interaction data through sentiment analysis technology. This allows the server to estimate the user's emotional state and provide a personalized response based on that estimate.

[0362] For example, when a user tells a character, "I'm very tired today," the server's emotion analysis technology determines that the user is tired based on words like "tired." Based on this information, the server generates a gentle response such as, "Take it easy today and do something you enjoy," and conveys it to the user via the device. This provides the user with a more travel-like and relatable dialogue.

[0363] An example of a prompt is, "The user has told the character that they are very tired. How would you respond?" This is used in the response generation process to understand and reflect the user's intent.

[0364] This system is implemented using a combination of computer hardware and associated software. Specific hardware requirements include a terminal with a standard internet connection, display, and input devices. This enables efficient character generation using generative AI models and allows for deep, natural dialogue.

[0365] The flow of the specific processing in Example 2 will be explained using Figure 13.

[0366] Step 1:

[0367] The device provides the user with a character creation interface. The user uses this interface to input information such as the character's appearance and personality. This input data is divided into multiple fields, allowing for detailed specification using sliders and dropdown menus. The user's input information is formatted as JSON data.

[0368] Step 2:

[0369] The terminal sends the information entered by the user to the server. The server analyzes this JSON-formatted input data to extract the parameters necessary for the image generation algorithm. The input includes the character's appearance elements and characteristics. Using this data, the server starts the generation AI model.

[0370] Step 3:

[0371] The server generates character images using image generation technology. Generative AI models such as Stable Diffusion and DALL-E are used here. This process involves detailed data processing and calculations based on input parameters to synthesize an image that closely matches the specified result. The output is the generated character image data.

[0372] Step 4:

[0373] The server sends the generated character image data to the terminal. The terminal displays the received image data to the user. The user can review this image and make additional modifications as needed. At this stage, it is possible to view the image in high resolution using an image viewer.

[0374] Step 5:

[0375] The terminal uses natural language processing to initiate a dialogue between the user and the character. Text input from the user is received through the interface, triggering the dialogue. The entered text data is then sent to the server.

[0376] Step 6:

[0377] The server activates its sentiment analysis engine based on the received text data to estimate the user's emotional state. The server analyzes the sentiment indicators within the text and generates numerical or categorized sentiment data. This data is used as a reference for generating responses.

[0378] Step 7:

[0379] The server generates responses that reflect the user's emotional state. Using existing language processing models, it generates appropriate replies based on the user's emotions and a generative AI model. This response data is refined to aim for natural and personalized dialogue.

[0380] Step 8:

[0381] The generated response is sent from the server to the terminal. The terminal displays the response to the user and prompts for further interaction. The interaction history and sentiment data are stored on the server and used later to improve the system.

[0382] (Application Example 2)

[0383] Next, we will explain application example 2. In the following explanation, the data processing device 12 will be referred to as the "server," and the smart glasses 214 will be referred to as the "terminal."

[0384] Conventional character generation systems have suffered from a decline in the quality of dialogue because they do not take into account the emotional elements in user interactions. Furthermore, their inability to immediately respond to changes in the user's emotional state has limited their potential for use in family communication and entertainment. This invention aims to solve the technical challenges of understanding user emotions and realizing more intimate and personal communication.

[0385] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 2 is realized by the following means.

[0386] This invention includes a server that uses an emotion analysis engine to analyze data on the tone of the user's voice and input actions to estimate their emotional state; a server that uses a real-time generated character assistant to provide added value for deepening home interactions; and a server that stores conversation history and improves the dialogue model through learning. This makes it possible to make the interaction between the user and the character assistant more natural and intimate, and to enhance in-home entertainment and communication.

[0387] An "interface" is a means for a user to exchange information with a system, and it is possible to input information such as appearance and personality.

[0388] An "image generation algorithm" is a computational method for automatically generating visual characters based on input information.

[0389] A "natural language processing engine" is software that has the ability to understand user input and generate appropriate responses.

[0390] A "sentiment analysis engine" is a program that determines the user's emotional state based on their tone of voice and input actions.

[0391] A "character assistant" is a virtual entity that functions as an aid in conversations and communicates with the user.

[0392] A "household robot" is a mechanical device that can perform various tasks in a home environment and interact with the inhabitants.

[0393] "Conversation history" refers to data that the system has recorded and saved, containing all past interactions with the user.

[0394] A "dialogue model" is a collection of algorithms used to design the flow of interactions with users and provide optimal communication.

[0395] To implement this invention, the user must first input details of the desired character's appearance and personality using a dedicated interface. The terminal receives this information and sends it to the server. The server uses an image generation algorithm to generate a visual representation of the character based on the input information. Image generation libraries such as Stable Diffusion and DALL-E can be utilized in this process.

[0396] After the character is generated, it is displayed to the user on the device, and at this time, the natural language processing engine is activated. This engine receives text and voice input from the user, analyzes the content, and generates an appropriate response. For natural language processing, libraries such as NLTK and spaCy can be used, and the IBM Watson Tone Analyzer can be used as the emotion engine. This makes it possible to estimate the user's emotional state based on voice tone and input characteristics.

[0397] Furthermore, the character assistant operates as a home robot, providing enjoyment through interaction with the user. This robot uses hardware such as smart speakers, microphones, and speakers to engage in voice interaction, recording the user's responses for the next interaction and sending them to a server. The server improves the dialogue model based on the accumulated conversation history, providing more personalized responses to the user.

[0398] For example, if a child tells a home robot, "I couldn't play with my friends today," the system will determine from the child's words and tone of voice that the child is feeling a little lonely. It will then suggest something like, "How about we do some drawing together today to refresh ourselves?" This allows the child to find something to enjoy without feeling lonely. An example of a prompt to the generative AI model in this case might be, "The user may be feeling lonely. What would be an appropriate response?"

[0399] The flow of a specific process in Application Example 2 will be explained using Figure 14.

[0400] Step 1:

[0401] The user inputs the appearance and personality of their ideal character through the interface. The entered data is stored on the device as detailed information about the character's appearance and personality. The device then sends this input information to the server.

[0402] Step 2:

[0403] The server generates a visual representation of the character using an image generation algorithm (e.g., Stable Diffusion or DALL-E) based on the received information. The input is the character features specified by the user, and the output is the generated character image. Here, data processing involves calculations to convert the input text information into an image.

[0404] Step 3:

[0405] The terminal receives the generated character image sent from the server and displays it to the user. The user can then begin a visual interaction with the generated character on the terminal. Visual feedback is provided to the user during this step.

[0406] Step 4:

[0407] The user initiates a conversation with the character using text or voice. The terminal collects this input data and sends it to a natural language processing engine. The input is the user's voice or text data, and the output is data converted into natural language.

[0408] Step 5:

[0409] The server analyzes the received language data using a natural language processing engine. In conjunction with this, an emotion analysis engine is also used to estimate the user's emotional state. Specifically, voice tone and keystroke patterns are input, and data calculations are performed to output the user's emotion (e.g., joy, sadness).

[0410] Step 6:

[0411] Based on the analysis results, the server generates a corresponding response using an AI model. Here, the response content is determined using a prompt. The input is the analyzed emotion and dialogue content, and the output is an appropriate response to the user.

[0412] Step 7:

[0413] The generated response is sent to the user's device and provided to the user through the device by being displayed or spoken aloud. This allows the user to continue interacting with the character.

[0414] Step 8:

[0415] The terminal sends the entire conversation history with the user to the server, which uses this data to improve the conversation model. The input is the past conversation history, and by analyzing this data, the output is an improvement in the overall conversation quality of the system.

[0416] The specific processing unit 290 transmits the result of the specific processing to the smart glasses 214. In the smart glasses 214, the control unit 46A causes the speaker 240 to output the result of the specific processing. The microphone 238 acquires audio indicating user input for the result of the specific processing. The control unit 46A transmits the audio data indicating user input acquired by the microphone 238 to the data processing unit 12. In the data processing unit 12, the specific processing unit 290 acquires the audio data.

[0417] Data generation model 58 is a type of so-called generative AI (Artificial Intelligence). One example of data generation model 58 is ChatGPT (Internet search<URL: https: / / openai.com / blog / chatgpt> ), Gemini (Internet search) <url: https: gemini.google.com ?hl="ja">Examples of generative AI include the following. The data generation model 58 is obtained by performing deep learning on a neural network. The data generation model 58 is input with prompts containing instructions, and with inference data such as audio data representing speech, text data representing text, and image data representing images. The data generation model 58 infers from the input inference data according to the instructions indicated by the prompts, and outputs the inference results in data formats such as audio data and text data. Here, inference refers to, for example, analysis, classification, prediction, and / or summarization.

[0418] In the above embodiment, an example was given in which specific processing is performed by the data processing device 12, but the technology of this disclosure is not limited thereto, and the specific processing may also be performed by the smart glasses 214.

[0419] [Third Embodiment]

[0420] Figure 5 shows an example of the configuration of the data processing system 310 according to the third embodiment.

[0421] As shown in Figure 5, the data processing system 310 includes a data processing device 12 and a headset terminal 314. An example of the data processing device 12 is a server.

[0422] The data processing device 12 comprises a computer 22, a database 24, and a communication interface 26. The computer 22 is an example of a "computer" related to the technology of this disclosure. The computer 22 comprises a processor 28, RAM 30, and storage 32. The processor 28, RAM 30, and storage 32 are connected to a bus 34. The database 24 and the communication interface 26 are also connected to the bus 34. The communication interface 26 is connected to a network 54. An example of the network 54 is a WAN (Wide Area Network) and / or a LAN (Local Area Network).

[0423] The headset terminal 314 includes a computer 36, a microphone 238, a speaker 240, a camera 42, a communication interface 44, and a display 343. The computer 36 includes a processor 46, RAM 48, and storage 50. The processor 46, RAM 48, and storage 50 are connected to a bus 52. The microphone 238, speaker 240, camera 42, and display 343 are also connected to the bus 52.

[0424] The microphone 238 receives voice signals from the user 20 and receives instructions from the user 20. The microphone 238 captures the voice signals from the user 20, converts the captured voice into audio data, and outputs it to the processor 46. The speaker 240 outputs audio according to the instructions from the processor 46.

[0425] Camera 42 is a small digital camera equipped with an optical system including a lens, aperture, and shutter, and an image sensor such as a CMOS (Complementary Metal-Oxide-Semiconductor) image sensor or a CCD (Charge Coupled Device) image sensor, and captures images of the area around the user 20 (for example, an imaging range defined by a field of view equivalent to the width of a typical healthy person's field of vision).

[0426] Communication interface 44 is connected to network 54. Communication interfaces 44 and 26 are responsible for the exchange of various information between processor 46 and processor 28 via network 54. The exchange of various information between processor 46 and processor 28 using communication interfaces 44 and 26 is performed in a secure manner.

[0427] Figure 6 shows an example of the main functions of the data processing device 12 and the headset terminal 314. As shown in Figure 6, the data processing device 12 performs specific processing using the processor 28. The storage 32 stores the specific processing program 56.

[0428] The specific processing program 56 is an example of a "program" relating to the technology of this disclosure. The processor 28 reads the specific processing program 56 from the storage 32 and executes the read specific processing program 56 on the RAM 30. The specific processing is realized by the processor 28 operating as a specific processing unit 290 in accordance with the specific processing program 56 executed on the RAM 30.

[0429] The storage 32 stores the data generation model 58 and the emotion identification model 59. The data generation model 58 and the emotion identification model 59 are used by the identification processing unit 290.

[0430] In the headset terminal 314, the processor 46 performs the reception output processing. The storage 50 stores the reception output program 60. The processor 46 reads the reception output program 60 from the storage 50 and executes the read reception output program 60 on the RAM 48. The reception output processing is realized by the processor 46 operating as a control unit 46A according to the reception output program 60 executed on the RAM 48.

[0431] Next, the specific processing performed by the specific processing unit 290 of the data processing device 12 will be described. In the following description, the data processing device 12 will be referred to as the "server" and the headset terminal 314 will be referred to as the "terminal".

[0432] To implement this invention, the overall system architecture must first be constructed. The user uses an interface via a terminal to input the appearance and personality traits of their ideal character. The terminal then transmits this information to the server.

[0433] The server activates an image generation algorithm based on the received attribute information, generating a visual representation of the character according to the specified features. This image generation is achieved using generative modeling technology, creating a realistic character that matches the settings.

[0434] The generated character is displayed on the device, and simultaneously, a natural language processing (NLP) engine is activated, enabling interaction with the user. The user can then ask the character questions or initiate a conversation. The device records the user's input in real time and sends it to the server.

[0435] The server analyzes the user's input and generates an appropriate response using a natural language processing algorithm. This algorithm also refers to past conversation history to understand the context and construct a response that aligns with the user's expectations. The generated response is then sent back to the terminal and displayed to the user, continuing this cyclical process.

[0436] Through this interaction, the server continuously records conversation logs and uses them to learn. The user-specific dialogue model improves over time, enabling conversations with a deeper understanding. For example, if a user asks, "Can you recommend a movie that will help me unwind after a long day?", the server can refer to the user's hobbies and preferences recorded in previous conversations and suggest movies of an appropriate genre.

[0437] This process allows the present invention to provide users with a natural and intimate conversational experience, enabling them to simulate an ideal partner.

[0438] The following describes the processing flow.

[0439] Step 1:

[0440] Users input information about the appearance and personality of their ideal character through the device's interface. This includes specific characteristics and choices.

[0441] Step 2:

[0442] The terminal sends the information entered by the user to the server. This data becomes the raw material used in the character generation process.

[0443] Step 3:

[0444] The server analyzes the received data and generates an image of the character using an image generation algorithm. During this process, the character's features are reflected in the depiction.

[0445] Step 4:

[0446] The device displays the generated image to the user, simultaneously activating its natural language processing engine and preparing for interaction with the character. The user can then begin a conversation with the displayed character.

[0447] Step 5:

[0448] When a user enters a question or comment for a character, that input is sent to the server in real time via the device.

[0449] Step 6:

[0450] The server uses natural language processing algorithms to analyze the received user input and generate an appropriate response. During this process, it also refers to past conversation logs to consider the context and user characteristics.

[0451] Step 7:

[0452] The generated response is sent from the server to the terminal and displayed to the user. The user can then continue the conversation based on this response.

[0453] Step 8:

[0454] The server continuously records conversation history and uses this data to improve the dialogue model. The model evolves to match the user's tendencies and preferences, providing more natural and personalized conversations.

[0455] (Example 1)

[0456] Next, we will describe Example 1. In the following description, the data processing device 12 will be referred to as the "server," and the headset-type terminal 314 will be referred to as the "terminal."

[0457] With the advancements in information processing technology today, there is a growing demand to accurately reproduce the visual and emotional characteristics of a target desired by the user, and to build natural dialogues with that target. Therefore, the challenge lies in realizing a dialogue system that is intuitive for users to operate and provides personalized responses.

[0458] The identification process performed by the identification processing unit 290 of the data processing device 12 in Example 1 is realized by the following means.

[0459] In this invention, the server includes means for providing an information processing device for the user to input the appearance and personality of an ideal object; means for generating a visual representation of the object using a visual representation generation method based on the input information; and means for activating a language processing engine that displays the generated visual representation and enables conversation with the user. This makes it possible to provide the user with an intuitive and personalized conversational experience and to efficiently link the entire process of visual representation and conversation.

[0460] An "information processing device" is a device that has the function of providing an interface for users to input the characteristics of a target they desire.

[0461] A "visual representation generation method" is a mathematical or algorithmic technique for generating a visual representation of an object based on input information.

[0462] A "language processing engine" is a software configuration that analyzes input language data and generates appropriate responses in order to enable natural conversation with the user.

[0463] A "storage device" is a device that stores digital data and retrieves or transmits that data as needed.

[0464] A "language processing method" is a process or technique for analyzing natural language and generating an appropriate response to the input.

[0465] A "conversation model" is a digital construct that manages interactions with users and improves the accuracy and adaptability of responses through learning.

[0466] "Opinions" are responses based on user feedback and expectations, and are used as information for system learning and improvement.

[0467] One embodiment of the present invention begins with a user accessing an interface via an information processing device and inputting the appearance and characteristics of an ideal target. The user then inputs attribute information for processing in a generative AI model. This information is received by the terminal and transmitted to the server.

[0468] The server generates a visual representation of the target based on the received data using a visual representation generation method. Existing image generation software such as Stable Diffusion and DALL-E can be applied to this generation. Specifically, the server's algorithm feeds data into the generation AI model using prompt statements. An example of a prompt statement would be, "Please generate a character with blue hair and a cheerful personality."

[0469] The generated visual representation is sent from the server to the terminal and displayed on the user's screen. This allows the user to confirm the visual elements. Subsequently, the language processing engine is automatically activated on the server, enabling the user to continue a natural conversation with the generated subject.

[0470] Users input questions and comments in text format through their devices, which are sent to the server in real time. The server analyzes this user input using natural language processing methods and generates appropriate responses. The analysis utilizes natural language processing models such as OpenAI's GPT-3, taking into account past conversation history to construct responses tailored to the individual user's context.

[0471] Furthermore, the server saves the entire conversation history and uses it as training data for the language model being used. This process allows the system to improve its dialogue model over time, providing a more appropriate and sophisticated conversational experience. For example, if a user says, "I'm feeling down, can you recommend a movie that will cheer me up?", the system can use past data to suggest a movie of a suitable genre that matches the user's preferences.

[0472] The flow of the specific processing in Example 1 will be explained using Figure 11.

[0473] Step 1:

[0474] Users access the interface through their device and input information about the appearance and personality of their ideal partner. This input includes specific attributes such as hair color, eye shape, and personality type. The input data is structured by the device and sent to the server in JSON format.

[0475] Step 2:

[0476] The server analyzes the user input received from the terminal and initiates the image generation process using a visual representation generation method. A generation AI model is used for this process. The server generates prompt statements and inputs data into the image generation software (e.g., the image generation model) based on these statements. The resulting output is a visual representation of a character based on the user's specified attributes.

[0477] Step 3:

[0478] After the visual representation of the character is generated, the image data is sent from the server to the terminal. The terminal then displays this image on the user's screen. As a result, the user can review the visual representation and decide whether or not they are satisfied.

[0479] Step 4:

[0480] The server starts the language processing engine, and the user can begin a conversation with the generated character. The user enters questions and comments in text format through the terminal. This input is sent to the server in real time.

[0481] Step 5:

[0482] The server receives input text from the user and analyzes it using natural language processing methods. The natural language processing model used evaluates the meaning of the input text and generates an appropriate response that is relevant to the context. In doing so, it also refers to past conversation history to construct a more contextually relevant response.

[0483] Step 6:

[0484] The generated response is sent from the server to the terminal and displayed on the user's screen. This allows the user to continue the conversation with the character. After the response is displayed, the terminal records this conversation history, and the server uses this to improve the conversation model. In this way, the system learns over time, enabling more accurate conversations.

[0485] (Application Example 1)

[0486] Next, we will explain Application Example 1. In the following explanation, the data processing device 12 will be referred to as the "server," and the headset-type terminal 314 will be referred to as the "terminal."

[0487] In today's world, artificial intelligence that supports users' lives is becoming increasingly important. However, existing systems struggle to generate the ideal character that users desire and provide support in various aspects of daily life through natural dialogue. Therefore, there is a need for technology that allows users to intuitively operate an interface and realize a character that closely resembles their ideal partner.

[0488] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 1 is realized by the following means.

[0489] In this invention, the server includes means for providing a terminal for the user to input the appearance and personality of their ideal character; means for generating a visual representation of the character using an AI model based on the input information; and means for displaying the generated character and activating a natural language processing engine for natural language interaction with the user. This enables the user to intuitively create their ideal character and receive support in their daily life.

[0490] A "user" is someone who inputs their ideal character via a device and enjoys interacting with it.

[0491] A "terminal" is a device used by users to set the appearance and personality of a character and to interact with them.

[0492] An "AI model" is an artificial intelligence technology used to generate a visual representation of a character based on input information.

[0493] A "character" is a personal being with a visual representation that is generated based on the user's ideals.

[0494] "Visual representation" refers to images or animations that show the appearance of the generated character.

[0495] A "natural language processing engine" is a program designed to enable natural language interaction with users.

[0496] A "dialogue record" is data that records the content of conversations between the user and the character.

[0497] An "information processing device" is a computer system that receives and analyzes dialogue records.

[0498] A "natural language processing algorithm" is a method for analyzing input language data and generating an appropriate response.

[0499] "Optimization" is the process of improving a dialogue model using saved dialogue history.

[0500] To implement this invention, the user uses a dedicated terminal to input information about the appearance and personality of their ideal character through an interface. The terminal then transmits this user input information to a server.

[0501] The server generates a visual representation of the character using an AI model based on the received information. Specifically, a generative AI model is used to create realistic character images according to the attributes specified by the user. This entire process utilizes natural language processing algorithms and image processing software.

[0502] The generated character is displayed on the terminal, and the natural language processing engine is activated. This allows the user to begin a natural and interactive conversation with the character. The server records the conversation in real time and performs detailed analysis on an information processing device.

[0503] As a concrete example, if a user asks a character, "Teach me some relaxing yoga poses," the character will consider the user's past requests and preferences to suggest yoga poses and related actions. This suggestion utilizes past conversation logs to generate the most suitable response for the user.

[0504] An example of a prompt to input into the generating AI model is: "The user has input 'Please teach me some relaxing yoga poses.' Based on the user's preferred relaxation methods from past conversation history, please suggest appropriate yoga poses." This allows users to receive support in their daily lives from an ideal character.

[0505] The flow of a specific process in Application Example 1 will be explained using Figure 12.

[0506] Step 1:

[0507] The user inputs information about the ideal character's appearance and personality into the interface via their device. The entered information is temporarily stored on the device as the character's attribute data.

[0508] Step 2:

[0509] The terminal sends temporarily stored character attribute data to the server. This data transmission allows the server to prepare for character generation based on the user's request.

[0510] Step 3:

[0511] The server receives the character attribute data as input, calls a generation AI model, and begins data processing. This model analyzes the attribute data and generates a visual representation of the character according to the specified features. The generated character image is output to the server.

[0512] Step 4:

[0513] The server sends the generated character image to the terminal. The terminal receives it and displays the image on its screen. The user can then see the visualized character.

[0514] Step 5:

[0515] The server starts up its natural language processing engine and prepares to interact with the user. When the terminal receives text input from the user, that text data is sent to the server.

[0516] Step 6:

[0517] The server analyzes the received text data using a natural language processing algorithm. It also refers to past dialogue history to understand the context. Based on the analysis, it generates an appropriate response and outputs it in text format.

[0518] Step 7:

[0519] The server sends the generated response to the terminal. The terminal receives this response and displays it back to the user. The user can then review the character's response and continue the conversation.

[0520] Step 8:

[0521] The server continuously records conversation logs and stores the conversation history in an information processing device. This creates a data foundation that allows for a deeper understanding of user preferences and context in future conversations.

[0522] Furthermore, an emotion engine that estimates the user's emotions may be incorporated. That is, the identification processing unit 290 may use the emotion identification model 59 to estimate the user's emotions and perform identification processing using the user's emotions.

[0523] This invention is a system that allows users to generate their ideal character and engage in natural and intimate conversations with that character. Furthermore, it incorporates an emotion engine that understands the user's emotions and generates corresponding responses, thereby enhancing engagement.

[0524] To implement this system, the terminal first provides an interface to the user, where the user inputs detailed information about the character's appearance and personality. The input information is immediately sent from the terminal to the server, which uses an image generation algorithm to generate a character image. The generated character is then displayed to the user on the terminal.

[0525] Next, a conversation with the user is initiated using a natural language processing engine, but this is where the emotion engine comes in. The emotion engine analyzes the user's input text and data from voice tone and keystrokes during the conversation to estimate the user's emotional state. Based on this emotional state, the server generates and provides the optimal response to the user, thus designing the conversation to have greater depth.

[0526] For example, if a user tells a character, "I'm very tired today," the emotion engine will determine that the user is tired based on words like "tired" and "very." Based on this information, the server will generate a gentle response such as, "Take it easy today and do something you enjoy," and convey it to the user via the device. In this way, a more personalized experience is realized through dialogue that is tailored to the user's emotions.

[0527] The server saves dialogue content and emotional data as conversation history, and uses this to improve the dialogue model. This approach allows the system to provide more user-friendly dialogue the more it is used, continuously offering users a simulation experience with the most suitable partner.

[0528] The following describes the processing flow.

[0529] Step 1:

[0530] Users use the device's interface to input details about the ideal character's appearance and personality. This information includes specific looks and personality traits.

[0531] Step 2:

[0532] The terminal sends the information entered by the user to the server. The server receives this information, activates an image generation algorithm, and generates an image of a character based on the specified features.

[0533] Step 3:

[0534] The generated character image is sent to the device and displayed on the user's screen. The device simultaneously activates its natural language processing engine, preparing to interact with the user.

[0535] Step 4:

[0536] Users send messages to characters to initiate conversations. This input is processed in real time and sent from the terminal to the server.

[0537] Step 5:

[0538] The server receives user input and performs sentiment analysis using an emotion engine. For example, it analyzes keywords in the context and input methods (speed, frequency) to recognize the user's emotions.

[0539] Step 6:

[0540] The server uses emotion data recognized by the emotion engine and natural language processing algorithms to generate appropriate responses that match the user's emotions. In doing so, it also refers to conversation history and past emotion logs to construct a dialogue optimized for the user.

[0541] Step 7:

[0542] The generated response is sent to the device and displayed to the user. The user can then use this response to continue the conversation.

[0543] Step 8:

[0544] The server records all dialogue and emotional data, continuously improving the dialogue model. This allows for more relatable and emotionally nuanced dialogues for the user over time.

[0545] (Example 2)

[0546] Next, we will describe Example 2. In the following description, the data processing device 12 will be referred to as the "server," and the headset-type terminal 314 will be referred to as the "terminal."

[0547] Current conversational systems have limited ability to fully understand individual user emotions and characteristics and generate personalized responses. This results in a uniform user experience and a failure to provide sufficiently personalized dialogue. Furthermore, they are inadequate in generating responses that reflect the user's emotional state, lacking depth and intimacy in conversations.

[0548] The identification process performed by the identification processing unit 290 of the data processing device 12 in Example 2 is realized by the following means.

[0549] In this invention, the server includes means for providing an interface for the user to input the appearance and characteristics of an ideal object; means for generating an image of the object using image generation technology based on the input information; and means for using emotion analysis technology to analyze the user's input, voice tone, and operation data to estimate their emotional state. This enables personalized, emotion-sensitive dialogue, resulting in a rich and intimate experience for the user.

[0550] An "interface" is a means for a user to interact with a system and information in a two-way manner, and is part of the user experience that enables visual or physical interaction.

[0551] "Image generation technology" refers to algorithms and processes that create visual information based on input data, and is a technology for producing computer-generated images.

[0552] "Language processing technology" refers to techniques for understanding, generating, and analyzing human language on a computer, and is a method that enables advanced dialogue using natural language.

[0553] "Emotion analysis technology" is an analytical method for estimating a user's emotional state from their input data, and it is a technology that quantifies or categorizes the user's emotions and reflects them in the response.

[0554] A "dialogue model" is a collection of data structures and algorithms for managing interactions with users and generating optimal responses; it is the foundation for controlling the flow and content of dialogue.

[0555] "Feedback" refers to information about user responses and evaluations of the system, and is part of the information used to improve the system and enhance its accuracy.

[0556] This invention is a system that generates a character desired by the user and allows for intimate interaction with that character. Specifically, this system is implemented according to the following procedure.

[0557] The terminal provides the user with an interface for character creation. This interface allows the user to input detailed information about the character's appearance and attributes. The terminal sends this data to a server. The server uses image generation technology based on the input information to generate an image of the character. The image generation technology used here typically involves generative AI models such as Stable Diffusion or DALL-E.

[0558] The server sends the generated character image to the terminal, which then displays the image to the user. The user can then review the generated character and make further adjustments through the interface if necessary.

[0559] Furthermore, the device utilizes natural language processing technology to initiate a dialogue between the character and the user. During this process, the server analyzes the user's input text, voice tone, and interaction data through sentiment analysis technology. This allows the server to estimate the user's emotional state and provide a personalized response based on that estimate.

[0560] For example, when a user tells a character, "I'm very tired today," the server's emotion analysis technology determines that the user is tired based on words like "tired." Based on this information, the server generates a gentle response such as, "Take it easy today and do something you enjoy," and conveys it to the user via the device. This provides the user with a more travel-like and relatable dialogue.

[0561] An example of a prompt is, "The user has told the character that they are very tired. How would you respond?" This is used in the response generation process to understand and reflect the user's intent.

[0562] This system is implemented using a combination of computer hardware and associated software. Specific hardware requirements include a terminal with a standard internet connection, display, and input devices. This enables efficient character generation using generative AI models and allows for deep, natural dialogue.

[0563] The flow of the specific processing in Example 2 will be explained using Figure 13.

[0564] Step 1:

[0565] The device provides the user with a character creation interface. The user uses this interface to input information such as the character's appearance and personality. This input data is divided into multiple fields, allowing for detailed specification using sliders and dropdown menus. The user's input information is formatted as JSON data.

[0566] Step 2:

[0567] The terminal sends the information entered by the user to the server. The server analyzes this JSON-formatted input data to extract the parameters necessary for the image generation algorithm. The input includes the character's appearance elements and characteristics. Using this data, the server starts the generation AI model.

[0568] Step 3:

[0569] The server generates character images using image generation technology. Generative AI models such as Stable Diffusion and DALL-E are used here. This process involves detailed data processing and calculations based on input parameters to synthesize an image that closely matches the specified result. The output is the generated character image data.

[0570] Step 4:

[0571] The server sends the generated character image data to the terminal. The terminal displays the received image data to the user. The user can review this image and make additional modifications as needed. At this stage, it is possible to view the image in high resolution using an image viewer.

[0572] Step 5:

[0573] The terminal uses natural language processing to initiate a dialogue between the user and the character. Text input from the user is received through the interface, triggering the dialogue. The entered text data is then sent to the server.

[0574] Step 6:

[0575] The server activates its sentiment analysis engine based on the received text data to estimate the user's emotional state. The server analyzes the sentiment indicators within the text and generates numerical or categorized sentiment data. This data is used as a reference for generating responses.

[0576] Step 7:

[0577] The server generates responses that reflect the user's emotional state. Using existing language processing models, it generates appropriate replies based on the user's emotions and a generative AI model. This response data is refined to aim for natural and personalized dialogue.

[0578] Step 8:

[0579] The generated response is sent from the server to the terminal. The terminal displays the response to the user and prompts for further interaction. The interaction history and sentiment data are stored on the server and used later to improve the system.

[0580] (Application Example 2)

[0581] Next, we will explain application example 2. In the following explanation, the data processing device 12 will be referred to as the "server," and the headset-type terminal 314 will be referred to as the "terminal."

[0582] Conventional character generation systems have suffered from a decline in the quality of dialogue because they do not take into account the emotional elements in user interactions. Furthermore, their inability to immediately respond to changes in the user's emotional state has limited their potential for use in family communication and entertainment. This invention aims to solve the technical challenges of understanding user emotions and realizing more intimate and personal communication.

[0583] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 2 is realized by the following means.

[0584] This invention includes a server that uses an emotion analysis engine to analyze data on the tone of the user's voice and input actions to estimate their emotional state; a server that uses a real-time generated character assistant to provide added value for deepening home interactions; and a server that stores conversation history and improves the dialogue model through learning. This makes it possible to make the interaction between the user and the character assistant more natural and intimate, and to enhance in-home entertainment and communication.

[0585] An "interface" is a means for a user to exchange information with a system, and it is possible to input information such as appearance and personality.

[0586] An "image generation algorithm" is a computational method for automatically generating visual characters based on input information.

[0587] A "natural language processing engine" is software that has the ability to understand user input and generate appropriate responses.

[0588] A "sentiment analysis engine" is a program that determines the user's emotional state based on their tone of voice and input actions.

[0589] A "character assistant" is a virtual entity that functions as an aid in conversations and communicates with the user.

[0590] A "household robot" is a mechanical device that can perform various tasks in a home environment and interact with the inhabitants.

[0591] "Conversation history" refers to data that the system has recorded and saved, containing all past interactions with the user.

[0592] A "dialogue model" is a collection of algorithms used to design the flow of interactions with users and provide optimal communication.

[0593] To implement this invention, the user must first input details of the desired character's appearance and personality using a dedicated interface. The terminal receives this information and sends it to the server. The server uses an image generation algorithm to generate a visual representation of the character based on the input information. Image generation libraries such as Stable Diffusion and DALL-E can be utilized in this process.

[0594] After the character is generated, it is displayed to the user on the device, and at this time, the natural language processing engine is activated. This engine receives text and voice input from the user, analyzes the content, and generates an appropriate response. For natural language processing, libraries such as NLTK and spaCy can be used, and the IBM Watson Tone Analyzer can be used as the emotion engine. This makes it possible to estimate the user's emotional state based on voice tone and input characteristics.

[0595] Furthermore, the character assistant operates as a home robot, providing enjoyment through interaction with the user. This robot uses hardware such as smart speakers, microphones, and speakers to engage in voice interaction, recording the user's responses for the next interaction and sending them to a server. The server improves the dialogue model based on the accumulated conversation history, providing more personalized responses to the user.

[0596] For example, if a child tells a home robot, "I couldn't play with my friends today," the system will determine from the child's words and tone of voice that the child is feeling a little lonely. It will then suggest something like, "How about we do some drawing together today to refresh ourselves?" This allows the child to find something to enjoy without feeling lonely. An example of a prompt to the generative AI model in this case might be, "The user may be feeling lonely. What would be an appropriate response?"

[0597] The flow of a specific process in Application Example 2 will be explained using Figure 14.

[0598] Step 1:

[0599] The user inputs the appearance and personality of their ideal character through the interface. The entered data is stored on the device as detailed information about the character's appearance and personality. The device then sends this input information to the server.

[0600] Step 2:

[0601] The server generates a visual representation of the character using an image generation algorithm (e.g., Stable Diffusion or DALL-E) based on the received information. The input is the character features specified by the user, and the output is the generated character image. Here, data processing involves calculations to convert the input text information into an image.

[0602] Step 3:

[0603] The terminal receives the generated character image sent from the server and displays it to the user. The user can then begin a visual interaction with the generated character on the terminal. Visual feedback is provided to the user during this step.

[0604] Step 4:

[0605] The user initiates a conversation with the character using text or voice. The terminal collects this input data and sends it to a natural language processing engine. The input is the user's voice or text data, and the output is data converted into natural language.

[0606] Step 5:

[0607] The server analyzes the received language data using a natural language processing engine. In conjunction with this, an emotion analysis engine is also used to estimate the user's emotional state. Specifically, voice tone and keystroke patterns are input, and data calculations are performed to output the user's emotion (e.g., joy, sadness).

[0608] Step 6:

[0609] Based on the analysis results, the server generates a corresponding response using an AI model. Here, the response content is determined using a prompt. The input is the analyzed emotion and dialogue content, and the output is an appropriate response to the user.

[0610] Step 7:

[0611] The generated response is sent to the user's device and provided to the user through the device by being displayed or spoken aloud. This allows the user to continue interacting with the character.

[0612] Step 8:

[0613] The terminal sends the entire conversation history with the user to the server, which uses this data to improve the conversation model. The input is the past conversation history, and by analyzing this data, the output is an improvement in the overall conversation quality of the system.

[0614] The specific processing unit 290 transmits the result of the specific processing to the headset terminal 314. In the headset terminal 314, the control unit 46A causes the speaker 240 and display 343 to output the result of the specific processing. The microphone 238 acquires audio indicating user input for the result of the specific processing. The control unit 46A transmits the audio data indicating user input acquired by the microphone 238 to the data processing unit 12. In the data processing unit 12, the specific processing unit 290 acquires the audio data.

[0615] Data generation model 58 is a type of so-called generative AI (Artificial Intelligence). One example of data generation model 58 is ChatGPT (Internet search<URL: https: / / openai.com / blog / chatgpt> ), Gemini (Internet search) <url: https: gemini.google.com ?hl="ja">Examples of generative AI include the following. The data generation model 58 is obtained by performing deep learning on a neural network. The data generation model 58 is input with prompts containing instructions, and with inference data such as audio data representing speech, text data representing text, and image data representing images. The data generation model 58 infers from the input inference data according to the instructions indicated by the prompts, and outputs the inference results in data formats such as audio data and text data. Here, inference refers to, for example, analysis, classification, prediction, and / or summarization.

[0616] In the above embodiment, an example was given in which specific processing is performed by the data processing device 12, but the technology of this disclosure is not limited thereto, and specific processing may also be performed by the headset terminal 314.

[0617] [Fourth Embodiment]

[0618] Figure 7 shows an example of the configuration of the data processing system 410 according to the fourth embodiment.

[0619] As shown in Figure 7, the data processing system 410 includes a data processing device 12 and a robot 414. An example of the data processing device 12 is a server.

[0620] The data processing device 12 comprises a computer 22, a database 24, and a communication interface 26. The computer 22 is an example of a "computer" related to the technology of this disclosure. The computer 22 comprises a processor 28, RAM 30, and storage 32. The processor 28, RAM 30, and storage 32 are connected to a bus 34. The database 24 and the communication interface 26 are also connected to the bus 34. The communication interface 26 is connected to a network 54. An example of the network 54 is a WAN (Wide Area Network) and / or a LAN (Local Area Network).

[0621] The robot 414 includes a computer 36, a microphone 238, a speaker 240, a camera 42, a communication interface 44, and a controlled object 443. The computer 36 includes a processor 46, RAM 48, and storage 50. The processor 46, RAM 48, and storage 50 are connected to a bus 52. The microphone 238, speaker 240, camera 42, and controlled object 443 are also connected to the bus 52.

[0622] The microphone 238 receives voice signals from the user 20 and receives instructions from the user 20. The microphone 238 captures the voice signals from the user 20, converts the captured voice into audio data, and outputs it to the processor 46. The speaker 240 outputs audio according to the instructions from the processor 46.

[0623] Camera 42 is a small digital camera equipped with an optical system including a lens, aperture, and shutter, and an image sensor such as a CMOS (Complementary Metal-Oxide-Semiconductor) image sensor or a CCD (Charge Coupled Device) image sensor, and captures images of the area around the user 20 (for example, an imaging range defined by a field of view equivalent to the width of a typical healthy person's field of vision).

[0624] Communication interface 44 is connected to network 54. Communication interfaces 44 and 26 are responsible for the exchange of various information between processor 46 and processor 28 via network 54. The exchange of various information between processor 46 and processor 28 using communication interfaces 44 and 26 is performed in a secure manner.

[0625] The controlled object 443 includes a display device, LEDs in the eyes, and motors that drive the arms, hands, and feet. The posture and gestures of the robot 414 are controlled by controlling the motors of the arms, hands, and feet. Some of the robot 414's emotions can be expressed by controlling these motors. Furthermore, the robot 414's facial expressions can also be expressed by controlling the illumination state of the LEDs in its eyes.

[0626] Figure 8 shows an example of the main functions of the data processing device 12 and the robot 414. As shown in Figure 8, the data processing device 12 performs specific processing using the processor 28. The storage 32 stores the specific processing program 56.

[0627] The specific processing program 56 is an example of a "program" relating to the technology of this disclosure. The processor 28 reads the specific processing program 56 from the storage 32 and executes the read specific processing program 56 on the RAM 30. The specific processing is realized by the processor 28 operating as a specific processing unit 290 in accordance with the specific processing program 56 executed on the RAM 30.

[0628] The storage 32 stores the data generation model 58 and the emotion identification model 59. The data generation model 58 and the emotion identification model 59 are used by the identification processing unit 290.

[0629] In robot 414, the processor 46 performs the reception output processing. The storage 50 stores the reception output program 60. The processor 46 reads the reception output program 60 from the storage 50 and executes the read reception output program 60 on the RAM 48. The reception output processing is realized by the processor 46 operating as a control unit 46A according to the reception output program 60 executed on the RAM 48.

[0630] Next, the specific processing performed by the specific processing unit 290 of the data processing device 12 will be described. In the following description, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".

[0631] To implement this invention, the overall system architecture must first be constructed. The user uses an interface via a terminal to input the appearance and personality traits of their ideal character. The terminal then transmits this information to the server.

[0632] The server activates an image generation algorithm based on the received attribute information, generating a visual representation of the character according to the specified features. This image generation is achieved using generative modeling technology, creating a realistic character that matches the settings.

[0633] The generated character is displayed on the device, and simultaneously, a natural language processing (NLP) engine is activated, enabling interaction with the user. The user can then ask the character questions or initiate a conversation. The device records the user's input in real time and sends it to the server.

[0634] The server analyzes the user's input and generates an appropriate response using a natural language processing algorithm. This algorithm also refers to past conversation history to understand the context and construct a response that aligns with the user's expectations. The generated response is then sent back to the terminal and displayed to the user, continuing this cyclical process.

[0635] Through this interaction, the server continuously records conversation logs and uses them to learn. The user-specific dialogue model improves over time, enabling conversations with a deeper understanding. For example, if a user asks, "Can you recommend a movie that will help me unwind after a long day?", the server can refer to the user's hobbies and preferences recorded in previous conversations and suggest movies of an appropriate genre.

[0636] This process allows the present invention to provide users with a natural and intimate conversational experience, enabling them to simulate an ideal partner.

[0637] The following describes the processing flow.

[0638] Step 1:

[0639] Users input information about the appearance and personality of their ideal character through the device's interface. This includes specific characteristics and choices.

[0640] Step 2:

[0641] The terminal sends the information entered by the user to the server. This data becomes the raw material used in the character generation process.

[0642] Step 3:

[0643] The server analyzes the received data and generates an image of the character using an image generation algorithm. During this process, the character's features are reflected in the depiction.

[0644] Step 4:

[0645] The device displays the generated image to the user, simultaneously activating its natural language processing engine and preparing for interaction with the character. The user can then begin a conversation with the displayed character.

[0646] Step 5:

[0647] When a user enters a question or comment for a character, that input is sent to the server in real time via the device.

[0648] Step 6:

[0649] The server uses natural language processing algorithms to analyze the received user input and generate an appropriate response. During this process, it also refers to past conversation logs to consider the context and user characteristics.

[0650] Step 7:

[0651] The generated response is sent from the server to the terminal and displayed to the user. The user can then continue the conversation based on this response.

[0652] Step 8:

[0653] The server continuously records conversation history and uses this data to improve the dialogue model. The model evolves to match the user's tendencies and preferences, providing more natural and personalized conversations.

[0654] (Example 1)

[0655] Next, we will describe Example 1. In the following description, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".

[0656] With the advancements in information processing technology today, there is a growing demand to accurately reproduce the visual and emotional characteristics of a target desired by the user, and to build natural dialogues with that target. Therefore, the challenge lies in realizing a dialogue system that is intuitive for users to operate and provides personalized responses.

[0657] The identification process performed by the identification processing unit 290 of the data processing device 12 in Example 1 is realized by the following means.

[0658] In this invention, the server includes means for providing an information processing device for the user to input the appearance and personality of an ideal object; means for generating a visual representation of the object using a visual representation generation method based on the input information; and means for activating a language processing engine that displays the generated visual representation and enables conversation with the user. This makes it possible to provide the user with an intuitive and personalized conversational experience and to efficiently link the entire process of visual representation and conversation.

[0659] An "information processing device" is a device that has the function of providing an interface for users to input the characteristics of a target they desire.

[0660] A "visual representation generation method" is a mathematical or algorithmic technique for generating a visual representation of an object based on input information.

[0661] A "language processing engine" is a software configuration that analyzes input language data and generates appropriate responses in order to enable natural conversation with the user.

[0662] A "storage device" is a device that stores digital data and retrieves or transmits that data as needed.

[0663] A "language processing method" is a process or technique for analyzing natural language and generating an appropriate response to the input.

[0664] A "conversation model" is a digital construct that manages interactions with users and improves the accuracy and adaptability of responses through learning.

[0665] "Opinions" are responses based on user feedback and expectations, and are used as information for system learning and improvement.

[0666] One embodiment of the present invention begins with a user accessing an interface via an information processing device and inputting the appearance and characteristics of an ideal target. The user then inputs attribute information for processing in a generative AI model. This information is received by the terminal and transmitted to the server.

[0667] The server generates a visual representation of the target based on the received data using a visual representation generation method. Existing image generation software such as Stable Diffusion and DALL-E can be applied to this generation. Specifically, the server's algorithm feeds data into the generation AI model using prompt statements. An example of a prompt statement would be, "Please generate a character with blue hair and a cheerful personality."

[0668] The generated visual representation is sent from the server to the terminal and displayed on the user's screen. This allows the user to confirm the visual elements. Subsequently, the language processing engine is automatically activated on the server, enabling the user to continue a natural conversation with the generated subject.

[0669] Users input questions and comments in text format through their devices, which are sent to the server in real time. The server analyzes this user input using natural language processing methods and generates appropriate responses. The analysis utilizes natural language processing models such as OpenAI's GPT-3, taking into account past conversation history to construct responses tailored to the individual user's context.

[0670] Furthermore, the server saves the entire conversation history and uses it as training data for the language model being used. This process allows the system to improve its dialogue model over time, providing a more appropriate and sophisticated conversational experience. For example, if a user says, "I'm feeling down, can you recommend a movie that will cheer me up?", the system can use past data to suggest a movie of a suitable genre that matches the user's preferences.

[0671] The flow of the specific processing in Example 1 will be explained using Figure 11.

[0672] Step 1:

[0673] Users access the interface through their device and input information about the appearance and personality of their ideal partner. This input includes specific attributes such as hair color, eye shape, and personality type. The input data is structured by the device and sent to the server in JSON format.

[0674] Step 2:

[0675] The server analyzes the user input received from the terminal and initiates the image generation process using a visual representation generation method. A generation AI model is used for this process. The server generates prompt statements and inputs data into the image generation software (e.g., the image generation model) based on these statements. The resulting output is a visual representation of a character based on the user's specified attributes.

[0676] Step 3:

[0677] After the visual representation of the character is generated, the image data is sent from the server to the terminal. The terminal then displays this image on the user's screen. As a result, the user can review the visual representation and decide whether or not they are satisfied.

[0678] Step 4:

[0679] The server starts the language processing engine, and the user can begin a conversation with the generated character. The user enters questions and comments in text format through the terminal. This input is sent to the server in real time.

[0680] Step 5:

[0681] The server receives input text from the user and analyzes it using natural language processing methods. The natural language processing model used evaluates the meaning of the input text and generates an appropriate response that is relevant to the context. In doing so, it also refers to past conversation history to construct a more contextually relevant response.

[0682] Step 6:

[0683] The generated response is sent from the server to the terminal and displayed on the user's screen. This allows the user to continue the conversation with the character. After the response is displayed, the terminal records this conversation history, and the server uses this to improve the conversation model. In this way, the system learns over time, enabling more accurate conversations.

[0684] (Application Example 1)

[0685] Next, we will explain Application Example 1. In the following explanation, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".

[0686] In today's world, artificial intelligence that supports users' lives is becoming increasingly important. However, existing systems struggle to generate the ideal character that users desire and provide support in various aspects of daily life through natural dialogue. Therefore, there is a need for technology that allows users to intuitively operate an interface and realize a character that closely resembles their ideal partner.

[0687] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 1 is realized by the following means.

[0688] In this invention, the server includes means for providing a terminal for the user to input the appearance and personality of their ideal character; means for generating a visual representation of the character using an AI model based on the input information; and means for displaying the generated character and activating a natural language processing engine for natural language interaction with the user. This enables the user to intuitively create their ideal character and receive support in their daily life.

[0689] A "user" is someone who inputs their ideal character via a device and enjoys interacting with it.

[0690] A "terminal" is a device used by users to set the appearance and personality of a character and to interact with them.

[0691] An "AI model" is an artificial intelligence technology used to generate a visual representation of a character based on input information.

[0692] A "character" is a personal being with a visual representation that is generated based on the user's ideals.

[0693] "Visual representation" refers to images or animations that show the appearance of the generated character.

[0694] A "natural language processing engine" is a program designed to enable natural language interaction with users.

[0695] A "dialogue record" is data that records the content of conversations between the user and the character.

[0696] An "information processing device" is a computer system that receives and analyzes dialogue records.

[0697] A "natural language processing algorithm" is a method for analyzing input language data and generating an appropriate response.

[0698] "Optimization" is the process of improving a dialogue model using saved dialogue history.

[0699] To implement this invention, the user uses a dedicated terminal to input information about the appearance and personality of their ideal character through an interface. The terminal then transmits this user input information to a server.

[0700] The server generates a visual representation of the character using an AI model based on the received information. Specifically, a generative AI model is used to create realistic character images according to the attributes specified by the user. This entire process utilizes natural language processing algorithms and image processing software.

[0701] The generated character is displayed on the terminal, and the natural language processing engine is activated. This allows the user to begin a natural and interactive conversation with the character. The server records the conversation in real time and performs detailed analysis on an information processing device.

[0702] As a concrete example, if a user asks a character, "Teach me some relaxing yoga poses," the character will consider the user's past requests and preferences to suggest yoga poses and related actions. This suggestion utilizes past conversation logs to generate the most suitable response for the user.

[0703] An example of a prompt to input into the generating AI model is: "The user has input 'Please teach me some relaxing yoga poses.' Based on the user's preferred relaxation methods from past conversation history, please suggest appropriate yoga poses." This allows users to receive support in their daily lives from an ideal character.

[0704] The flow of a specific process in Application Example 1 will be explained using Figure 12.

[0705] Step 1:

[0706] The user inputs information about the ideal character's appearance and personality into the interface via their device. The entered information is temporarily stored on the device as the character's attribute data.

[0707] Step 2:

[0708] The terminal sends temporarily stored character attribute data to the server. This data transmission allows the server to prepare for character generation based on the user's request.

[0709] Step 3:

[0710] The server receives the character attribute data as input, calls a generation AI model, and begins data processing. This model analyzes the attribute data and generates a visual representation of the character according to the specified features. The generated character image is output to the server.

[0711] Step 4:

[0712] The server sends the generated character image to the terminal. The terminal receives it and displays the image on its screen. The user can then see the visualized character.

[0713] Step 5:

[0714] The server starts up its natural language processing engine and prepares to interact with the user. When the terminal receives text input from the user, that text data is sent to the server.

[0715] Step 6:

[0716] The server analyzes the received text data using a natural language processing algorithm. It also refers to past dialogue history to understand the context. Based on the analysis, it generates an appropriate response and outputs it in text format.

[0717] Step 7:

[0718] The server sends the generated response to the terminal. The terminal receives this response and displays it back to the user. The user can then review the character's response and continue the conversation.

[0719] Step 8:

[0720] The server continuously records conversation logs and stores the conversation history in an information processing device. This creates a data foundation that allows for a deeper understanding of user preferences and context in future conversations.

[0721] Furthermore, an emotion engine that estimates the user's emotions may be incorporated. That is, the identification processing unit 290 may use the emotion identification model 59 to estimate the user's emotions and perform identification processing using the user's emotions.

[0722] This invention is a system that allows users to generate their ideal character and engage in natural and intimate conversations with that character. Furthermore, it incorporates an emotion engine that understands the user's emotions and generates corresponding responses, thereby enhancing engagement.

[0723] To implement this system, the terminal first provides an interface to the user, where the user inputs detailed information about the character's appearance and personality. The input information is immediately sent from the terminal to the server, which uses an image generation algorithm to generate a character image. The generated character is then displayed to the user on the terminal.

[0724] Next, a conversation with the user is initiated using a natural language processing engine, but this is where the emotion engine comes in. The emotion engine analyzes the user's input text and data from voice tone and keystrokes during the conversation to estimate the user's emotional state. Based on this emotional state, the server generates and provides the optimal response to the user, thus designing the conversation to have greater depth.

[0725] For example, if a user tells a character, "I'm very tired today," the emotion engine will determine that the user is tired based on words like "tired" and "very." Based on this information, the server will generate a gentle response such as, "Take it easy today and do something you enjoy," and convey it to the user via the device. In this way, a more personalized experience is realized through dialogue that is tailored to the user's emotions.

[0726] The server saves dialogue content and emotional data as conversation history, and uses this to improve the dialogue model. This approach allows the system to provide more user-friendly dialogue the more it is used, continuously offering users a simulation experience with the most suitable partner.

[0727] The following describes the processing flow.

[0728] Step 1:

[0729] Users use the device's interface to input details about the ideal character's appearance and personality. This information includes specific looks and personality traits.

[0730] Step 2:

[0731] The terminal sends the information entered by the user to the server. The server receives this information, activates an image generation algorithm, and generates an image of a character based on the specified features.

[0732] Step 3:

[0733] The generated character image is sent to the device and displayed on the user's screen. The device simultaneously activates its natural language processing engine, preparing to interact with the user.

[0734] Step 4:

[0735] Users send messages to characters to initiate conversations. This input is processed in real time and sent from the terminal to the server.

[0736] Step 5:

[0737] The server receives user input and performs sentiment analysis using an emotion engine. For example, it analyzes keywords in the context and input methods (speed, frequency) to recognize the user's emotions.

[0738] Step 6:

[0739] The server uses emotion data recognized by the emotion engine and natural language processing algorithms to generate appropriate responses that match the user's emotions. In doing so, it also refers to conversation history and past emotion logs to construct a dialogue optimized for the user.

[0740] Step 7:

[0741] The generated response is sent to the device and displayed to the user. The user can then use this response to continue the conversation.

[0742] Step 8:

[0743] The server records all dialogue and emotional data, continuously improving the dialogue model. This allows for more relatable and emotionally nuanced dialogues for the user over time.

[0744] (Example 2)

[0745] Next, we will describe Example 2. In the following description, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".

[0746] Current conversational systems have limited ability to fully understand individual user emotions and characteristics and generate personalized responses. This results in a uniform user experience and a failure to provide sufficiently personalized dialogue. Furthermore, they are inadequate in generating responses that reflect the user's emotional state, lacking depth and intimacy in conversations.

[0747] The identification process performed by the identification processing unit 290 of the data processing device 12 in Example 2 is realized by the following means.

[0748] In this invention, the server includes means for providing an interface for the user to input the appearance and characteristics of an ideal object; means for generating an image of the object using image generation technology based on the input information; and means for using emotion analysis technology to analyze the user's input, voice tone, and operation data to estimate their emotional state. This enables personalized, emotion-sensitive dialogue, resulting in a rich and intimate experience for the user.

[0749] An "interface" is a means for a user to interact with a system and information in a two-way manner, and is part of the user experience that enables visual or physical interaction.

[0750] "Image generation technology" refers to algorithms and processes that create visual information based on input data, and is a technology for producing computer-generated images.

[0751] "Language processing technology" refers to techniques for understanding, generating, and analyzing human language on a computer, and is a method that enables advanced dialogue using natural language.

[0752] "Emotion analysis technology" is an analytical method for estimating a user's emotional state from their input data, and it is a technology that quantifies or categorizes the user's emotions and reflects them in the response.

[0753] A "dialogue model" is a collection of data structures and algorithms for managing interactions with users and generating optimal responses; it is the foundation for controlling the flow and content of dialogue.

[0754] "Feedback" refers to information about user responses and evaluations of the system, and is part of the information used to improve the system and enhance its accuracy.

[0755] This invention is a system that generates a character desired by the user and allows for intimate interaction with that character. Specifically, this system is implemented according to the following procedure.

[0756] The terminal provides the user with an interface for character creation. This interface allows the user to input detailed information about the character's appearance and attributes. The terminal sends this data to a server. The server uses image generation technology based on the input information to generate an image of the character. The image generation technology used here typically involves generative AI models such as Stable Diffusion or DALL-E.

[0757] The server sends the generated character image to the terminal, which then displays the image to the user. The user can then review the generated character and make further adjustments through the interface if necessary.

[0758] Furthermore, the device utilizes natural language processing technology to initiate a dialogue between the character and the user. During this process, the server analyzes the user's input text, voice tone, and interaction data through sentiment analysis technology. This allows the server to estimate the user's emotional state and provide a personalized response based on that estimate.

[0759] For example, when a user tells a character, "I'm very tired today," the server's emotion analysis technology determines that the user is tired based on words like "tired." Based on this information, the server generates a gentle response such as, "Take it easy today and do something you enjoy," and conveys it to the user via the device. This provides the user with a more travel-like and relatable dialogue.

[0760] An example of a prompt is, "The user has told the character that they are very tired. How would you respond?" This is used in the response generation process to understand and reflect the user's intent.

[0761] This system is implemented using a combination of computer hardware and associated software. Specific hardware requirements include a terminal with a standard internet connection, display, and input devices. This enables efficient character generation using generative AI models and allows for deep, natural dialogue.

[0762] The flow of the specific processing in Example 2 will be explained using Figure 13.

[0763] Step 1:

[0764] The device provides the user with a character creation interface. The user uses this interface to input information such as the character's appearance and personality. This input data is divided into multiple fields, allowing for detailed specification using sliders and dropdown menus. The user's input information is formatted as JSON data.

[0765] Step 2:

[0766] The terminal sends the information entered by the user to the server. The server analyzes this JSON-formatted input data to extract the parameters necessary for the image generation algorithm. The input includes the character's appearance elements and characteristics. Using this data, the server starts the generation AI model.

[0767] Step 3:

[0768] The server generates character images using image generation technology. Generative AI models such as Stable Diffusion and DALL-E are used here. This process involves detailed data processing and calculations based on input parameters to synthesize an image that closely matches the specified result. The output is the generated character image data.

[0769] Step 4:

[0770] The server sends the generated character image data to the terminal. The terminal displays the received image data to the user. The user can review this image and make additional modifications as needed. At this stage, it is possible to view the image in high resolution using an image viewer.

[0771] Step 5:

[0772] The terminal uses natural language processing to initiate a dialogue between the user and the character. Text input from the user is received through the interface, triggering the dialogue. The entered text data is then sent to the server.

[0773] Step 6:

[0774] The server activates its sentiment analysis engine based on the received text data to estimate the user's emotional state. The server analyzes the sentiment indicators within the text and generates numerical or categorized sentiment data. This data is used as a reference for generating responses.

[0775] Step 7:

[0776] The server generates responses that reflect the user's emotional state. Using existing language processing models, it generates appropriate replies based on the user's emotions and a generative AI model. This response data is refined to aim for natural and personalized dialogue.

[0777] Step 8:

[0778] The generated response is sent from the server to the terminal. The terminal displays the response to the user and prompts for further interaction. The interaction history and sentiment data are stored on the server and used later to improve the system.

[0779] (Application Example 2)

[0780] Next, we will explain application example 2. In the following explanation, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".

[0781] Conventional character generation systems have suffered from a decline in the quality of dialogue because they do not take into account the emotional elements in user interactions. Furthermore, their inability to immediately respond to changes in the user's emotional state has limited their potential for use in family communication and entertainment. This invention aims to solve the technical challenges of understanding user emotions and realizing more intimate and personal communication.

[0782] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 2 is realized by the following means.

[0783] This invention includes a server that uses an emotion analysis engine to analyze data on the tone of the user's voice and input actions to estimate their emotional state; a server that uses a real-time generated character assistant to provide added value for deepening home interactions; and a server that stores conversation history and improves the dialogue model through learning. This makes it possible to make the interaction between the user and the character assistant more natural and intimate, and to enhance in-home entertainment and communication.

[0784] An "interface" is a means for a user to exchange information with a system, and it is possible to input information such as appearance and personality.

[0785] An "image generation algorithm" is a computational method for automatically generating visual characters based on input information.

[0786] A "natural language processing engine" is software that has the ability to understand user input and generate appropriate responses.

[0787] A "sentiment analysis engine" is a program that determines the user's emotional state based on their tone of voice and input actions.

[0788] A "character assistant" is a virtual entity that functions as an aid in conversations and communicates with the user.

[0789] A "household robot" is a mechanical device that can perform various tasks in a home environment and interact with the inhabitants.

[0790] "Conversation history" refers to data that the system has recorded and saved, containing all past interactions with the user.

[0791] A "dialogue model" is a collection of algorithms used to design the flow of interactions with users and provide optimal communication.

[0792] To implement this invention, the user must first input details of the desired character's appearance and personality using a dedicated interface. The terminal receives this information and sends it to the server. The server uses an image generation algorithm to generate a visual representation of the character based on the input information. Image generation libraries such as Stable Diffusion and DALL-E can be utilized in this process.

[0793] After the character is generated, it is displayed to the user on the device, and at this time, the natural language processing engine is activated. This engine receives text and voice input from the user, analyzes the content, and generates an appropriate response. For natural language processing, libraries such as NLTK and spaCy can be used, and the IBM Watson Tone Analyzer can be used as the emotion engine. This makes it possible to estimate the user's emotional state based on voice tone and input characteristics.

[0794] Furthermore, the character assistant operates as a home robot, providing enjoyment through interaction with the user. This robot uses hardware such as smart speakers, microphones, and speakers to engage in voice interaction, recording the user's responses for the next interaction and sending them to a server. The server improves the dialogue model based on the accumulated conversation history, providing more personalized responses to the user.

[0795] For example, if a child tells a home robot, "I couldn't play with my friends today," the system will determine from the child's words and tone of voice that the child is feeling a little lonely. It will then suggest something like, "How about we do some drawing together today to refresh ourselves?" This allows the child to find something to enjoy without feeling lonely. An example of a prompt to the generative AI model in this case might be, "The user may be feeling lonely. What would be an appropriate response?"

[0796] The flow of a specific process in Application Example 2 will be explained using Figure 14.

[0797] Step 1:

[0798] The user inputs the appearance and personality of their ideal character through the interface. The entered data is stored on the device as detailed information about the character's appearance and personality. The device then sends this input information to the server.

[0799] Step 2:

[0800] The server generates a visual representation of the character using an image generation algorithm (e.g., Stable Diffusion or DALL-E) based on the received information. The input is the character features specified by the user, and the output is the generated character image. Here, data processing involves calculations to convert the input text information into an image.

[0801] Step 3:

[0802] The terminal receives the generated character image sent from the server and displays it to the user. The user can then begin a visual interaction with the generated character on the terminal. Visual feedback is provided to the user during this step.

[0803] Step 4:

[0804] The user initiates a conversation with the character using text or voice. The terminal collects this input data and sends it to a natural language processing engine. The input is the user's voice or text data, and the output is data converted into natural language.

[0805] Step 5:

[0806] The server analyzes the received language data using a natural language processing engine. In conjunction with this, an emotion analysis engine is also used to estimate the user's emotional state. Specifically, voice tone and keystroke patterns are input, and data calculations are performed to output the user's emotion (e.g., joy, sadness).

[0807] Step 6:

[0808] Based on the analysis results, the server generates a corresponding response using an AI model. Here, the response content is determined using a prompt. The input is the analyzed emotion and dialogue content, and the output is an appropriate response to the user.

[0809] Step 7:

[0810] The generated response is sent to the user's device and provided to the user through the device by being displayed or spoken aloud. This allows the user to continue interacting with the character.

[0811] Step 8:

[0812] The terminal sends the entire conversation history with the user to the server, which uses this data to improve the conversation model. The input is the past conversation history, and by analyzing this data, the output is an improvement in the overall conversation quality of the system.

[0813] The specific processing unit 290 transmits the result of the specific processing to the robot 414. In the robot 414, the control unit 46A causes the speaker 240 and the controlled object 443 to output the result of the specific processing. The microphone 238 acquires audio indicating user input for the result of the specific processing. The control unit 46A transmits the audio data indicating user input acquired by the microphone 238 to the data processing unit 12. In the data processing unit 12, the specific processing unit 290 acquires the audio data.

[0814] Data generation model 58 is a type of so-called generative AI (Artificial Intelligence). One example of data generation model 58 is ChatGPT (Internet search<URL: https: / / openai.com / blog / chatgpt> ), Gemini (Internet search) <url: https: gemini.google.com ?hl="ja">Examples of generative AI include the following. The data generation model 58 is obtained by performing deep learning on a neural network. The data generation model 58 is input with prompts containing instructions, and with inference data such as audio data representing speech, text data representing text, and image data representing images. The data generation model 58 infers from the input inference data according to the instructions indicated by the prompts, and outputs the inference results in data formats such as audio data and text data. Here, inference refers to, for example, analysis, classification, prediction, and / or summarization.

[0815] In the above embodiment, an example was given in which specific processing is performed by the data processing device 12, but the technology of this disclosure is not limited thereto, and the specific processing may also be performed by the robot 414.

[0816] Furthermore, the emotion identification model 59, acting as an emotion engine, may determine the user's emotion according to a specific mapping. Specifically, the emotion identification model 59 may determine the user's emotion according to a specific mapping, which is an emotion map (see Figure 9). Similarly, the emotion identification model 59 may also determine the robot's emotion, and the identification processing unit 290 may perform identification processing using the robot's emotion.

[0817] Figure 9 shows an emotion map 400 in which multiple emotions are mapped. In the emotion map 400, emotions are arranged in concentric circles radiating from the center. The closer to the center of the concentric circles, the more primitive the emotions are located. Further out of the concentric circles, emotions representing states and actions arising from mental states are located. Emotion is a concept that includes feelings and mental states. On the left side of the concentric circles, emotions that are generally generated from reactions occurring in the brain are located. On the right side of the concentric circles, emotions that are generally induced by situational judgment are located. Above and below the concentric circles, emotions that are generally generated from reactions occurring in the brain and induced by situational judgment are located. In addition, the emotion of "pleasure" is located on the upper side of the concentric circles, and the emotion of "displeasure" is located on the lower side. Thus, in the emotion map 400, multiple emotions are mapped based on the structure in which emotions arise, and emotions that are likely to occur simultaneously are mapped close together.

[0818] These emotions are distributed at the 3 o'clock position on the Emotion Map 400, and usually fluctuate between feelings of security and anxiety. In the right half of the Emotion Map 400, situational awareness takes precedence over internal feelings, resulting in a calm impression.

[0819] The inside of the Emotion Map 400 represents inner thoughts, while the outside represents actions. Therefore, the further you go from the outside of the Emotion Map 400, the more visible (expressed in actions) your emotions become.

[0820] Here, human emotions are based on various balances, such as posture and blood sugar levels. When these balances deviate from the ideal, it results in discomfort, and when they approach the ideal, it results in pleasure. Similarly, in robots, cars, motorcycles, etc., emotions can be created based on various balances, such as posture and battery level. When these balances deviate from the ideal, it results in discomfort, and when they approach the ideal, it results in pleasure. The emotion map can be generated, for example, based on Dr. Mitsuyoshi's emotion map (Research on a system for analyzing brain physiological signals of speech emotion recognition and emotion, Tokushima University, doctoral dissertation: https: / / ci.nii.ac.jp / naid / 500000375379). The left half of the emotion map contains emotions belonging to a region called "response," where sensation is dominant. The right half of the emotion map contains emotions belonging to a region called "situation," where situational awareness is dominant.

[0821] The emotion map defines two emotions that promote learning. One is the emotion around the middle of the negative "repentance" and "reflection" on the situation side. In other words, it is when the robot experiences negative emotions such as "I never want to feel this way again" or "I don't want to be scolded again." The other is the emotion around the positive "desire" on the reaction side. In other words, it is when the robot has positive feelings such as "I want more" or "I want to know more."

[0822] The emotion identification model 59 inputs user input into a pre-trained neural network, obtains emotion values ​​representing each emotion shown in the emotion map 400, and determines the user's emotion. This neural network is pre-trained based on multiple training data sets, which are combinations of user input and emotion values ​​representing each emotion shown in the emotion map 400. Furthermore, this neural network is trained so that emotions located close together have similar values, as shown in the emotion map 900 in Figure 10. Figure 10 shows an example where multiple emotions such as "reassured," "calm," and "confident" have similar emotion values.

[0823] The above description primarily focuses on the functions of the data processing device 12 in relation to this disclosure. However, the system related to this disclosure is not necessarily implemented on a server. The system related to this disclosure may be implemented as a general information processing system. This disclosure may be implemented, for example, as a software program that runs on a personal computer or as an application that runs on a smartphone. The method related to this disclosure may be provided to users in SaaS (Software as a Service) format.

[0824] In the above embodiment, an example was given in which a specific process is performed by a single computer 22. However, the technology of this disclosure is not limited thereto, and a distributed processing of the specific process may be performed by multiple computers, including computer 22. For example, a data generation model 58 may be provided in an external device of the data processing device 12, and the external device may generate data according to the input data.

[0825] In the above embodiment, an example was given in which the specific processing program 56 is stored in the storage 32, but the technology of this disclosure is not limited thereto. For example, the specific processing program 56 may be stored in a portable, computer-readable, non-temporary storage medium such as a USB (Universal Serial Bus) memory. The specific processing program 56 stored in the non-temporary storage medium is installed in the computer 22 of the data processing device 12. The processor 28 executes specific processing according to the specific processing program 56.

[0826] Alternatively, the specific processing program 56 may be stored in a storage device such as a server connected to the data processing device 12 via the network 54, and the specific processing program 56 may be downloaded and installed on the computer 22 in response to a request from the data processing device 12.

[0827] Furthermore, it is not necessary to store the entirety of the specific processing program 56 in a storage device such as a server connected to the data processing device 12 via the network 54, or to store the entirety of the specific processing program 56 in the storage 32; it is acceptable to store only a portion of the specific processing program 56.

[0828] The following types of processors can be used as hardware resources to perform specific processing. Examples of processors include a CPU, a general-purpose processor that functions as a hardware resource to perform specific processing by executing software, i.e., a program. Other examples of processors include dedicated electrical circuits, such as FPGAs (Field-Programmable Gate Arrays), PLDs (Programmable Logic Devices), or ASICs (Application Specific Integrated Circuits), which have circuit configurations specifically designed to perform specific processing. All of these processors have built-in or connected memory, and all of them perform specific processing by using memory.

[0829] The hardware resource that performs a specific process may consist of one of these various processors, or it may consist of a combination of two or more processors of the same or different types (for example, a combination of multiple FPGAs, or a combination of a CPU and an FPGA). Alternatively, the hardware resource that performs a specific process may consist of a single processor.

[0830] Examples of configurations using a single processor include, firstly, a configuration in which one or more CPUs and software are combined to form a single processor, and this processor functions as a hardware resource that performs a specific process. Secondly, there is a configuration using a processor that realizes the functions of the entire system, including multiple hardware resources that perform a specific process, on a single IC chip, as exemplified by SoCs (System-on-a-chip). In this way, a specific process is realized using one or more of the above types of processors as hardware resources.

[0831] Furthermore, the hardware structure of these various processors can more specifically utilize electrical circuits that combine circuit elements such as semiconductor devices. Also, the specific processing described above is merely an example. Therefore, it goes without saying that unnecessary steps can be deleted, new steps added, or the processing order rearranged, as long as it does not deviate from the main purpose.

[0832] The descriptions and illustrations presented above are detailed explanations of the technical aspects of this disclosure and are merely examples of the technical aspects. For example, the above descriptions of the structure, function, operation, and effect are examples of the structure, function, operation, and effect of the technical aspects of this disclosure. Therefore, it goes without saying that you may delete unnecessary parts, add new elements, or replace elements in the descriptions and illustrations presented above, as long as you do not deviate from the essence of the technical aspects of this disclosure. Furthermore, in order to avoid confusion and facilitate understanding of the technical aspects of this disclosure, explanations of common technical knowledge and the like that do not require special explanation to enable the implementation of the technical aspects of this disclosure have been omitted from the descriptions and illustrations presented above.

[0833] All documents, patent applications, and technical standards described herein are incorporated by reference to the same extent as if each individual document, patent application, and technical standard were specifically and individually noted to be incorporated by reference.

[0834] The following is further disclosed regarding the embodiments described above.

[0835] (Claim 1)

[0836] A means of providing an interface for users to input the appearance and personality of the object they ideally desire,

[0837] A means for generating an image of a person using an image generation algorithm based on the input information,

[0838] A means for displaying the generated image and activating a natural language processing engine that enables natural dialogue with the user,

[0839] Means for recording the conversation with the user and sending it to the server,

[0840] A means for analyzing the recorded dialogue content and using a natural language processing algorithm to generate a corresponding response,

[0841] Means for sending and displaying the generated response to the user,

[0842] A system that includes means for saving conversation history and improving the dialogue model through learning.

[0843] (Claim 2)

[0844] The system according to claim 1, wherein the natural language processing algorithm understands the context from past conversation history and provides a more personalized response to the user.

[0845] (Claim 3)

[0846] The system according to claim 1, further comprising a function that allows the dialogue model to grow based on user feedback and provide a more user-friendly dialogue.

[0847] "Example 1"

[0848] (Claim 1)

[0849] A means for providing an information processing device for a user to input the appearance and personality of an ideal target,

[0850] A means for generating a visual representation of a target using a visual representation generation method based on the input information,

[0851] Means for displaying the generated visual representation and activating a language processing engine that enables conversation with the user,

[0852] Means for recording the conversation with the user and transmitting it to a storage device,

[0853] A means for analyzing the recorded conversation content and using a language processing method to generate a corresponding response,

[0854] Means for sending and displaying the generated response to the user,

[0855] A system that includes means for saving conversation history and improving the conversation model through information processing.

[0856] (Claim 2)

[0857] The system according to claim 1, wherein the language processing method understands the context from past conversation history and provides a more personalized response to the user.

[0858] (Claim 3)

[0859] The system according to claim 1, further comprising a function that develops the conversation model based on user feedback and provides a conversation that is more suitable for the user.

[0860] "Application Example 1"

[0861] (Claim 1)

[0862] A means of providing a terminal for users to input the appearance and personality of their ideal character,

[0863] A means for generating a visual representation of a character using an AI model based on the input information,

[0864] A means for displaying the generated character and activating a natural language processing engine for engaging in natural language dialogue with the user,

[0865] Means for recording the dialogue between the user and the robot companion and transmitting it to an information processing device,

[0866] A means for analyzing the recorded dialogue content and using a natural language processing algorithm to generate a corresponding response,

[0867] Means for sending and displaying the generated response to the user,

[0868] A system that includes means for saving dialogue history and improving the dialogue model through optimization.

[0869] (Claim 2)

[0870] The system according to claim 1, wherein the natural language processing algorithm grasps the context from past dialogue records and provides a more specific response to the user.

[0871] (Claim 3)

[0872] The system according to claim 1, wherein the dialogue model is improved based on user feedback and has a function to provide a more user-friendly dialogue.

[0873] "Example 2 of combining an emotion engine"

[0874] (Claim 1)

[0875] A means of providing an interface for users to input the appearance and characteristics of their ideal target,

[0876] A means for generating a target image using image generation technology based on the input information,

[0877] A means for displaying the generated image and activating language processing technology that enables natural interaction with the user,

[0878] A means of using emotion analysis technology to estimate the emotional state by analyzing user input, voice tone, and operation data,

[0879] Means for recording the conversation with the user and sending it to the server,

[0880] A means for analyzing the recorded dialogue content and using a language processing algorithm to generate a corresponding response,

[0881] Means for sending and displaying the generated response to the user,

[0882] A system that includes means for saving dialogue history and improving the dialogue model through learning.

[0883] (Claim 2)

[0884] The system according to claim 1, wherein the language processing technology understands the context from past dialogue history and provides a more personalized response to the user.

[0885] (Claim 3)

[0886] The system according to claim 1, further comprising a function that allows the dialogue model to grow based on user feedback and provide a more suitable dialogue for the user.

[0887] "Application example 2 when combining with an emotional engine"

[0888] (Claim 1)

[0889] A means of providing an interface for users to input the appearance and personality of the object they ideally desire,

[0890] A means for generating an image of a person using an image generation algorithm based on the input information,

[0891] A means for displaying the generated image and activating a natural language processing engine that enables natural dialogue with the user,

[0892] Means for recording the conversation with the user and sending it to the server,

[0893] A means for analyzing the recorded dialogue content and using a natural language processing algorithm to generate a corresponding response,

[0894] Means for sending and displaying the generated response to the user,

[0895] A means of saving conversation history and improving the dialogue model through learning,

[0896] A means of using an emotion analysis engine that analyzes the tone of the user's voice and input data to estimate their emotional state,

[0897] Providing added value to deepen interaction within the home, and using a real-time generated character assistant,

[0898] A system that includes this.

[0899] (Claim 2)

[0900] The system according to claim 1, wherein the natural language processing algorithm understands the context from past conversation history and provides a more personalized response to the user.

[0901] (Claim 3)

[0902] The system according to claim 1, further comprising a function that allows the dialogue model to grow based on user feedback and provide a more user-friendly dialogue. [Explanation of Symbols]

[0903] 10, 210, 310, 410 Data Processing Systems 12 Data Processing Devices 14 Smart Devices 214 Smart Glasses 314 Headset-type terminal 414 Robots< / url:> < / url:> < / url:> < / url:>

Claims

1. A means of providing a terminal for users to input the appearance and personality of their ideal character, A means for generating a visual representation of a character using an AI model based on the input information, A means for displaying the generated character and activating a natural language processing engine for engaging in natural language dialogue with the user, Means for recording the dialogue between the user and the robot companion and transmitting it to an information processing device, A means for analyzing the recorded dialogue content and using a natural language processing algorithm to generate a corresponding response, Means for sending and displaying the generated response to the user, A system that includes means for saving dialogue history and improving the dialogue model through optimization.

2. The system according to claim 1, wherein the natural language processing algorithm grasps the context from past dialogue records and provides a more specific response to the user.

3. The system according to claim 1, wherein the dialogue model is improved based on user feedback and has a function to provide a more user-friendly dialogue.