The system addresses the challenge of personalized travel planning by analyzing user images to generate tailored travel plans and learning from feedback, ensuring optimized and evolving travel experiences.

JP2026100670A · Pending · Publication Date: 2026-06-19 · SOFTBANK GROUP CORP


Patent Information

Authority / Receiving Office: JP · JP
Patent Type: Applications
Current Assignee / Owner: SOFTBANK GROUP CORP
Filing Date: 2024-12-09
Publication Date: 2026-06-19


IPC: G06F16/9035; G06Q30/06; G06Q30/0601; G06Q30/016; G06Q30/0645; G06F16/58; G06F16/28; G06Q50/10; G06F16/735; G06Q30/0207; G06Q50/14; G06Q30/0272; G06Q30/0279; G06Q30/0283

AI Tagging

Application Domain

Buying/selling/leasing transactions; Metadata still image retrieval


Abstract

[Problem] To provide a system. [Solution] A system including: an image acquisition means for obtaining image information of a user; an estimation means for estimating the user's preferences based on the acquired image information; a generation means for generating a travel plan based on the estimated preferences; a presentation means for presenting the generated travel plan to the user; and a learning means that learns from feedback information received from the user.


Description

Technical Field

[0001] The technology of the present disclosure relates to a system.

Background Art

[0002] Patent Document 1 discloses a persona chatbot control method performed by at least one processor, including steps of receiving a user utterance, adding the user utterance to a prompt including an instruction sentence related to an explanation of a chatbot character, encoding the prompt, and inputting the encoded prompt into a language model to generate a chatbot utterance in response to the user utterance.

Prior Art Documents

Patent Documents

[0003]

Patent Document 1

Summary of the Invention

Problems to be Solved by the Invention

[0004] The problem is that travelers cannot easily obtain travel information optimized for their personal preferences. With conventional methods, it takes time and effort to put together a travel plan that suits one's own hobbies and needs, and researching details in advance often diminishes the charm of a new experience. The issue is therefore to provide a fresh experience that matches individual preferences without making travel complicated.

Means for Solving the Problems

[0005] This invention employs a configuration that acquires image information from users, analyzes features from those images to estimate the user's preferences, and then automatically generates and presents a travel plan based on the estimation results. Furthermore, by collecting user feedback and utilizing it to generate future plans, it becomes possible to provide a more individually optimized experience. This system allows users to enjoy travel tailored to their personal preferences without having to research details in advance.

[0006] "Image acquisition means" refers to a component that has the function of receiving and recording images of the user.

[0007] An "estimation means" is a component that has the function of analyzing and inferring the user's preferences and tastes from acquired images.

[0008] A "generating means" is a component that has the function of automatically assembling a travel plan based on estimated preferences.

[0009] "Presentation means" refers to a component that has the function of providing the generated travel plan to the user in a visual or other way.

[0010] A "learning means" is a component that has the function of collecting feedback from users and updating data and models to reflect that feedback in future travel plans.

Brief Description of the Drawings

[0011]
[Figure 1] A conceptual diagram showing an example of the configuration of a data processing system according to the first embodiment.
[Figure 2] A conceptual diagram showing an example of the main functions of a data processing device and a smart device according to the first embodiment.
[Figure 3] A conceptual diagram showing an example of the configuration of a data processing system according to the second embodiment.
[Figure 4] A conceptual diagram showing an example of the main functions of a data processing device and smart glasses according to the second embodiment.
[Figure 5] A conceptual diagram showing an example of the configuration of a data processing system according to the third embodiment.
[Figure 6] A conceptual diagram showing an example of the main functions of a data processing device and a headset-type terminal according to the third embodiment.
[Figure 7] A conceptual diagram showing an example of the configuration of a data processing system according to the fourth embodiment.
[Figure 8] A conceptual diagram showing an example of the main functions of a data processing device and a robot according to the fourth embodiment.
[Figure 9] An emotion map on which multiple emotions are mapped.
[Figure 10] An emotion map on which multiple emotions are mapped.
[Figure 11] A sequence diagram showing the processing flow of the data processing system in Example 1.
[Figure 12] A sequence diagram showing the processing flow of the data processing system in Application Example 1.
[Figure 13] A sequence diagram showing the processing flow of the data processing system in Example 2, in which an emotion engine is combined.
[Figure 14] A sequence diagram showing the processing flow of the data processing system in Application Example 2, in which an emotion engine is combined.

Modes for Carrying Out the Invention

[0012] Hereinafter, an example of an embodiment of the system relating to the technology of this disclosure will be described with reference to the attached drawings.

[0013] First, the terminology used in the following description is explained.

[0014] In the following embodiments, a processor denoted by a reference numeral (hereinafter simply referred to as "processor") may be a single arithmetic unit or a combination of multiple arithmetic units. The processor may also be a single type of arithmetic unit or a combination of multiple types of arithmetic units. Examples of the arithmetic unit include a CPU (Central Processing Unit), a GPU (Graphics Processing Unit), a GPGPU (General-Purpose computing on Graphics Processing Units), and an APU (Accelerated Processing Unit).

[0015] In the following embodiments, a RAM (Random Access Memory) denoted by a reference numeral is a memory in which information is temporarily stored and which is used as a work memory by the processor.

[0016] In the following embodiments, a storage denoted by a reference numeral is one or more non-volatile storage devices that store various programs, various parameters, and the like. Examples of the non-volatile storage device include a flash memory (SSD (Solid State Drive)), a magnetic disk (e.g., a hard disk), and a magnetic tape.

[0017] In the following embodiments, a communication I/F (Interface) denoted by a reference numeral is an interface including a communication processor, an antenna, and the like. The communication I/F controls communication between multiple computers. Examples of the communication standard applied to the communication I/F include wireless communication standards such as 5G (5th Generation Mobile Communication System), Wi-Fi (registered trademark), and Bluetooth (registered trademark).

[0018] In the following embodiments, "A and/or B" is synonymous with "at least one of A and B." That is, "A and/or B" means that it may be A alone, B alone, or a combination of A and B. Furthermore, in this specification, the same concept as "A and/or B" applies when three or more items are linked by "and/or."

[0019] [First Embodiment]

[0020] Figure 1 shows an example of the configuration of the data processing system 10 according to the first embodiment.

[0021] As shown in Figure 1, the data processing system 10 includes a data processing device 12 and a smart device 14. An example of the data processing device 12 is a server.

[0022] The data processing device 12 comprises a computer 22, a database 24, and a communication interface 26. The computer 22 is an example of a "computer" related to the technology of this disclosure. The computer 22 comprises a processor 28, RAM 30, and storage 32. The processor 28, RAM 30, and storage 32 are connected to a bus 34. The database 24 and the communication interface 26 are also connected to the bus 34. The communication interface 26 is connected to a network 54. An example of the network 54 is a WAN (Wide Area Network) and/or a LAN (Local Area Network).

[0023] The smart device 14 comprises a computer 36, a reception device 38, an output device 40, a camera 42, and a communication interface 44. The computer 36 comprises a processor 46, RAM 48, and storage 50. The processor 46, RAM 48, and storage 50 are connected to a bus 52. The reception device 38, output device 40, and camera 42 are also connected to the bus 52.

[0024] The reception device 38 is equipped with a touch panel 38A and a microphone 38B, etc., and receives user input. The touch panel 38A receives user input by detecting contact with an object (e.g., a pen or finger). The microphone 38B receives user input by detecting the user's voice. The control unit 46A transmits data indicating the user input received by the touch panel 38A and microphone 38B to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the data indicating the user input.

[0025] The output device 40 includes a display 40A and a speaker 40B, and presents data to the user 20 by outputting the data in a form perceptible to the user 20 (e.g., audio and/or text). The display 40A displays visible information such as text and images according to instructions from the processor 46. The speaker 40B outputs audio according to instructions from the processor 46. The camera 42 is a small digital camera equipped with an optical system such as a lens, aperture, and shutter, and an image sensor such as a CMOS (Complementary Metal-Oxide-Semiconductor) image sensor or a CCD (Charge Coupled Device) image sensor.

[0026] Communication interface 44 is connected to network 54. Communication interfaces 44 and 26 are responsible for the exchange of various types of information between processor 46 and processor 28 via network 54.

[0027] Figure 2 shows an example of the main functions of the data processing device 12 and the smart device 14.

[0028] As shown in Figure 2, in the data processing device 12, a specific processing is performed by the processor 28. A specific processing program 56 is stored in the storage 32. The specific processing program 56 is an example of a "program" related to the technology of this disclosure. The processor 28 reads the specific processing program 56 from the storage 32 and executes the read specific processing program 56 on the RAM 30. The specific processing is realized by the processor 28 operating as a specific processing unit 290 according to the specific processing program 56 executed on the RAM 30.

[0029] The storage 32 stores the data generation model 58 and the emotion identification model 59. The data generation model 58 and the emotion identification model 59 are used by the specific processing unit 290.

[0030] In the smart device 14, the processor 46 performs the reception output processing. The storage 50 stores the reception output program 60. The reception output program 60 is used in conjunction with a specific processing program 56 by the data processing system 10. The processor 46 reads the reception output program 60 from the storage 50 and executes the read reception output program 60 on the RAM 48. The reception output processing is realized by the processor 46 operating as a control unit 46A according to the reception output program 60 executed on the RAM 48.

[0031] Next, the specific processing performed by the specific processing unit 290 of the data processing device 12 will be described. In the following description, the data processing device 12 will be referred to as the "server" and the smart device 14 as the "terminal".

[0032] This invention is a system that analyzes a user's image information to estimate their preferences and create an appropriate travel plan. Specific embodiments of this system are described below.

[0033] First, the user uses their smartphone camera to take a picture of their everyday clothes. This image is sent to the server through the application on the device. At this point, the image acquisition means receives the image data.

[0034] The server applies image processing algorithms to analyze the acquired image and extract clothing features. The estimation means passes this feature data to a generative AI to infer the user's hobbies and interests. For example, bright colors and sporty clothing might suggest an interest in the outdoors or sports.

[0035] Next, the server generates a travel plan from the above estimation results via the generation means. This plan includes a starting point based on the user's location information, as well as suggestions for sightseeing destinations and activities that suit their preferences. Furthermore, the plan is optimized by taking into account the user's pre-registered individual constraints (such as allergies or places they dislike).
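One way to picture how pre-registered constraints such as allergies or disliked places could be applied is a simple filter over candidate spots. The sketch below is only illustrative: the `Spot` and `Constraints` types, their fields, and the sample data are assumptions introduced here and do not appear in the specification.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Spot:
    """Hypothetical candidate destination; field names are assumptions."""
    name: str
    tags: frozenset[str]
    allergens: frozenset[str] = frozenset()  # allergens a visitor may be exposed to


@dataclass
class Constraints:
    """Hypothetical container for the user's pre-registered constraints."""
    allergies: set[str] = field(default_factory=set)
    disliked_places: set[str] = field(default_factory=set)


def filter_spots(spots: list[Spot], constraints: Constraints) -> list[Spot]:
    """Drop spots that conflict with the user's pre-registered constraints."""
    return [
        s for s in spots
        if s.name not in constraints.disliked_places
        and not (s.allergens & constraints.allergies)
    ]


spots = [
    Spot("Riverside Park", frozenset({"outdoor"})),
    Spot("Seafood Market", frozenset({"food"}), frozenset({"shellfish"})),
    Spot("Night Club", frozenset({"nightlife"})),
]
user = Constraints(allergies={"shellfish"}, disliked_places={"Night Club"})
allowed = filter_spots(spots, user)
print([s.name for s in allowed])
```

Filtering before plan generation keeps excluded spots out of the candidate set entirely, so no later ranking step can reintroduce them.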

[0036] The travel plan generated from the server is sent to the device and presented to the user. The user can review this plan on their smartphone and make adjustments as needed.

[0037] After their trip, users input feedback about the tourist attractions and activities they experienced into the application. The device sends this feedback to the server, where the learning means analyzes it. This allows the server to improve future travel plans, making them more accurate and appealing to the user.

[0038] As a concrete example, suppose a user uploads a photo of themselves wearing a casual jacket, jeans, and sneakers. The generative AI estimates from this outfit that the user prefers a relaxed atmosphere and casual outdoor activities. Based on this, the server suggests a plan that includes a picnic in a park or a nature walk, and the user accepts and carries it out.

[0039] This system allows users to efficiently obtain fresh travel experiences tailored to their individual needs.

[0040] The following describes the processing flow.

[0041] Step 1:

[0042] The user takes a picture of their clothing using their smartphone camera. This image data is saved within the application.

[0043] Step 2:

[0044] The device uploads the saved images to the server via the application. A secure protocol is used for image transmission to ensure safety.

[0045] Step 3:

[0046] The server analyzes the received image data. Using image processing algorithms, it extracts features such as clothing color, design, and accessories, and organizes them as digital data.

[0047] Step 4:

[0048] The server's estimation means applies generative AI to the feature data to infer the user's hobbies and preferences. This is a process of reading cultural background and lifestyle preferences from clothing style and color palettes.

[0049] Step 5:

[0050] The server generates a travel plan based on inferred preference information. In this process, it selects tourist destinations and activities according to location information and applies a route optimization algorithm to determine an efficient travel order.

[0051] Step 6:

[0052] The device receives the generated travel plan from the server and presents it to the user. The user can view the plan details within the application and, if necessary, modify or customize parts of the plan.

[0053] Step 7:

[0054] Users take a trip and then provide feedback via the application, including evaluations and impressions of the places they visited and the activities they participated in. This information is sent to the server.

[0055] Step 8:

[0056] The server analyzes the feedback it receives and uses a learning algorithm to update the database, which will then be used as a reference for future travel plans. This makes it possible to provide more personalized travel plans.
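The eight steps above can be sketched as a small server-side skeleton. This is a minimal sketch under stated assumptions: the class `TravelPlanServer`, the stub functions, the activity catalog, and the "average rating below 3" rule are all hypothetical stand-ins. In the described system, the estimation and generation steps would rely on image analysis and a generative AI model.

```python
from dataclasses import dataclass, field
from typing import Callable

Preferences = list[str]
Plan = list[str]


@dataclass
class TravelPlanServer:
    estimate: Callable[[bytes], Preferences]        # stand-in for the estimation means
    generate: Callable[[Preferences, dict], Plan]   # stand-in for the generation means
    feedback_db: dict[str, list[int]] = field(default_factory=dict)

    def handle_image(self, image: bytes) -> Plan:
        """Steps 3 to 5: analyze the image, estimate preferences, generate a plan."""
        prefs = self.estimate(image)
        return self.generate(prefs, self.feedback_db)

    def handle_feedback(self, ratings: dict[str, int]) -> None:
        """Step 8: store ratings so that later plans can refer to them."""
        for activity, score in ratings.items():
            self.feedback_db.setdefault(activity, []).append(score)


def stub_estimate(image: bytes) -> Preferences:
    # Assumption: any non-empty image yields an "outdoor" preference.
    return ["outdoor"] if image else []


def stub_generate(prefs: Preferences, feedback_db: dict) -> Plan:
    catalog = {"outdoor": ["nature walk", "picnic"], "culture": ["museum"]}
    candidates = [a for p in prefs for a in catalog.get(p, [])]

    def acceptable(activity: str) -> bool:
        # Assumed rule: skip activities whose average past rating is below 3.
        scores = feedback_db.get(activity)
        return not scores or sum(scores) / len(scores) >= 3

    return [a for a in candidates if acceptable(a)]


server = TravelPlanServer(stub_estimate, stub_generate)
first_plan = server.handle_image(b"jpeg-bytes")
server.handle_feedback({"picnic": 1})
second_plan = server.handle_image(b"jpeg-bytes")
print(first_plan, second_plan)
```

The second call returns a shorter plan because the stored low rating for "picnic" is consulted, which mirrors the idea that feedback is used as a reference for future plans.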

[0057] (Example 1)

[0058] Next, we will describe Example 1. In the following description, the data processing device 12 will be referred to as the "server," and the smart device 14 will be referred to as the "terminal."

[0059] When travelers plan trips that suit their preferences and interests, gathering a large amount of information and making appropriate choices based on that information is a challenging task. Furthermore, it can be difficult to incorporate past experiences and feedback into future plans. Therefore, there is a need to create effective and efficient travel plans tailored to individual preferences.

[0060] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 1 is realized by the following means.

[0061] In this invention, the server includes an information acquisition means for acquiring the user's visual information, an analysis means for analyzing the acquired visual information and estimating the user's preferences, and a generation means for constructing and presenting a travel plan based on the analysis results. This makes it possible to generate a travel plan tailored to the individual preferences of the user and to reflect subsequent feedback in the next plan.

[0062] "Visual information" refers to image data related to the user's clothing and appearance, and is information acquired through the camera function.

[0063] "Information acquisition means" refers to the components of devices or software that have the function of collecting visual information.

[0064] "Analysis means" refers to technical means used to process acquired visual information and infer the user's preferences and interests.

[0065] "Generating means" refers to the function of a device or software that constructs a travel plan based on the analyzed results and presents it to the user.

[0066] "Learning means" refers to the technical means that accumulate user feedback and use it to improve the accuracy of future travel plans.

[0067] "Travel planning" refers to suggestions for itineraries and destinations that are created based on the user's preferences and location information.

[0068] This invention relates to a system that automatically generates and presents travel plans tailored to the user's preferences. Specific embodiments of the invention are described below.

[0069] First, the user uses a device with a dedicated application installed to take pictures of their clothes and other items using the camera function. This image data is then sent to the server via the application on the device.

[0070] The server uses image processing libraries such as OpenCV to analyze the received image data. Through image analysis, it detects features such as clothing color, material, and style, and uses this information to estimate the user's preferences. This analysis result is then input as a prompt into the server's AI model, which then concretizes the user's preferences. A concrete example of a prompt might be, "Infer the user's preferences from their fashion style and create a travel plan based on that."
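As an illustration of the kind of color feature this paragraph refers to, the sketch below classifies pixels into coarse color names and reports the dominant ones. It is a simplified stand-in written with the Python standard library only, not the OpenCV-based analysis the text mentions. The function names and hue thresholds are assumptions, and the pixel list stands in for a decoded image.

```python
import colorsys
from collections import Counter


def color_name(rgb: tuple[int, int, int]) -> str:
    """Map one RGB pixel to a coarse color name (thresholds are illustrative)."""
    h, s, v = colorsys.rgb_to_hsv(*(c / 255 for c in rgb))
    if v < 0.2:
        return "black"
    if s < 0.15:
        return "white" if v > 0.8 else "gray"
    deg = h * 360
    if deg < 30 or deg >= 330:
        return "red"
    if deg < 90:
        return "yellow"
    if deg < 150:
        return "green"
    if deg < 270:
        return "blue"
    return "purple"


def dominant_colors(pixels: list[tuple[int, int, int]], top: int = 2) -> list[tuple[str, float]]:
    """Return the most frequent color names with their share of all pixels."""
    counts = Counter(color_name(p) for p in pixels)
    return [(name, n / len(pixels)) for name, n in counts.most_common(top)]


# Hypothetical pixels sampled from a photo of a blue jacket on a white background.
pixels = [(30, 60, 200)] * 6 + [(250, 250, 250)] * 3 + [(200, 30, 30)]
features = dominant_colors(pixels)
print(features)
```

A short list of named colors like this is easy to place into a text prompt, which is one reason to reduce pixel data to coarse labels before passing it to a language model.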

[0071] Next, the server automatically generates a travel plan based on the estimated user preferences. This plan includes the optimal route from the starting point to the destination, suggested sightseeing spots, and activities. The suggestions are optimized based on the user's history, including their registered location and rating information.
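The specification does not say how the route is optimized. As one possible illustration, the sketch below orders spots with a greedy nearest-neighbor heuristic. The function name, the planar coordinates, and the sample spots are assumptions. A real system would use geographic distances or a routing service, and this heuristic does not guarantee the shortest overall route.

```python
import math


def order_by_nearest_neighbor(
    start: tuple[float, float],
    spots: dict[str, tuple[float, float]],
) -> list[str]:
    """Visit the closest unvisited spot next, starting from `start`."""
    remaining = dict(spots)
    route: list[str] = []
    here = start
    while remaining:
        # Pick the unvisited spot with the smallest straight-line distance.
        name = min(remaining, key=lambda n: math.dist(here, remaining[n]))
        here = remaining.pop(name)
        route.append(name)
    return route


# Hypothetical spots on a flat grid (units are arbitrary).
spots = {"museum": (5.0, 5.0), "park": (1.0, 0.0), "cafe": (2.0, 1.0)}
route = order_by_nearest_neighbor((0.0, 0.0), spots)
print(route)
```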

[0072] Finally, the generated travel plan is sent to the device and presented to the user. After the trip, the user provides feedback to the application with evaluation information about the places visited and activities participated in. The evaluation information sent from the device to the server is analyzed by the server's learning mechanism and reflected in the next travel plan.

[0073] This invention enables users to efficiently obtain personalized travel plans tailored to their individual preferences, thereby enriching their travel experience.

[0074] The flow of the specific processing in Example 1 will be explained using Figure 11.

[0075] Step 1:

[0076] The user takes a photo of their clothing using the device's camera. This photo is saved on the device as a JPEG or PNG file and sent to the server via a dedicated application. The transmitted image data becomes the server's input.

[0077] Step 2:

[0078] The server analyzes the received image data using an image processing library such as OpenCV. This image analysis process detects and extracts features such as clothing color, pattern, and style, obtaining this feature data as output. Specific operations here include data processing such as edge detection and color analysis.

[0079] Step 3:

[0080] The server inputs the extracted feature data as prompts into the generative AI model. Based on these prompts, the AI model makes estimations about the user's preferences. In this estimation process, the AI uses a pre-trained model to output the user's preferences (for example, styles such as sporty or casual).

[0081] Step 4:

[0082] The server generates a travel plan based on estimated preference information. This process takes into account the user's location and selects relevant tourist spots and activities. The server processes this information and outputs a travel plan that includes suggestions for the optimal route and activities for the user.

[0083] Step 5:

[0084] The terminal receives the travel plan sent from the server and presents it to the user. The user can then review the travel details based on this plan and make manual adjustments as needed. The adjusted plan is then output to the terminal as the final plan.

[0085] Step 6:

[0086] After the trip ends, the user enters their evaluation information about the trip on their device and sends feedback from the device to the server. This feedback becomes input to the server.

[0087] Step 7:

[0088] The server analyzes the received feedback and implements a learning process for future travel suggestions. Based on this data, the server updates its model to improve the accuracy of future plan generation. The analysis results are then output to the server's database.
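The learning process in Step 7 is described only in general terms. One simple way to picture it is as a per-category weight that moves toward each new rating. The sketch below uses an exponential moving average. The function name, the 1-to-5 rating scale, the neutral starting value of 0.5, and the update rate are all assumptions for illustration and are not the method stated in the specification.

```python
def update_weights(
    weights: dict[str, float],
    feedback: dict[str, int],
    rate: float = 0.5,
) -> dict[str, float]:
    """Move each category weight toward the latest rating (1 to 5 stars)."""
    updated = dict(weights)
    for category, rating in feedback.items():
        target = (rating - 1) / 4            # map 1..5 stars onto 0..1
        old = updated.get(category, 0.5)     # unseen categories start neutral
        updated[category] = old + rate * (target - old)
    return updated


weights = {"outdoor": 0.5, "museum": 0.5}
weights = update_weights(weights, {"outdoor": 5, "museum": 1})
ranking = sorted(weights, key=weights.get, reverse=True)
print(weights, ranking)
```

A smaller `rate` makes the weights change more slowly, so a single unusual trip has less influence on later plans.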

[0089] (Application Example 1)

[0090] Next, we will explain Application Example 1. In the following explanation, the data processing device 12 will be referred to as the "server," and the smart device 14 will be referred to as the "terminal."

[0091] In today's urban environment, it is difficult for tourists and residents alike to effectively discover and experience tourist destinations and activities that suit their individual preferences. Therefore, there is a need for a system that allows users to easily and quickly obtain travel plans that are best suited to their interests and circumstances.

[0092] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 1 is realized by the following means.

[0093] In this invention, the server includes means for acquiring video information of the user, means for estimating the user's preferences based on the acquired video information, and means for generating a travel plan based on the estimated preferences. This makes it possible to efficiently propose the optimal sightseeing route tailored to the individual preferences of the user.

[0094] "Users" refer to individuals who use the system to receive personalized travel plans based on their individual preferences.

[0095] "Video information" refers to images and visual data acquired by the system, such as the user's clothing and location information.

[0096] "Preferences" refer to characteristics related to the user's interests and preferences, and are elements that the system considers when constructing a travel plan.

[0097] A "travel plan" refers to a plan that includes suggestions for tourist destinations and activities generated based on the user's preferences and location information.

[0098] "Means" refers to the components or methods used to achieve a specific function or process within a system.

[0099] An "optimal sightseeing route" refers to a path that allows users to visit tourist destinations in an efficient and engaging way, based on their preferences and current location.

[0100] This invention can be implemented by a user using a dedicated application installed on a device such as a smartphone. First, the user uses the device's camera to photograph their everyday clothing and surrounding environment and sends the video information to a server. The server receives this video information and uses image analysis software to extract features of the clothing and background. Specifically, it uses image processing technologies such as Azure® Computer Vision API.

[0101] The extracted feature information is used as base data to estimate user preferences using a generative AI model. For example, if a user is wearing a casual jacket, the system might infer that they prefer relaxed outdoor activities. Based on this preference data, the server generates an efficient and engaging travel plan that reflects the user's location. In generating this plan, for example, the Google® Maps API can be used to suggest the optimal sightseeing route.

[0102] The generated travel plan is presented to the user via their device, allowing them to plan their trip based on the presented plan. Furthermore, feedback information entered by the user after the trip is sent to the server, analyzed again by AI as unclassified data, and used as learning data to improve the accuracy of future planning. This enables users to obtain more personalized and appealing travel experiences.

[0103] As a concrete example, when a user uploads an image of sneakers and jeans, the server might infer that the user prefers casual and active activities and suggest plans for a picnic in a park or a nature walk.

[0104] An example of a prompt message might be, "This user is wearing a casual jacket. This attire conveys a relaxed atmosphere. What travel activity would be most suitable?" This is the kind of message that would be input to the generative AI model.
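A prompt of this shape can be assembled from the extracted features with a small template. The sketch below is illustrative only: the function `build_prompt` and its parameters are hypothetical, and it only builds the text without calling any model.

```python
def build_prompt(garments: list[str], atmosphere: str) -> str:
    """Fill the example prompt from the text with extracted clothing features."""
    worn = ", ".join(garments)
    return (
        f"This user is wearing {worn}. "
        f"This attire conveys a {atmosphere} atmosphere. "
        "What travel activity would be most suitable?"
    )


prompt = build_prompt(["a casual jacket"], "relaxed")
print(prompt)
```

Keeping the template in one function makes it straightforward to pass several garments at once, for example `build_prompt(["sneakers", "jeans"], "casual")`.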

[0105] The flow of a specific process in Application Example 1 will be explained using Figure 12.

[0106] Step 1:

[0107] The user uses their smartphone to photograph their clothing and surrounding environment. The input is video information captured by the smartphone's camera. This video information is temporarily stored on the device and later sent to the server.

[0108] Step 2:

[0109] The terminal transmits the acquired video information to the server. The server receives this video information and prepares it for analysis. The input is the image data transmitted from the terminal, and the output is the data ready for analysis.

[0110] Step 3:

[0111] The server begins image analysis using the received image data. Specifically, it uses the Azure Computer Vision API to extract features of the user's clothing and background. This analysis process analyzes the color, shape, and decorations of the clothing in the image and outputs them as feature information.

[0112] Step 4:

[0113] The server uses the extracted feature information to estimate user preferences using a generative AI model. The input is the feature information obtained in the previous step, and the output is the estimated preference data. Based on the estimation results, data about activities that the user is likely to be interested in is generated.

[0114] Step 5:

[0115] The server combines estimated preferences with the user's location information and uses the Google Maps API to generate an optimal travel plan. The input is estimated preference data and location information, and the output is optimal sightseeing route information as a travel plan.

[0116] Step 6:

[0117] The server sends the generated travel plan to the terminal, which then presents it to the user. The input is the generated travel plan data, and the output is the sightseeing route information displayed on the user's screen. The user can then plan their trip based on this information.

[0118] Step 7:

[0119] After completing their trip, users input feedback using the application. The device aggregates this feedback data and sends it to the server. The input is user-provided feedback data, and the output is data prepared for learning purposes.

[0120] Step 8:

[0121] The server analyzes the received feedback data and performs machine learning to improve the accuracy of the next travel plan. The input is the feedback data, and the output is an improved preference estimation model. By repeatedly training the generative AI model in this process, the accuracy of the suggestions improves.

[0122] Furthermore, an emotion engine that estimates the user's emotions may be incorporated. That is, the specific processing unit 290 may use the emotion identification model 59 to estimate the user's emotions and perform the specific processing using the estimated emotions.

[0123] This invention provides a system that combines the function described above, which estimates preferences from the user's image information and creates a travel plan, with an emotion engine, so that the system recognizes the user's emotional state and offers a more accurate, personalized travel plan. A specific embodiment of this system is described below.

[0124] First, the user uses their smartphone camera to take a picture of their clothing and saves the image to their device. This image is then sent to the server through the application on the device.

[0125] The server analyzes the image: the image acquisition means extracts clothing features and provides this data to the estimation means. The estimation means uses generative AI to infer the user's preferences from the clothing. Simultaneously, the emotion engine analyzes the image and any additional voice input to determine the user's emotional state. The user's immediate emotions and long-term preferences are then integrated and used to design the travel plan.

[0126] For example, if the user is relaxed yet curious, the emotion engine can suggest a travel plan that includes a calming natural environment while also offering new experiences. Based on this information, the server creates a travel plan via the generation means and sends it to the device.
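One way to picture the integration of immediate emotions with long-term preferences is a weighted score per activity. The sketch below is an assumption-laden illustration: the scoring formula, the `emotion_weight` value, and the sample activities, tags, and intensities are invented here and are not taken from the specification.

```python
def rank_activities(
    activities: dict[str, tuple[set[str], set[str]]],
    preferences: dict[str, float],
    emotions: dict[str, float],
    emotion_weight: float = 0.4,
) -> list[str]:
    """Rank activities by a blend of preference fit and current-emotion fit."""
    def score(name: str) -> float:
        tags, suited_emotions = activities[name]
        pref = max((preferences.get(t, 0.0) for t in tags), default=0.0)
        emo = sum(emotions.get(e, 0.0) for e in suited_emotions)
        return (1 - emotion_weight) * pref + emotion_weight * emo

    return sorted(activities, key=score, reverse=True)


# Hypothetical data: each activity has preference tags and the emotions it suits.
activities = {
    "forest walk": ({"nature"}, {"relaxed"}),
    "night market tour": ({"food"}, {"curious"}),
    "forest craft workshop": ({"nature"}, {"relaxed", "curious"}),
}
preferences = {"nature": 0.9, "food": 0.4}     # long-term, estimated from clothing
emotions = {"relaxed": 0.8, "curious": 0.6}    # immediate, from the emotion engine
ranked = rank_activities(activities, preferences, emotions)
print(ranked)
```

With these sample values, the activity that suits both "relaxed" and "curious" ranks first, matching the example of a calming natural setting that also offers a new experience.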

[0127] Users can view the travel plan presented on their device and check the details within the application. The user's reaction to the plan is again monitored by the emotion engine, and the plan can be adjusted in real time.

[0128] After a trip, users provide feedback on their destinations and activities. This feedback is sent from the device to the server and analyzed by the learning means. The feedback includes the user's emotional responses, which are used to further improve the accuracy of future plans.

[0129] For example, if the emotion engine determines that the user is excited when a trip is presented, the system may propose a plan that includes high-energy activities or events. In this way, the system of the present invention can dynamically adapt and provide a travel plan based on the user's current emotional state and individual preferences.

[0130] The following describes the processing flow.

[0131] Step 1:

[0132] The user takes a picture of their clothing using their smartphone camera and saves the image data to their device. This image reflects the user's usual style.

[0133] Step 2:

[0134] The device uploads images taken via the application to the server. In addition, if the user wishes, emotional data can also be sent via voice input.

[0135] Step 3:

[0136] Using the image acquisition means, the server analyzes the received image data and extracts clothing features such as color, pattern, and style. This allows for an initial assessment of the user's preferences.

[0137] Step 4:

[0138] The server's estimation means uses generative AI to predict user preferences from the feature data extracted from the images. During this process, past data is also referenced to check that the estimated preferences are consistent.

[0139] Step 5:

[0140] The server's emotion engine analyzes the additional emotion information received (facial expression analysis of the image and analysis of the audio) to understand the user's emotional state. This information is then integrated with the preference estimation results.

[0141] Step 6:

[0142] The server generates a travel plan tailored to the user based on predictions and emotional states. The generated plan includes suggested destinations, routes, and recommended activities.

[0143] Step 7:

[0144] The device receives the generated travel plan from the server and displays it to the user. The user can review the plan on the app and provide feedback on changes or suggestions as needed.

[0145] Step 8:

[0146] During the trip, the user's device continuously monitors their emotional state and adjusts the travel plan in real time as needed.

[0147] Step 9:

[0148] After their trip, users provide feedback on tourist destinations and activities via their devices and send it to the server. This feedback includes emotional responses, which are then used by the server's learning mechanisms to improve future travel plan suggestions.

[0149] (Example 2)

[0150] Next, we will describe Example 2. In the following description, the data processing device 12 will be referred to as the "server," and the smart device 14 will be referred to as the "terminal."

[0151] Traditional travel planning systems lacked the functionality to provide personalized plans that took into account the individual preferences and emotional states of users. This made it difficult to deliver a satisfying travel experience. Furthermore, there was a need for a system that could analyze users' emotional states in real time and flexibly adjust the plan accordingly.

[0152] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 2 is realized by the following means.

[0153] In this invention, the server includes a device for acquiring image information of the user, a device for estimating the user's preferences based on the acquired image information, and a device for analyzing the user's emotional state from information related to the user. This makes it possible to generate and provide travel plans tailored to the individual preferences and emotional state of the user.

[0154] A "device for acquiring user image information" is a device that uses the device's camera or sensors to collect image data, including the user's posture and clothing.

[0155] A "preference estimation device" is a device equipped with an algorithm that calculates the user's preferred style and activity tendencies based on collected data.

[0156] A "device for analyzing emotional states" is a device equipped with the function of analyzing the user's visual and auditory information to determine their mental state and mood.

[0157] A "travel plan generation device" is a device that designs and proposes the optimal travel itinerary and activities based on estimated preferences and analyzed emotional states.

[0158] A "presenting device" is a device that has an interface for conveying generated plans or information to the user visually or audibly.

[0159] A "device that learns information" is a device that analyzes feedback received from users and the data it generates, and accumulates knowledge to improve services in the future.

[0160] The system of this invention primarily provides personalized travel plans, taking into account the user's preferences and emotional state. First, the user uses a device such as a smartphone or tablet to acquire image information of themselves using the camera. The acquired images are stored on the device and sent to the server through the application. The user's image information includes elements such as clothing and facial expressions.

[0161] The server uses image processing software to analyze the received image information and extract features such as clothing and facial expressions. This process also includes using generative AI models to estimate user preferences from the collected data.

[0162] Furthermore, the server incorporates an emotion engine, which has the ability to analyze the user's emotional state from audio data provided along with image information. This function allows it to capture emotions such as relaxation or excitement.

[0163] Based on aggregated preference and emotional information, the server generates a travel plan. In addition to general travel suggestions, it lists activities and places suited to the user's preferences, creating a more personalized plan. The generated travel plan is sent to the terminal and presented to the user.

[0164] Users can review the presented travel plan and adjust details within the application as needed. After completing the trip, users input feedback on the places visited and activities, including emotional responses, and send it from their device to the server. The server uses this feedback to learn and further improve the accuracy of future plans.

[0165] For example, if a user is dressed in "resort style," the AI model can infer that the user prefers beach resorts. An example of a prompt message would be, "Based on your current attire and voice input, it seems you want to relax and enjoy new experiences. Please suggest a travel plan that suits this situation." By giving instructions to the server in this format, the system can generate an appropriate travel plan.
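A prompt of the kind quoted above could be assembled from the analysis results along the following lines. This is a minimal sketch under stated assumptions: the helper name `build_travel_prompt` and its parameters are hypothetical, and the call to the generative AI model itself is left out.

```python
def build_travel_prompt(clothing_style, desires):
    """Compose a prompt for a generative AI model (hypothetical helper).

    clothing_style: a label produced by image analysis, e.g. "resort style".
    desires: short verb phrases inferred from emotion analysis,
             e.g. ["relax", "enjoy new experiences"].
    """
    if desires:
        mood = "wants to " + " and ".join(desires)
    else:
        # Fallback wording when no emotional state could be inferred.
        mood = "has no particular mood detected"
    return (
        f"Based on the user's current attire ({clothing_style}) and voice input, "
        f"it seems the user {mood}. "
        "Please suggest a travel plan that suits this situation."
    )
```

Calling `build_travel_prompt("resort style", ["relax", "enjoy new experiences"])` yields text close to the example prompt in this paragraph, with the attire label made explicit.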

[0166] The flow of the specific processing in Example 2 will be explained using Figure 13.

[0167] Step 1:

[0168] The user uses the device's camera to take an image that includes their clothing and facial expression. This image data is saved on the device. The input is the image captured by the smartphone's camera, and the output is the saved image file. Specifically, the user follows the app's instructions to launch the camera and capture an image.

[0169] Step 2:

[0170] The device sends saved images to the server via the application. The input here is the image file stored on the device, and the output is the image data transferred to the server. Specifically, the application uploads the image data to the server using an internet connection.

[0171] Step 3:

[0172] The server acquires the received image data and uses image analysis software to extract features of clothing and facial expressions. The input here is the image data sent to the server, and the output is the extracted feature data. Specifically, the analysis program on the server executes image processing algorithms to identify colors, patterns, and facial expressions.

[0173] Step 4:

[0174] The server's estimation device inputs the extracted feature data into a generative AI model to estimate user preferences. The input is the extracted feature data, and the output is estimated data regarding user preferences. Specifically, the AI model uses machine learning algorithms to compare the features against past data in a database and identify the user's style and preferences.

[0175] Step 5:

[0176] The server's emotion engine analyzes image information and additional audio data to determine the user's emotional state. The input here is image and audio data, and the output is the analysis results regarding the emotional state. Specifically, the emotion analysis software quantifies voice tone and facial expressions, classifying them into states such as calmness or excitement.
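The classification in Step 5 can be illustrated with a simple threshold rule. This is a sketch only: it assumes the voice and facial analyses have already been reduced to arousal scores in the range 0 to 1, and the function name `classify_emotion`, the averaging rule, and the default threshold are assumptions made here rather than details from the disclosure.

```python
def classify_emotion(voice_arousal, face_arousal, threshold=0.5):
    """Classify the user's state as 'excitement' or 'calmness'.

    Both inputs are assumed to be scores in [0, 1] produced by upstream
    voice-tone and facial-expression analysis (not shown here).
    Returns a (label, combined_score) tuple.
    """
    for name, value in (("voice_arousal", voice_arousal),
                        ("face_arousal", face_arousal)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be in [0, 1], got {value}")
    # Equal-weight average of the two modalities (an assumed fusion rule).
    combined = (voice_arousal + face_arousal) / 2
    label = "excitement" if combined >= threshold else "calmness"
    return label, combined
```

A real emotion engine would likely use a trained model and more than two classes; the point here is only the shape of the step, with quantified inputs going in and a discrete emotional state coming out.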

[0177] Step 6:

[0178] The server integrates the results of preference and emotion analysis and generates personalized travel plans using a generation mechanism. The input is estimated preference and emotion data, and the output is the generated travel plan. Specifically, the server interacts with the AI using prompt messages to select activities and destinations suitable for each user.

[0179] Step 7:

[0180] The server sends the generated travel plan to the terminal. The terminal then displays the received plan to the user. The input is the travel plan data from the server, and the output is the travel plan displayed on the terminal. Specifically, the application receives the data and visualizes the plan on the user interface.

[0181] Step 8:

[0182] After their trip, the user enters feedback through the application, and the device sends this to the server. The input is the user's feedback data, and the output is the feedback transferred to the server. Specifically, the user enters their visited locations and impressions into an input form and presses the submit button.

[0183] Step 9:

[0184] The server analyzes the received feedback using a learning device and stores the learning results in a database to improve the accuracy of future travel plans. The input is the feedback data, and the output is the updated learning database. Specifically, a machine learning algorithm analyzes the feedback and integrates the most relevant data into the model.
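The disclosure does not specify the learning algorithm used in Step 9. As one simple possibility, per-category preference scores could be blended with new feedback using an exponential moving average. The function `update_preferences`, the score range, and the `rate` parameter below are all hypothetical.

```python
def update_preferences(scores, feedback, rate=0.3):
    """Blend stored preference scores with new feedback ratings.

    scores:   dict mapping a category (e.g. "beach") to a score in [0, 1].
    feedback: dict mapping a category to the rating from the latest trip,
              also in [0, 1].
    rate:     weight given to the new rating (assumed value).
    Returns a new dict; the input dict is not modified.
    """
    updated = dict(scores)
    for category, rating in feedback.items():
        # A category seen for the first time starts at its first rating.
        old = updated.get(category, rating)
        updated[category] = (1 - rate) * old + rate * rating
    return updated
```

With `rate=0.5`, a stored score of 0.5 for "beach" and a new rating of 1.0 give an updated score of 0.75, so recent trips shift the profile without erasing earlier history.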

[0185] (Application Example 2)

[0186] Next, we will explain application example 2. In the following explanation, the data processing device 12 will be referred to as the "server," and the smart device 14 will be referred to as the "terminal."

[0187] Providing personalized viewing content efficiently to diverse users is crucial for general content distribution services. However, conventional systems often rely solely on user preferences for recommendations, failing to consider the user's immediate emotional state. This can lead to decreased user satisfaction and increased churn rates. This invention aims to achieve more suitable content delivery by considering both user preferences and emotions in combination.

[0188] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 2 is realized by the following means.

[0189] In this invention, the server includes an image acquisition means for acquiring image information of the user, an estimation means for estimating the user's preferences based on the acquired image information, and an emotion analysis means for analyzing the user's emotional information. This makes it possible to consider the user's preferences and emotions simultaneously and suggest the most suitable viewing content.

[0190] "Image acquisition means" refers to a mechanism for acquiring image information of the user, and involves acquiring the user's clothing and facial expressions using a camera or other photographic device.

[0191] "Estimation means" refers to a mechanism that analyzes the acquired image information and uses that information to infer the user's preferences.

[0192] "Emotion analysis means" refers to a function for analyzing a user's emotional information, recognizing the user's emotions from image and audio information.

[0193] "Generation means" refers to a mechanism for suggesting viewing content based on estimated preferences and analyzed emotional information.

[0194] "Presentation means" refers to a function for presenting generated viewing content to the user.

[0195] "Learning means" refers to a system that has an algorithm to improve the accuracy of its suggestions based on feedback information received from users.

[0196] A "generative AI model" is an algorithm that takes a prompt as input and generates the necessary information and content, and is a technology used to suggest recommended content.

[0197] A "prompt statement" is a sentence or command given to a generative AI model, used to specify the content to be generated and the direction of the response.

[0198] To implement this invention, a system is needed that collects the user's images and voice and, based on that, suggests the most suitable viewing content for the user. This system mainly consists of a server and terminals.

[0199] First, the device acquires the user's image and voice through its camera and microphone. Image processing libraries such as OpenCV are used to extract features of clothing and facial expressions from the image. The voice data is analyzed for emotional state using a deep learning-based emotion recognition model (built with a framework such as TensorFlow (registered trademark) or PyTorch).

[0200] The collected information is sent to a server. Based on this data, the server uses a generative AI model to suggest viewing content based on the user's preferences and immediate emotions. For example, the GPT model is used as the generative AI model, and a list of appropriate content is generated by inputting prompts.

[0201] The generated content is presented to the user on their device. The user reviews the presented content and sends their feedback back to the server via their device. The server analyzes this feedback using learning mechanisms to improve the accuracy of its suggestions.

[0202] As a concrete example, when a user is at home in relaxed clothing on the weekend, the system sends a prompt message to its AI model saying, "Please recommend some relaxing comedy movies." Based on this prompt, a list of comedy movies and dramas that suit the user's state is generated and presented on the device.
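The prompt-and-list exchange described above could be organized as follows. This sketch deliberately avoids any particular vendor API: the generative AI model is passed in as a plain callable, and `recommend_content` and `stub_model` are hypothetical names. The reply format (one title per line) is also an assumption.

```python
from typing import Callable, List


def recommend_content(clothing: str, emotion: str,
                      model: Callable[[str], str]) -> List[str]:
    """Build a prompt, call the model, and split its reply into titles."""
    prompt = (
        f"The user is wearing {clothing} and appears {emotion}. "
        "Please recommend some viewing content, one title per line."
    )
    reply = model(prompt)
    # Drop blank lines and strip leading list markers such as "- ".
    return [line.strip("- ").strip()
            for line in reply.splitlines() if line.strip()]


def stub_model(prompt: str) -> str:
    """Stand-in for a real generative AI call; returns a fixed reply."""
    return "- Example Comedy A\n- Example Comedy B\n"
```

Passing the model as a callable keeps the server logic testable: `recommend_content("relaxed clothing", "relaxed", stub_model)` returns the two placeholder titles, and a real model client could be substituted without changing the function.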

[0203] This makes it possible to deliver content that matches users' preferences and emotions, and to provide highly satisfying services to individual users.

[0204] The flow of a specific process in Application Example 2 will be explained using Figure 14.

[0205] Step 1:

[0206] The device acquires the user's image and voice. The device captures the user's voice and appearance through the camera and microphone. This input data includes image and audio files.

[0207] Step 2:

[0208] The device uses OpenCV to analyze the acquired images. The image acquisition means extracts features such as color, hue, and clothing style from the image and sends this data to the server. The output is data indicating the user's clothing and appearance.

[0209] Step 3:

[0210] The device passes the audio data to a deep learning model for analysis. The emotion analysis means analyzes the tone and pitch of the voice to estimate the user's emotional state. The output of this step is an index representing the user's emotions.

[0211] Step 4:

[0212] The server uses the analyzed image data and emotion data for estimation. This data is input into a generative AI model to generate prompt messages. These prompt messages request viewing content that matches the user's preferences and emotions.

[0213] Step 5:

[0214] The server uses the generated prompt text to run a generative AI model such as GPT. The model generates a list of appropriate viewing content based on the input prompt text. The output of this step is a list of viewing content.

[0215] Step 6:

[0216] The server sends the generated content to the device. The device then presents this list to the user, who selects the content they wish to view.

[0217] Step 7:

[0218] The user selects and watches content from a list of presented options. During this process, the device records the user's reactions as feedback and sends this data to the server.

[0219] Step 8:

[0220] The server uses learning tools to further analyze the received feedback information. This process improves the system's recommendation accuracy. The output of this step is the improved recommendation algorithm.

[0221] The specific processing unit 290 transmits the result of the specific processing to the smart device 14. In the smart device 14, the control unit 46A causes the output device 40 to output the result of the specific processing. The microphone 38B acquires audio indicating user input for the result of the specific processing. The control unit 46A transmits the audio data indicating user input acquired by the microphone 38B to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the audio data.

[0222] Data generation model 58 is a so-called generative AI (Artificial Intelligence). Examples of the data generation model 58 include generative AI such as ChatGPT (registered trademark) (Internet search <URL: https://openai.com/blog/chatgpt>) and Gemini (registered trademark) (Internet search <URL: https://gemini.google.com/?hl=ja>). The data generation model 58 is obtained by performing deep learning on a neural network. The data generation model 58 is input with prompts containing instructions, and with inference data such as audio data representing speech, text data representing text, and image data representing images. The data generation model 58 infers from the input inference data according to the instructions indicated by the prompts, and outputs the inference results in data formats such as audio data and text data. Here, inference refers to, for example, analysis, classification, prediction, and/or summarization.

[0223] In the above embodiment, an example was given in which specific processing is performed by the data processing device 12, but the technology of this disclosure is not limited thereto, and the specific processing may also be performed by the smart device 14.

[0224] [Second Embodiment]

[0225] Figure 3 shows an example of the configuration of the data processing system 210 according to the second embodiment.

[0226] As shown in Figure 3, the data processing system 210 includes a data processing device 12 and smart glasses 214. An example of the data processing device 12 is a server.

[0227] The data processing device 12 comprises a computer 22, a database 24, and a communication interface 26. The computer 22 is an example of a "computer" related to the technology of this disclosure. The computer 22 comprises a processor 28, RAM 30, and storage 32. The processor 28, RAM 30, and storage 32 are connected to a bus 34. The database 24 and the communication interface 26 are also connected to the bus 34. The communication interface 26 is connected to a network 54. An example of the network 54 is a WAN (Wide Area Network) and/or a LAN (Local Area Network).

[0228] The smart glasses 214 include a computer 36, a microphone 238, a speaker 240, a camera 42, and a communication interface 44. The computer 36 includes a processor 46, RAM 48, and storage 50. The processor 46, RAM 48, and storage 50 are connected to a bus 52. The microphone 238, speaker 240, and camera 42 are also connected to the bus 52.

[0229] The microphone 238 receives voice signals from the user 20 and receives instructions from the user 20. The microphone 238 captures the voice signals from the user 20, converts the captured voice into audio data, and outputs it to the processor 46. The speaker 240 outputs audio according to the instructions from the processor 46.

[0230] Camera 42 is a small digital camera equipped with an optical system including a lens, aperture, and shutter, and an image sensor such as a CMOS (Complementary Metal-Oxide-Semiconductor) image sensor or a CCD (Charge Coupled Device) image sensor, and captures images of the area around the user 20 (for example, an imaging range defined by a field of view equivalent to the width of a typical healthy person's field of vision).

[0231] Communication interface 44 is connected to network 54. Communication interfaces 44 and 26 are responsible for the exchange of various information between processor 46 and processor 28 via network 54. The exchange of various information between processor 46 and processor 28 using communication interfaces 44 and 26 is performed in a secure manner.

[0232] Figure 4 shows an example of the main functions of the data processing device 12 and the smart glasses 214. As shown in Figure 4, the data processing device 12 performs specific processing using the processor 28. The storage 32 stores the specific processing program 56.

[0233] The specific processing program 56 is an example of a "program" relating to the technology of this disclosure. The processor 28 reads the specific processing program 56 from the storage 32 and executes the read specific processing program 56 on the RAM 30. The specific processing is realized by the processor 28 operating as a specific processing unit 290 in accordance with the specific processing program 56 executed on the RAM 30.

[0234] The storage 32 stores the data generation model 58 and the emotion identification model 59. The data generation model 58 and the emotion identification model 59 are used by the specific processing unit 290.

[0235] In the smart glasses 214, the processor 46 performs the reception output processing. The storage 50 stores the reception output program 60. The processor 46 reads the reception output program 60 from the storage 50 and executes the read reception output program 60 on the RAM 48. The reception output processing is realized by the processor 46 operating as a control unit 46A according to the reception output program 60 executed on the RAM 48.

[0236] Next, the specific processing performed by the specific processing unit 290 of the data processing device 12 will be described. In the following description, the data processing device 12 will be referred to as the "server" and the smart glasses 214 will be referred to as the "terminal".

[0237] This invention is a system that analyzes a user's image information to estimate their preferences and create an appropriate travel plan. Specific embodiments of this system are described below.

[0238] First, the user uses their smartphone camera to take a picture of their everyday clothes. This image is sent to a server through the application on the device. At this point, the image acquisition device receives the image data.

[0239] The server applies image processing algorithms to analyze the acquired image and extract clothing features. The estimation means passes this feature data to a generative AI, which infers the user's hobbies and interests. For example, bright colors and sporty clothing might suggest an interest in the outdoors or sports.

[0240] Next, the server generates a travel plan based on the above prediction results via a generation mechanism. This plan includes a starting point based on the user's location information, as well as suggestions for sightseeing destinations and activities that suit their preferences. Furthermore, the plan is optimized by taking into account the user's pre-registered individual constraints (such as allergies or places they dislike).
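Taking pre-registered constraints into account can be as simple as filtering candidate activities before the plan is assembled. The sketch below assumes each candidate carries a list of tags and that constraints are expressed as tags to avoid; the function name `filter_candidates` and the data layout are hypothetical.

```python
def filter_candidates(candidates, constraints):
    """Drop candidates whose tags intersect the user's constraints.

    candidates:  list of dicts with "name" and "tags" keys (assumed layout).
    constraints: iterable of tags the user registered as unacceptable,
                 e.g. allergens or disliked kinds of places.
    """
    blocked = set(constraints)
    return [c for c in candidates if not blocked & set(c["tags"])]


# Illustrative data only; names and tags are made up for this sketch.
CANDIDATES = [
    {"name": "Seafood market tour", "tags": ["shellfish", "food"]},
    {"name": "Riverside nature walk", "tags": ["outdoors"]},
    {"name": "Crowded night market", "tags": ["crowds", "food"]},
]
```

For a user who registered `["shellfish", "crowds"]`, `filter_candidates(CANDIDATES, ["shellfish", "crowds"])` leaves only the nature walk. Filtering before generation keeps excluded items out of the plan regardless of what the generative model proposes.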

[0241] The travel plan generated from the server is sent to the device and presented to the user. The user can review this plan on their smartphone and make adjustments as needed.

[0242] After their trip, users input feedback about the tourist attractions and activities they experienced into the application. The device sends this feedback to the server, where its learning mechanisms analyze it. This allows the server to improve future travel plans, making them more accurate and appealing to the user.

[0243] As a concrete example, suppose a user uploads a photo of themselves wearing a casual jacket, jeans, and sneakers. The generative AI estimates from this outfit that the user prefers a relaxed atmosphere and casual outdoor activities. Based on this, the server suggests a plan that includes a picnic in a park or a nature walk, and the user accepts and carries it out.

[0244] This system allows users to efficiently obtain fresh travel experiences tailored to their individual needs.

[0245] The following describes the processing flow.

[0246] Step 1:

[0247] The user takes a picture of their clothing using their smartphone camera. This image data is saved within the application.

[0248] Step 2:

[0249] The device uploads the saved images to the server via the application. A secure protocol is used for image transmission to ensure safety.

[0250] Step 3:

[0251] The server analyzes the received image data. Using image processing algorithms, it extracts features such as clothing color, design, and accessories, and organizes them as digital data.

[0252] Step 4:

[0253] The server's estimation means uses generative AI to infer the user's hobbies and preferences from the feature data. This process reads cultural background and lifestyle preferences from clothing style and color palettes.

[0254] Step 5:

[0255] The server generates a travel plan based on inferred preference information. In this process, it selects tourist destinations and activities according to location information and applies a route optimization algorithm to determine an efficient travel order.
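The route optimization algorithm is not named in the text. One simple heuristic that determines a reasonable visiting order is greedy nearest-neighbor selection, sketched below. The function `order_route` is hypothetical, and treating locations as planar (x, y) coordinates is a simplification; real latitude and longitude would call for a geodesic distance.

```python
import math


def order_route(start, spots):
    """Order spots by repeatedly visiting the nearest unvisited one.

    start: (x, y) coordinates of the user's current location.
    spots: list of (name, x, y) tuples for the selected destinations.
    Returns the list of spot names in visiting order.
    """
    remaining = list(spots)
    route = []
    current = start
    while remaining:
        nearest = min(remaining,
                      key=lambda s: math.dist(current, (s[1], s[2])))
        remaining.remove(nearest)
        route.append(nearest[0])
        current = (nearest[1], nearest[2])
    return route
```

Nearest-neighbor ordering is fast and easy to explain but does not guarantee the shortest overall route; for a handful of destinations, an exhaustive search over all orderings would also be feasible.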

[0256] Step 6:

[0257] The device receives the generated travel plan from the server and presents it to the user. The user can view the plan details within the application and, if necessary, modify or customize parts of the plan.

[0258] Step 7:

[0259] Users take a trip and then provide feedback via the application, including evaluations and impressions of the places they visited and the activities they participated in. This information is sent to the server.

[0260] Step 8:

[0261] The server analyzes the feedback it receives and uses a learning algorithm to update the database, which will then be used as a reference for future travel plans. This makes it possible to provide more personalized travel plans.

[0262] (Example 1)

[0263] Next, we will describe Example 1. In the following description, the data processing device 12 will be referred to as the "server," and the smart glasses 214 will be referred to as the "terminal."

[0264] When travelers plan trips that suit their preferences and interests, gathering a large amount of information and making appropriate choices based on that information is a challenging task. Furthermore, it can be difficult to incorporate past experiences and feedback into future plans. Therefore, there is a need to create effective and efficient travel plans tailored to individual preferences.

[0265] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 1 is realized by the following means.

[0266] In this invention, the server includes an information acquisition means for acquiring the user's visual information, an analysis means for analyzing the acquired visual information and estimating the user's preferences, and a generation means for constructing and presenting a travel plan based on the analysis results. This makes it possible to generate a travel plan tailored to the individual preferences of the user and to reflect subsequent feedback in the next plan.

[0267] "Visual information" refers to image data related to the user's clothing and appearance, and is information acquired through the camera function.

[0268] "Information acquisition means" refers to the components of devices or software that have the function of collecting visual information.

[0269] "Analysis means" refers to technical means used to process acquired visual information and infer the user's preferences and interests.

[0270] "Generation means" refers to the function of a device or software that constructs a travel plan based on the analyzed results and presents it to the user.

[0271] "Learning means" refers to a technical system that accumulates user feedback and uses it to improve the accuracy of future travel plans.

[0272] "Travel plan" refers to suggestions for itineraries and destinations that are created based on the user's preferences and location information.

[0273] This invention relates to a system that automatically generates and presents travel plans tailored to the user's preferences. Specific embodiments of the invention are described below.

[0274] First, the user uses a device with a dedicated application installed to take pictures of their clothes and other items using the camera function. This image data is then sent to the server via the application on the device.

[0275] The server uses image processing libraries such as OpenCV to analyze the received image data. Through image analysis, it detects features such as clothing color, material, and style, and uses this information to estimate the user's preferences. This analysis result is then input as a prompt into the server's AI model, which then concretizes the user's preferences. A concrete example of a prompt might be, "Infer the user's preferences from their fashion style and create a travel plan based on that."

[0276] Next, based on the estimated user preference information, the server automatically generates a travel plan. This plan includes the optimal route from the departure point to the destination, proposed tourist spots, and activities. The proposed content is optimized based on the user's registered history, including location information and evaluation information.

[0277] Finally, the generated travel plan is sent to the terminal and presented to the user. After the trip, the user enters evaluation information about the spots visited and the activities joined into the application as feedback. The evaluation information sent from the terminal to the server is analyzed by the server's learning means and reflected in the next travel plan.

[0278] With this invention, users can efficiently obtain a unique travel plan according to their individual preferences and enrich their travel experiences.

[0279] The flow of the specific process in Example 1 will be described using FIG. 11.

[0280] Step 1:

[0281] The user uses the camera of the terminal to take a photo of their own clothing. This photo is saved in the terminal in JPEG or PNG file format and sent to the server via a dedicated application. The transmitted image data serves as the input to the server.

[0282] Step 2:

[0283] The server analyzes the received image data using an image processing library such as OpenCV. In this image analysis process, features such as the color, pattern, and style of the clothing are detected and extracted, and these feature data are obtained as the output. The specific operations here include data processing such as edge detection and color analysis.
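The color analysis part of Step 2 can be illustrated without any image library. The sketch below is a pure-Python stand-in, not OpenCV code: it assumes the image has already been decoded into a list of (r, g, b) tuples, and it finds the most common coarse color buckets. The function name `dominant_colors` and the bucket size are assumptions.

```python
from collections import Counter


def dominant_colors(pixels, top=2, bucket=64):
    """Return the most common coarse RGB buckets among the pixels.

    pixels: iterable of (r, g, b) tuples with channel values 0-255.
    bucket: width of each quantization bucket per channel (assumed value).
    Each channel is rounded down to a multiple of `bucket`, so nearby
    shades are counted together.
    """
    counts = Counter(
        (r // bucket * bucket, g // bucket * bucket, b // bucket * bucket)
        for r, g, b in pixels
    )
    return [color for color, _ in counts.most_common(top)]
```

For an image that is mostly red with some blue, the red bucket `(192, 0, 0)` is returned first. In a production system the same idea would typically run on arrays produced by an image processing library, alongside edge detection for pattern and style features.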

[0284] Step 3:

[0285] The server inputs the extracted feature data as a prompt into the generative AI model. Based on this prompt, the AI model estimates the user's preferences. In this estimation process, the AI uses a pre-trained model to output the user's preferences (such as styles like sporty or casual).

[0286] Step 4:

[0287] The server generates a travel plan based on the estimated preference information. In generating the travel plan, while considering the user's location information, relevant tourist spots and activities are selected. The server processes this information and outputs a travel plan including suggestions for the optimal route and activities for the user.

[0288] Step 5:

[0289] The terminal receives the travel plan sent from the server and presents it to the user. The user can check the details of the trip based on this plan and make manual adjustments if necessary. The adjustment results are output to the terminal as the final plan.

[0290] Step 6:

[0291] After the trip, the user inputs evaluation information about the trip on the terminal and sends feedback from the terminal to the server. This feedback serves as input to the server.

[0292] Step 7:

[0293] The server analyzes the received feedback and conducts a learning process for the next travel proposal. The server updates the model based on this data to improve the accuracy of the next plan generation. The analysis results are output as an update to the database in the server.
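
The color analysis mentioned in Step 2 can be illustrated with a minimal sketch. The source names OpenCV for this step; the version below uses only the Python standard library so that it stays self-contained, and it assumes the image has already been decoded into a list of RGB tuples. The palette, the function names, and the sample pixels are hypothetical and are not taken from the specification.

```python
from collections import Counter

# Hypothetical reference palette; names and RGB values are illustrative only.
PALETTE = {
    "black": (0, 0, 0),
    "white": (255, 255, 255),
    "red": (220, 30, 30),
    "green": (30, 160, 60),
    "blue": (30, 60, 200),
    "yellow": (240, 220, 40),
}

def nearest_color_name(pixel):
    """Return the palette name with the smallest squared RGB distance."""
    return min(
        PALETTE,
        key=lambda name: sum((a - b) ** 2 for a, b in zip(pixel, PALETTE[name])),
    )

def dominant_colors(pixels, top_n=2):
    """Count the nearest palette color per pixel and return the most common names."""
    counts = Counter(nearest_color_name(p) for p in pixels)
    return [name for name, _ in counts.most_common(top_n)]

# Assumed sample: mostly blue clothing with some white and a little black.
pixels = [(25, 55, 190)] * 6 + [(250, 250, 250)] * 3 + [(10, 10, 10)]
print(dominant_colors(pixels))
```

The resulting color names could then serve as part of the feature data that Step 3 passes to the generative AI model.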

[0294] (Application Example 1)

[0295] Next, we will explain Application Example 1. In the following explanation, the data processing device 12 will be referred to as the "server," and the smart glasses 214 will be referred to as the "terminal."

[0296] In today's urban environment, it is difficult for tourists and residents alike to effectively discover and experience tourist destinations and activities that suit their individual preferences. Therefore, there is a need for a system that allows users to easily and quickly obtain travel plans that are best suited to their interests and circumstances.

[0297] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 1 is realized by the following means.

[0298] In this invention, the server includes means for acquiring video information of the user, means for estimating the user's preferences based on the acquired video information, and means for generating a travel plan based on the estimated preferences. This makes it possible to efficiently propose the optimal sightseeing route tailored to the individual preferences of the user.

[0299] "Users" refer to individuals who use the system to receive personalized travel plans based on their individual preferences.

[0300] "Video information" refers to images and other visual data acquired by the system, such as the user's clothing and location information.

[0301] "Preferences" refer to characteristics related to the user's interests and preferences, and are elements that the system considers when constructing a travel plan.

[0302] A "travel plan" refers to a plan that includes suggestions for tourist destinations and activities generated based on the user's preferences and location information.

[0303] "Means" refers to the components or methods used to achieve a specific function or process within a system.

[0304] The "optimal sightseeing route" refers to a route for visiting tourist attractions in an efficient and attractive manner based on the user's preferences and current location.

[0305] This invention can be implemented by the user using a dedicated application installed on a terminal such as a smartphone. First, the user uses the camera of the terminal to photograph their daily clothing and the surrounding environment and transmits the video information to the server. The server receives this video information and extracts the characteristics of the clothing and background using image analysis software. Specifically, image processing technologies such as the Azure Computer Vision API are used.

[0306] The extracted feature information is used as basic data for estimating the user's preferences with a generative AI model. For example, if the user is wearing a casual jacket, the system may infer that the user is likely to prefer relaxed outdoor activities. Based on this preference data, the server generates an efficient and interesting travel plan that also reflects the user's location information. In generating this plan, for example, the Google Maps API can be used to propose an optimal sightseeing route.

[0307] The generated travel plan is presented to the user through the terminal, and the user can plan their trip based on the presented plan. Furthermore, the feedback information input by the user after the trip is transmitted to the server, analyzed again by the AI as undetermined data, and used as learning data to improve the accuracy of the next plan. This enables the user to obtain a more attractive travel experience that suits them.

[0308] As a specific example, when the user uploads images of sneakers and jeans, the server estimates that the user prefers casual and active activities and proposes a picnic in the park or a nature walk plan.

[0309] An example of a prompt message might be, "This user is wearing a casual jacket. This attire conveys a relaxed atmosphere. What travel activity would be most suitable?" This is the kind of message that would be input to the generative AI model.
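
As a hedged illustration of how such a prompt message might be assembled from extracted features, the sketch below fills a fixed template. The function name `build_prompt` and the dictionary keys `item` and `impression` are assumptions made for this example; the specification does not define a data format.

```python
def build_prompt(features):
    """Turn extracted clothing features into a prompt for a generative AI model.

    `features` is a hypothetical dict with the keys "item" and "impression".
    """
    return (
        f"This user is wearing {features['item']}. "
        f"This attire conveys {features['impression']}. "
        "What travel activity would be most suitable?"
    )

prompt = build_prompt({"item": "a casual jacket", "impression": "a relaxed atmosphere"})
print(prompt)
```

Keeping the template separate from the feature values makes it straightforward to reuse the same wording for other clothing items.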

[0310] The flow of a specific process in Application Example 1 will be explained using Figure 12.

[0311] Step 1:

[0312] The user uses their smartphone to photograph their clothing and surrounding environment. The input is video information captured by the smartphone's camera. This video information is temporarily stored on the device and later sent to the server.

[0313] Step 2:

[0314] The terminal transmits the acquired video information to the server. The server receives this video information and prepares it for analysis. The input is the image data transmitted from the terminal, and the output is the data ready for analysis.

[0315] Step 3:

[0316] The server begins image analysis using the received image data. Specifically, it uses the Azure Computer Vision API to extract features of the user's clothing and background. This analysis process analyzes the color, shape, and decorations of the clothing in the image and outputs them as feature information.

[0317] Step 4:

[0318] The server uses the extracted feature information to estimate user preferences using a generative AI model. The input is the feature information obtained in the previous step, and the output is the estimated preference data. Based on the estimation results, data about activities that the user is likely to be interested in is generated.

[0319] Step 5:

[0320] The server combines estimated preferences with the user's location information and uses the Google Maps API to generate an optimal travel plan. The input is estimated preference data and location information, and the output is optimal sightseeing route information as a travel plan.

[0321] Step 6:

[0322] The server sends the generated travel plan to the terminal, which then presents it to the user. The input is the generated travel plan data, and the output is the sightseeing route information displayed on the user's screen. The user can then plan their trip based on this information.

[0323] Step 7:

[0324] After completing their trip, users input feedback using the application. The device aggregates this feedback data and sends it to the server. The input is user-provided feedback data, and the output is data prepared for learning purposes.

[0325] Step 8:

[0326] The server analyzes the received feedback data and performs machine learning to improve the accuracy of the next travel plan. The input is the feedback data, and the output is an improved preference estimation model. By repeatedly training the generative AI model in this process, the accuracy of the suggestions improves.
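
Step 5 combines the estimated preferences with the user's location to produce a sightseeing route. The source names the Google Maps API for this; the sketch below does not call that API and instead swaps in a simple greedy nearest-neighbor ordering over great-circle distances, purely to show the idea of ordering spots from the user's position. The spot names, coordinates, and function names are hypothetical.

```python
import math

def haversine_km(a, b):
    """Great-circle distance in kilometers between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (
        math.sin((lat2 - lat1) / 2) ** 2
        + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2
    )
    return 2 * 6371.0 * math.asin(math.sqrt(h))

def order_spots(start, spots):
    """Greedy nearest-neighbor ordering: repeatedly visit the closest unvisited spot."""
    remaining = dict(spots)
    current, route = start, []
    while remaining:
        name = min(remaining, key=lambda n: haversine_km(current, remaining[n]))
        current = remaining.pop(name)
        route.append(name)
    return route

# Hypothetical user location and candidate spots (coordinates are illustrative).
start = (35.6812, 139.7671)
spots = {
    "park_picnic": (35.6852, 139.7528),
    "nature_walk": (35.7148, 139.7967),
    "riverside_cafe": (35.6895, 139.7745),
}
print(order_spots(start, spots))
```

A greedy ordering is not guaranteed to give the shortest overall route; it is used here only because it is short and easy to follow.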

[0327] Furthermore, an emotion engine that estimates the user's emotions may be incorporated. That is, the specific processing unit 290 may use the emotion identification model 59 to estimate the user's emotions and perform the specific processing using the user's emotions.

[0328] This invention provides a system that recognizes the user's emotional state and offers a more accurate, personalized travel plan by combining a conventional function of estimating preferences using the user's image information and creating a travel plan with an emotion engine. A specific embodiment of this system is described below.

[0329] First, the user uses their smartphone camera to take a picture of their clothing and saves the image to their device. This image is then sent to the server through the application on the device.

[0330] The server analyzes images, with the image acquisition mechanism working to extract clothing features and providing this data to the estimation mechanism. The estimation mechanism uses generative AI to infer the user's preferences from the clothing. Simultaneously, the emotion engine analyzes the emotional state from the images and additional voice input to determine the user's emotions. This information integrates the user's immediate emotions and long-term preferences and is used to design travel plans.

[0331] For example, if the user is relaxed yet curious, the emotion engine can suggest a travel plan that includes a calming natural environment while also offering new experiences. Based on this information, the server creates a travel plan via a generation mechanism and sends it to the device.

[0332] Users can view the travel plan presented on their device and check the details within the application. The user's reaction to the plan is again monitored by the emotion engine, and the plan can be adjusted in real time.

[0333] After a trip, users provide feedback on their destinations and activities. This feedback is sent from the device to a server and analyzed by learning tools. The feedback includes the user's emotional responses, which are used to further improve the accuracy of future plans.

[0334] For example, if the emotion engine determines that the user is excited when a trip is presented, it may indicate that the proposed plan includes high-energy activities or events. In this way, the system of the present invention can dynamically adapt and provide a travel plan based on the user's current emotional state and individual preferences.
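
One way to picture the integration of long-term preferences and immediate emotions described above is a weighted score per candidate activity. The following is a minimal sketch under stated assumptions: the tags, the strength values, the equal weighting, and the function name `score_activities` are all hypothetical, and the specification does not prescribe any particular scoring rule.

```python
def score_activities(activities, preferences, emotions, emotion_weight=0.5):
    """Rank activities by a weighted sum of preference match and emotion match.

    Each activity maps to a set of tags; `preferences` and `emotions` map tags to
    strengths in [0, 1]. All names and weights here are hypothetical.
    """
    def match(tags, strengths):
        return sum(strengths.get(t, 0.0) for t in tags)

    scored = {
        name: (1 - emotion_weight) * match(tags, preferences)
        + emotion_weight * match(tags, emotions)
        for name, tags in activities.items()
    }
    return sorted(scored, key=scored.get, reverse=True)

activities = {
    "forest_hot_spring": {"nature", "calm"},
    "street_food_tour": {"novel", "urban"},
    "guided_kayak_trip": {"nature", "novel"},
}
preferences = {"nature": 0.8, "urban": 0.2}  # long-term, e.g. inferred from clothing
emotions = {"calm": 0.6, "novel": 0.7}       # immediate, e.g. relaxed yet curious
print(score_activities(activities, preferences, emotions))
```

With these assumed values, an activity that is both in nature and new to the user ranks first, which mirrors the "relaxed yet curious" example in the text.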

[0335] The following describes the processing flow.

[0336] Step 1:

[0337] The user takes a picture of their clothing using their smartphone camera and saves the image data to their device. This image reflects the user's usual style.

[0338] Step 2:

[0339] The device uploads images taken via the application to the server. In addition, if the user wishes, emotional data can also be sent via voice input.

[0340] Step 3:

[0341] The server analyzes the received image data using the image acquisition means to extract features of the clothing such as color, pattern, and style. This allows for an initial assessment of the user's preferences.

[0342] Step 4:

[0343] The server's estimation means uses generative AI to predict user preferences based on the feature data extracted from the images. During this process, past data is also referenced to check for consistency in preferences.

[0344] Step 5:

[0345] The server's emotion engine analyzes the additional emotion information it has received (facial expression analysis of the images and analysis of the audio) to understand the user's emotional state. This information is then integrated with the preference estimation results.

[0346] Step 6:

[0347] The server generates a travel plan tailored to the user based on the estimated preferences and the emotional state. The generated plan includes suggested destinations, routes, and recommended activities.

[0348] Step 7:

[0349] The device receives the generated travel plan from the server and displays it to the user. The user can review the plan on the app and provide feedback on changes or suggestions as needed.

[0350] Step 8:

[0351] During the trip, the user's device continuously monitors their emotional state and adjusts the travel plan in real time as needed.

[0352] Step 9:

[0353] After their trip, users provide feedback on tourist destinations and activities via their devices and send it to the server. This feedback includes emotional responses, which are then used by the server's learning mechanisms to improve future travel plan suggestions.

[0354] (Example 2)

[0355] Next, we will describe Example 2. In the following description, the data processing device 12 will be referred to as the "server" and the smart glasses 214 will be referred to as the "terminal".

[0356] Traditional travel planning systems lacked the functionality to provide personalized plans that took into account the individual preferences and emotional states of users. This made it difficult to deliver a satisfying travel experience. Furthermore, there was a need for a system that could analyze users' emotional states in real time and flexibly adjust the plan accordingly.

[0357] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 2 is realized by the following means.

[0358] In this invention, the server includes a device for acquiring image information of the user, a device for estimating the user's preferences based on the acquired image information, and a device for analyzing the user's emotional state from information related to the user. This makes it possible to generate and provide travel plans tailored to the individual preferences and emotional state of the user.

[0359] A "device for acquiring user image information" is a device that uses the device's camera or sensors to collect image data, including the user's posture and clothing.

[0360] A "preference estimation device" is a device equipped with an algorithm that calculates the user's preferred style and activity tendencies based on collected data.

[0361] A "device for analyzing emotional states" is a device equipped with the function of analyzing the user's visual and auditory information to determine their mental state and mood.

[0362] A "travel plan generation device" is a device that designs and proposes the optimal travel itinerary and activities based on estimated preferences and analyzed emotional states.

[0363] A "presenting device" is a device that has an interface for conveying generated plans or information to the user visually or audibly.

[0364] A "device that learns information" is a device that analyzes feedback received from users and the data it generates, and accumulates knowledge to improve services in the future.

[0365] The system of this invention primarily provides personalized travel plans, taking into account the user's preferences and emotional state. First, the user uses a device such as a smartphone or tablet to acquire image information of themselves using the camera. The acquired images are stored on the device and sent to the server through the application. The user's image information includes elements such as clothing and facial expressions.

[0366] The server uses image processing software to analyze the received image information and extract features such as clothing and facial expressions. This process also includes using generative AI models to estimate user preferences from the collected data.

[0367] Furthermore, the server incorporates an emotion engine, which has the ability to analyze the user's emotional state from audio data provided along with image information. This function allows it to capture emotions such as relaxation or excitement.

[0368] Based on aggregated preference and emotional information, the server generates a travel plan. In addition to general travel suggestions, it lists activities and places suited to the user's preferences, creating a more personalized plan. The generated travel plan is sent to the terminal and presented to the user.

[0369] Users can review the presented travel plan and adjust details within the application as needed. After completing the trip, users input feedback on the places visited and activities, including emotional responses, and send it from their device to the server. The server uses this feedback to learn and further improve the accuracy of future plans.

[0370] For example, if a user is dressed in "resort style," the AI model can infer that the user prefers beach resorts. An example of a prompt message would be, "Based on your current attire and voice input, it seems you want to relax and enjoy new experiences. Please suggest a travel plan that suits this situation." By giving instructions to the server in this format, the system can generate an appropriate travel plan.

[0371] The flow of the specific processing in Example 2 will be explained using Figure 13.

[0372] Step 1:

[0373] The user uses the device's camera to take an image that includes their clothing and facial expression. This image data is saved on the device. The input is the image captured by the smartphone's camera, and the output is the saved image file. Specifically, the user follows the app's instructions to launch the camera and capture an image.

[0374] Step 2:

[0375] The device sends saved images to the server via the application. The input here is the image file stored on the device, and the output is the image data transferred to the server. Specifically, the application uploads the image data to the server using an internet connection.

[0376] Step 3:

[0377] The server acquires the received image data and uses image analysis software to extract features of clothing and facial expressions. The input here is the image data sent to the server, and the output is the extracted feature data. Specifically, the analysis program on the server executes image processing algorithms to identify colors, patterns, and facial expressions.

[0378] Step 4:

[0379] The server's estimation device uses a generative AI model based on extracted feature data to estimate user preferences. The input is extracted feature data, and the output is estimated data regarding user preferences. Specifically, the AI model utilizes machine learning algorithms to compare with past databases and identify the user's style and preferences.

[0380] Step 5:

[0381] The server's emotion engine analyzes image information and additional audio data to determine the user's emotional state. The input here is image and audio data, and the output is the analysis results regarding the emotional state. Specifically, the emotion analysis software quantifies voice tone and facial expressions, classifying them into states such as calmness or excitement.

[0382] Step 6:

[0383] The server integrates the results of preference and emotion analysis and generates personalized travel plans using a generation mechanism. The input is estimated preference and emotion data, and the output is the generated travel plan. Specifically, the server interacts with the AI using prompt messages to select activities and destinations suitable for each user.

[0384] Step 7:

[0385] The server sends the generated travel plan to the terminal. The terminal then displays the received plan to the user. The input is the travel plan data from the server, and the output is the travel plan displayed on the terminal. Specifically, the application receives the data and visualizes the plan on the user interface.

[0386] Step 8:

[0387] After their trip, the user enters feedback through the application, and the device sends this to the server. The input is the user's feedback data, and the output is the feedback transferred to the server. Specifically, the user enters their visited locations and impressions into an input form and presses the submit button.

[0388] Step 9:

[0389] The server analyzes the received feedback using a learning device and stores the learning results in a database to improve the accuracy of future travel plans. The input is the feedback data, and the output is the updated learning database. Specifically, a machine learning algorithm analyzes the feedback and integrates the most relevant data into the model.
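
The learning in Step 9 is described only at a high level. As one possible reading, the sketch below nudges per-tag preference weights toward the ratings the user gave after the trip, using an exponential moving average. The update rule, the learning rate, the neutral starting value, and all names are assumptions for illustration; the specification does not state which machine learning algorithm is used.

```python
def update_preferences(weights, feedback, learning_rate=0.3):
    """Move each tag weight toward the user's rating with an exponential moving average.

    `weights` maps tags to values in [0, 1]; `feedback` is a list of
    (tags, rating) pairs with rating in [0, 1]. The update rule is an assumption.
    """
    updated = dict(weights)  # leave the stored weights untouched
    for tags, rating in feedback:
        for tag in tags:
            old = updated.get(tag, 0.5)  # unseen tags start at a neutral 0.5
            updated[tag] = old + learning_rate * (rating - old)
    return updated

weights = {"nature": 0.8, "urban": 0.2}
feedback = [({"nature"}, 1.0), ({"urban", "nightlife"}, 0.0)]
new_weights = update_preferences(weights, feedback)
print(new_weights)
```

Returning a new dictionary rather than mutating the input keeps the previous state available, which may be convenient if the stored learning database needs to be rolled back.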

[0390] (Application Example 2)

[0391] Next, we will explain Application Example 2. In the following explanation, the data processing device 12 will be referred to as the "server," and the smart glasses 214 will be referred to as the "terminal."

[0392] Providing personalized viewing content efficiently to diverse users is crucial for general content distribution services. However, conventional systems often rely solely on user preferences for recommendations, failing to consider the user's immediate emotional state. This can lead to decreased user satisfaction and increased churn rates. This invention aims to achieve more suitable content delivery by considering both user preferences and emotions in combination.

[0393] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 2 is realized by the following means.

[0394] In this invention, the server includes an image acquisition means for acquiring image information of the user, an estimation means for estimating the user's preferences based on the acquired image information, and an emotion analysis means for analyzing the user's emotional information. This makes it possible to consider the user's preferences and emotions simultaneously and suggest the most suitable viewing content.

[0395] "Image acquisition means" refers to a mechanism for acquiring image information of the user, and involves acquiring the user's clothing and facial expressions using a camera or other photographic device.

[0396] "Estimation means" refers to a mechanism that analyzes the acquired image information and uses it to infer the user's preferences.

[0397] "Emotion analysis means" refers to a function for analyzing the user's emotional information by recognizing the user's emotions from image and audio information.

[0398] "Generation means" refers to a mechanism for suggesting viewing content based on the estimated preferences and the analyzed emotional information.

[0399] "Presentation means" refers to a function for presenting generated viewing content to the user.

[0400] "Learning means" refers to a system with an algorithm that improves the accuracy of its suggestions based on feedback information received from users.

[0401] A "generative AI model" is an algorithm that takes a prompt as input and generates the necessary information and content, and is a technology used to suggest recommended content.

[0402] A "prompt statement" is a sentence or command given to a generative AI model, used to specify the content to be generated and the direction of the response.

[0403] To implement this invention, a system is needed that collects the user's images and voice and, based on that, suggests the most suitable viewing content for the user. This system mainly consists of a server and terminals.

[0404] First, the device acquires images and audio from the user via its camera and microphone. Image analysis uses image processing libraries such as OpenCV to extract features of clothing and facial expressions. The audio data is analyzed for emotional state using a deep learning-based emotion recognition model (built with a framework such as TensorFlow or PyTorch).

[0405] The collected information is sent to a server. Based on this data, the server uses a generative AI model to suggest viewing content based on the user's preferences and immediate emotions. For example, the GPT model is used as the generative AI model, and a list of appropriate content is generated by inputting prompts.

[0406] The generated content is presented to the user on their device. The user reviews the presented content and sends their feedback back to the server via their device. The server analyzes this feedback using learning mechanisms to improve the accuracy of its suggestions.

[0407] As a concrete example, when a user is at home in relaxed clothing on the weekend, the system sends a prompt message to its AI model saying, "Please recommend some relaxing comedy movies." Based on this prompt, a list of comedy movies and dramas that suit the user's state is generated and presented on the device.

[0408] This makes it possible to deliver content that matches users' preferences and emotions, and to provide highly satisfying services to individual users.

[0409] The flow of a specific process in Application Example 2 will be explained using Figure 14.

[0410] Step 1:

[0411] The device acquires the user's image and voice. The device captures the user's voice and appearance through the camera and microphone. This input data includes image and audio files.

[0412] Step 2:

[0413] The device uses OpenCV to analyze the acquired images. The image acquisition means extracts features such as color, hue, and clothing style from the image and sends this data to the server. The output is data indicating the user's clothing and appearance.

[0414] Step 3:

[0415] The device passes the audio data to a deep learning model for analysis. The emotion analysis means analyzes the tone and pitch of the voice to estimate the user's emotional state. The output of this step is an index representing the user's emotions.

[0416] Step 4:

[0417] The server uses the analyzed image data and emotion data for estimation. This data is input into a generative AI model to generate prompt messages. These prompt messages request viewing content that matches the user's preferences and emotions.

[0418] Step 5:

[0419] The server uses the generated prompt text to run a generative AI model such as GPT. The model generates a list of appropriate viewing content based on the input prompt text. The output of this step is a list of viewing content.

[0420] Step 6:

[0421] The server sends the generated content to the device. The device then presents this list to the user, who selects the content they wish to view.

[0422] Step 7:

[0423] The user selects and watches content from a list of presented options. During this process, the device records the user's reactions as feedback and sends this data to the server.

[0424] Step 8:

[0425] The server uses the learning means to further analyze the received feedback information. This process improves the system's recommendation accuracy. The output of this step is the improved recommendation algorithm.
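
Step 3 states that the tone and pitch of the voice are analyzed by a deep learning model to produce an emotion index. The sketch below is not such a model. It is a much simpler stand-in that derives a rough arousal index from loudness (RMS) and zero-crossing rate, shown only to make the notion of "an index representing the user's emotions" concrete. The scaling constants, the synthetic sine tones used in place of recorded speech, and the function names are assumptions.

```python
import math

def arousal_index(samples, sample_rate):
    """Return a rough 0-1 arousal index from loudness (RMS) and zero-crossing rate."""
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    crossings = sum(1 for a, b in zip(samples, samples[1:]) if (a < 0) != (b < 0))
    zcr_hz = crossings * sample_rate / (2 * len(samples))  # approximate pitch in Hz
    loudness = min(rms / 0.5, 1.0)    # assumes samples are scaled to [-1, 1]
    pitch = min(zcr_hz / 400.0, 1.0)  # the 400 Hz cap is an arbitrary assumption
    return 0.5 * loudness + 0.5 * pitch

def tone(freq_hz, amplitude, sample_rate=8000, seconds=0.5):
    """Generate a sine tone as a stand-in for recorded speech."""
    n = int(sample_rate * seconds)
    return [amplitude * math.sin(2 * math.pi * freq_hz * i / sample_rate) for i in range(n)]

calm = arousal_index(tone(120, 0.1), 8000)     # quiet, low-pitched input
excited = arousal_index(tone(300, 0.6), 8000)  # louder, higher-pitched input
print(calm < excited)
```

In a real deployment the index would presumably come from the trained emotion recognition model; this heuristic only shows the shape of the input and output.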

[0426] The specific processing unit 290 transmits the result of the specific processing to the smart glasses 214. In the smart glasses 214, the control unit 46A causes the speaker 240 to output the result of the specific processing. The microphone 238 acquires audio indicating user input for the result of the specific processing. The control unit 46A transmits the audio data indicating user input acquired by the microphone 238 to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the audio data.

[0427] The data generation model 58 is a so-called generative AI (Artificial Intelligence). Examples of the data generation model 58 include generative AI such as ChatGPT (Internet search <URL: https://openai.com/blog/chatgpt>) and Gemini (Internet search <URL: https://gemini.google.com/?hl=ja>). The data generation model 58 is obtained by performing deep learning on a neural network. Prompts containing instructions, together with inference data such as audio data representing speech, text data representing text, and image data representing images, are input to the data generation model 58. The data generation model 58 performs inference on the input inference data according to the instructions indicated by the prompts, and outputs the inference results in data formats such as audio data and text data. Here, inference refers to, for example, analysis, classification, prediction, and/or summarization.

[0428] In the above embodiment, an example was given in which specific processing is performed by the data processing device 12, but the technology of this disclosure is not limited thereto, and the specific processing may also be performed by the smart glasses 214.

[0429] [Third Embodiment]

[0430] Figure 5 shows an example of the configuration of the data processing system 310 according to the third embodiment.

[0431] As shown in Figure 5, the data processing system 310 includes a data processing device 12 and a headset terminal 314. An example of the data processing device 12 is a server.

[0432] The data processing device 12 comprises a computer 22, a database 24, and a communication interface 26. The computer 22 is an example of a "computer" related to the technology of this disclosure. The computer 22 comprises a processor 28, RAM 30, and storage 32. The processor 28, RAM 30, and storage 32 are connected to a bus 34. The database 24 and the communication interface 26 are also connected to the bus 34. The communication interface 26 is connected to a network 54. An example of the network 54 is a WAN (Wide Area Network) and / or a LAN (Local Area Network).

[0433] The headset terminal 314 includes a computer 36, a microphone 238, a speaker 240, a camera 42, a communication interface 44, and a display 343. The computer 36 includes a processor 46, RAM 48, and storage 50. The processor 46, RAM 48, and storage 50 are connected to a bus 52. The microphone 238, speaker 240, camera 42, and display 343 are also connected to the bus 52.

[0434] The microphone 238 receives voice signals from the user 20 and receives instructions from the user 20. The microphone 238 captures the voice signals from the user 20, converts the captured voice into audio data, and outputs it to the processor 46. The speaker 240 outputs audio according to the instructions from the processor 46.

[0435] Camera 42 is a small digital camera equipped with an optical system including a lens, aperture, and shutter, and an image sensor such as a CMOS (Complementary Metal-Oxide-Semiconductor) image sensor or a CCD (Charge Coupled Device) image sensor, and captures images of the area around the user 20 (for example, an imaging range defined by a field of view equivalent to the width of a typical healthy person's field of vision).

[0436] Communication interface 44 is connected to network 54. Communication interfaces 44 and 26 are responsible for the exchange of various information between processor 46 and processor 28 via network 54. The exchange of various information between processor 46 and processor 28 using communication interfaces 44 and 26 is performed in a secure manner.

[0437] Figure 6 shows an example of the main functions of the data processing device 12 and the headset terminal 314. As shown in Figure 6, the data processing device 12 performs specific processing using the processor 28. The storage 32 stores the specific processing program 56.

[0438] The specific processing program 56 is an example of a "program" relating to the technology of this disclosure. The processor 28 reads the specific processing program 56 from the storage 32 and executes the read specific processing program 56 on the RAM 30. The specific processing is realized by the processor 28 operating as a specific processing unit 290 in accordance with the specific processing program 56 executed on the RAM 30.

[0439] The storage 32 stores the data generation model 58 and the emotion identification model 59. The data generation model 58 and the emotion identification model 59 are used by the specific processing unit 290.

[0440] In the headset terminal 314, the processor 46 performs the reception output processing. The storage 50 stores the reception output program 60. The processor 46 reads the reception output program 60 from the storage 50 and executes the read reception output program 60 on the RAM 48. The reception output processing is realized by the processor 46 operating as a control unit 46A according to the reception output program 60 executed on the RAM 48.

[0441] Next, the specific processing performed by the specific processing unit 290 of the data processing device 12 will be described. In the following description, the data processing device 12 will be referred to as the "server" and the headset terminal 314 will be referred to as the "terminal".

[0442] This invention is a system that analyzes a user's image information to estimate their preferences and create an appropriate travel plan. Specific embodiments of this system are described below.

[0443] First, the user uses their smartphone camera to take a picture of their everyday clothes. This image is sent to a server through the application on the device. At this point, the image acquisition means receives the image data.

[0444] The server applies image processing algorithms to analyze the acquired image and extract clothing features. The estimation means uses this feature data to analyze the user's preferences with generative AI, inferring the user's hobbies and interests. For example, bright colors and sporty clothing might suggest an interest in the outdoors or sports.

[0445] Next, the generation means of the server generates a travel plan based on the estimated preferences. The plan includes a starting point based on the user's location information, as well as suggested sightseeing destinations and activities that suit their preferences. The plan is further optimized by taking into account the user's pre-registered individual constraints (such as allergies or places they dislike).
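The constraint handling described above can be illustrated with a minimal Python sketch. It is not the disclosed implementation: the `Candidate` structure, the tag vocabulary, and the `apply_constraints` function are hypothetical names introduced only for illustration, and ranking by the number of shared tags is an assumed stand-in for the optimization.

```python
from dataclasses import dataclass, field


@dataclass
class Candidate:
    """A candidate stop in a travel plan (hypothetical structure)."""
    name: str
    tags: set = field(default_factory=set)  # e.g. {"outdoor", "seafood"}


def apply_constraints(candidates, preferred_tags, excluded_tags, excluded_places):
    """Drop candidates that violate a registered constraint, then rank the
    rest by how many tags they share with the estimated preferences."""
    allowed = [
        c for c in candidates
        if c.name not in excluded_places and not (c.tags & excluded_tags)
    ]
    # sorted() is stable, so candidates with equal scores keep their input order.
    return sorted(allowed, key=lambda c: len(c.tags & preferred_tags), reverse=True)


# Illustrative data only; none of these places come from the disclosure.
candidates = [
    Candidate("Harbor seafood market", {"food", "seafood"}),
    Candidate("Riverside park picnic", {"outdoor", "relaxed"}),
    Candidate("Mountain trail walk", {"outdoor", "active"}),
    Candidate("Crowded arcade", {"indoor"}),
]
plan = apply_constraints(
    candidates,
    preferred_tags={"outdoor", "relaxed"},
    excluded_tags={"seafood"},            # e.g. a registered allergy
    excluded_places={"Crowded arcade"},   # e.g. a place the user dislikes
)
print([c.name for c in plan])
```

Filtering before ranking ensures that a constraint such as an allergy can never be outweighed by a strong preference match.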

[0446] The travel plan generated by the server is sent to the terminal and presented to the user. The user can review this plan on their smartphone and make adjustments as needed.

[0447] After the trip, the user enters feedback about the tourist attractions and activities they experienced into the application. The terminal sends this feedback to the server, where the learning means analyzes it. This allows the server to improve future travel plans, making them more accurate and appealing to the user.

[0448] As a concrete example, suppose a user uploads a photo of themselves wearing a casual jacket, jeans, and sneakers. The generative AI estimates from this outfit that the user prefers a relaxed atmosphere and casual outdoor activities. Based on this, the server suggests a plan that includes a picnic in a park or a nature walk, and the user accepts and carries it out.

[0449] This system allows users to efficiently obtain fresh travel experiences tailored to their individual needs.

[0450] The following describes the processing flow.

[0451] Step 1:

[0452] The user takes a picture of their clothing using their smartphone camera. This image data is saved within the application.

[0453] Step 2:

[0454] The device uploads the saved images to the server via the application. A secure protocol is used for image transmission to ensure safety.

[0455] Step 3:

[0456] The server analyzes the received image data. Using image processing algorithms, it extracts features such as clothing color, design, and accessories, and organizes them as digital data.

[0457] Step 4:

[0458] The server's estimation means feeds the feature data to a generative AI to infer the user's hobbies and preferences. In this process, cultural background and lifestyle preferences are read from the clothing style and color palette.

[0459] Step 5:

[0460] The server generates a travel plan based on inferred preference information. In this process, it selects tourist destinations and activities according to location information and applies a route optimization algorithm to determine an efficient travel order.

[0461] Step 6:

[0462] The device receives the generated travel plan from the server and presents it to the user. The user can view the plan details within the application and, if necessary, modify or customize parts of the plan.

[0463] Step 7:

[0464] Users take a trip and then provide feedback via the application, including evaluations and impressions of the places they visited and the activities they participated in. This information is sent to the server.

[0465] Step 8:

[0466] The server analyzes the feedback it receives and uses a learning algorithm to update the database, which will then be used as a reference for future travel plans. This makes it possible to provide more personalized travel plans.
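Step 5 above mentions a route optimization algorithm without naming one. As one possible illustration, the following Python sketch orders stops with a greedy nearest-neighbor heuristic over great-circle distances. The choice of heuristic, the function names `haversine_km` and `order_stops`, and the coordinates are all assumptions made for this example, not part of the disclosure.

```python
import math


def haversine_km(a, b):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def order_stops(start, stops):
    """Greedy nearest-neighbor ordering: always travel to the closest unvisited stop.

    start: (lat, lon) of the user's current location
    stops: {name: (lat, lon)}
    """
    remaining = dict(stops)
    route, current = [], start
    while remaining:
        name = min(remaining, key=lambda n: haversine_km(current, remaining[n]))
        current = remaining.pop(name)
        route.append(name)
    return route


# Hypothetical coordinates used only to exercise the function.
start = (35.0, 139.0)
stops = {"A": (35.0, 139.3), "B": (35.0, 139.1), "C": (35.0, 139.2)}
print(order_stops(start, stops))
```

A greedy ordering is fast and easy to explain but does not guarantee the shortest overall route; a production system could substitute an exact or map-service-based solver.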

[0467] (Example 1)

[0468] Next, we will describe Example 1. In the following description, the data processing device 12 will be referred to as the "server," and the headset-type terminal 314 will be referred to as the "terminal."

[0469] When travelers plan trips that suit their preferences and interests, gathering a large amount of information and making appropriate choices based on that information is a challenging task. Furthermore, it can be difficult to incorporate past experiences and feedback into future plans. Therefore, there is a need to create effective and efficient travel plans tailored to individual preferences.

[0470] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 1 is realized by the following means.

[0471] In this invention, the server includes an information acquisition means for acquiring the user's visual information, an analysis means for analyzing the acquired visual information and estimating the user's preferences, and a generation means for constructing and presenting a travel plan based on the analysis results. This makes it possible to generate a travel plan tailored to the individual preferences of the user and to reflect subsequent feedback in the next plan.

[0472] "Visual information" refers to image data related to the user's clothing and appearance, and is information acquired through the camera function.

[0473] "Information acquisition means" refers to the components of devices or software that have the function of collecting visual information.

[0474] "Analysis means" refers to technical means used to process acquired visual information and infer the user's preferences and interests.

[0475] "Generating means" refers to the function of a device or software that constructs a travel plan based on the analyzed results and presents it to the user.

[0476] "Learning means" refers to a technical system that accumulates user feedback and uses it to improve the accuracy of future travel plans.

[0477] "Travel plan" refers to a set of suggested itineraries and destinations created based on the user's preferences and location information.

[0478] This invention relates to a system that automatically generates and presents travel plans tailored to the user's preferences. Specific embodiments of the invention are described below.

[0479] First, the user uses a device with a dedicated application installed to take pictures of their clothes and other items using the camera function. This image data is then sent to the server via the application on the device.

[0480] The server uses image processing libraries such as OpenCV to analyze the received image data. Through image analysis, it detects features such as clothing color, material, and style, and uses this information to estimate the user's preferences. This analysis result is then input as a prompt into the server's AI model, which then concretizes the user's preferences. A concrete example of a prompt might be, "Infer the user's preferences from their fashion style and create a travel plan based on that."
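How an analysis result might be turned into a prompt can be sketched as follows. This is a minimal Python illustration under stated assumptions: the feature keys and the `build_preference_prompt` function are hypothetical, and only the closing instruction is taken from the example prompt quoted above. The call to the AI model itself is omitted because the disclosure does not specify an interface.

```python
def build_preference_prompt(features):
    """Turn extracted clothing features into a prompt for a generative AI model.

    features: {feature_name: value}, e.g. produced by the image analysis step.
    """
    # Sort the keys so the same features always yield the same prompt text.
    described = ", ".join(f"{key}: {value}" for key, value in sorted(features.items()))
    return (
        f"The user's clothing has these features: {described}. "
        "Infer the user's preferences from their fashion style "
        "and create a travel plan based on that."
    )


# Hypothetical analysis result.
features = {"style": "casual", "color": "navy", "material": "denim"}
prompt = build_preference_prompt(features)
print(prompt)
```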

[0481] Next, the server automatically generates a travel plan based on the estimated user preferences. This plan includes the optimal route from the starting point to the destination, suggested sightseeing spots, and activities. The suggestions are optimized based on the user's history, including their registered location and rating information.

[0482] Finally, the generated travel plan is sent to the device and presented to the user. After the trip, the user provides feedback to the application with evaluation information about the places visited and activities participated in. The evaluation information sent from the device to the server is analyzed by the server's learning mechanism and reflected in the next travel plan.

[0483] This invention enables users to efficiently obtain personalized travel plans tailored to their individual preferences, thereby enriching their travel experience.

[0484] The flow of the specific processing in Example 1 will be explained using Figure 11.

[0485] Step 1:

[0486] The user takes a photo of their clothing using the device's camera. This photo is saved on the device as a JPEG or PNG file and sent to the server via a dedicated application. The transmitted image data becomes the server's input.

[0487] Step 2:

[0488] The server analyzes the received image data using an image processing library such as OpenCV. This image analysis process detects and extracts features such as clothing color, pattern, and style, obtaining this feature data as output. Specific operations here include data processing such as edge detection and color analysis.

[0489] Step 3:

[0490] The server inputs the extracted feature data as prompts into the generative AI model. Based on these prompts, the AI model estimates the user's preferences. In this estimation process, the AI uses a pre-trained model to output the user's preferences (for example, styles such as sporty or casual).

[0491] Step 4:

[0492] The server generates a travel plan based on estimated preference information. This process takes into account the user's location and selects relevant tourist spots and activities. The server processes this information and outputs a travel plan that includes suggestions for the optimal route and activities for the user.

[0493] Step 5:

[0494] The terminal receives the travel plan sent from the server and presents it to the user. The user can then review the travel details based on this plan and make manual adjustments as needed. The adjusted plan is then output to the terminal as the final plan.

[0495] Step 6:

[0496] After the trip ends, the user enters their evaluation information about the trip on their device and sends feedback from the device to the server. This feedback becomes input to the server.

[0497] Step 7:

[0498] The server analyzes the received feedback and implements a learning process for future travel suggestions. Based on this data, the server updates its model to improve the accuracy of future plan generation. The analysis results are then output to the server's database.
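The color analysis mentioned in Step 2 can be illustrated without any image library. The following Python sketch is a simplified stand-in, not the OpenCV-based processing itself: it assumes the image has already been decoded into a list of (R, G, B) pixels, and the color labels, thresholds, and function names are illustrative assumptions.

```python
import colorsys
from collections import Counter


def color_name(rgb):
    """Map an (R, G, B) pixel with 0-255 channels to a coarse color label."""
    r, g, b = (channel / 255 for channel in rgb)
    hue, saturation, value = colorsys.rgb_to_hsv(r, g, b)
    if value < 0.2:
        return "black"
    if saturation < 0.15:
        return "white" if value > 0.8 else "gray"
    degrees = hue * 360
    if degrees < 30 or degrees >= 330:
        return "red"
    if degrees < 90:
        return "yellow"
    if degrees < 150:
        return "green"
    if degrees < 270:
        return "blue"
    return "magenta"


def dominant_colors(pixels, top=2):
    """Return the most frequent color labels with their share of all pixels."""
    counts = Counter(color_name(p) for p in pixels)
    total = sum(counts.values())
    return [(name, count / total) for name, count in counts.most_common(top)]


# Ten hypothetical pixels: mostly blue, some white, one red.
pixels = [(20, 40, 200)] * 6 + [(250, 250, 250)] * 3 + [(200, 30, 30)]
print(dominant_colors(pixels))
```

The resulting labels and shares are the kind of feature data that could be passed to the preference estimation in Step 3.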

[0499] (Application Example 1)

[0500] Next, we will explain Application Example 1. In the following explanation, the data processing device 12 will be referred to as the "server," and the headset-type terminal 314 will be referred to as the "terminal."

[0501] In today's urban environment, it is difficult for tourists and residents alike to effectively discover and experience tourist destinations and activities that suit their individual preferences. Therefore, there is a need for a system that allows users to easily and quickly obtain travel plans that are best suited to their interests and circumstances.

[0502] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 1 is realized by the following means.

[0503] In this invention, the server includes means for acquiring visual information of the user, means for estimating the user's preferences based on the acquired visual information, and means for generating a travel plan based on the estimated preferences. This makes it possible to efficiently propose the optimal sightseeing route tailored to the individual preferences of the user.

[0504] "Users" refer to individuals who use the system to receive personalized travel plans based on their individual preferences.

[0505] "Visual information" refers to images and visual data acquired by the system, such as the user's clothing and location information.

[0506] "Preferences" refer to characteristics related to the user's interests and preferences, and are elements that the system considers when constructing a travel plan.

[0507] A "travel plan" refers to a plan that includes suggestions for tourist destinations and activities generated based on the user's preferences and location information.

[0508] "Means" refers to the components or methods used to achieve a specific function or process within a system.

[0509] An "optimal sightseeing route" refers to a path that allows users to visit tourist destinations in an efficient and engaging way, based on their preferences and current location.

[0510] This invention can be implemented by a user using a dedicated application installed on a device such as a smartphone. First, the user uses the device's camera to photograph their everyday clothing and surrounding environment and sends the visual information to a server. The server receives this visual information and uses image analysis software to extract features of the clothing and background. Specifically, it uses an image processing service such as the Azure Computer Vision API.

[0511] The extracted feature information is used as base data to estimate user preferences using a generative AI model. For example, if a user is wearing a casual jacket, the system might infer that they prefer relaxed outdoor activities. Based on this preference data, the server generates an efficient and interesting travel plan that reflects the user's location. In generating this plan, for example, the Google Maps API can be used to suggest the optimal sightseeing route.

[0512] The generated travel plan is presented to the user via their device, allowing them to plan their trip based on the presented plan. Furthermore, feedback information entered by the user after the trip is sent to the server, analyzed again by AI as unclassified data, and used as learning data to improve the accuracy of future planning. This enables users to obtain more personalized and appealing travel experiences.

[0513] As a concrete example, when a user uploads an image of sneakers and jeans, the server might infer that the user prefers casual and active activities and suggest plans for a picnic in a park or a nature walk.

[0514] An example of a prompt message might be, "This user is wearing a casual jacket. This attire conveys a relaxed atmosphere. What travel activity would be most suitable?" This is the kind of message that would be input to the generative AI model.

[0515] The flow of a specific process in Application Example 1 will be explained using Figure 12.

[0516] Step 1:

[0517] The user uses their smartphone to photograph their clothing and surrounding environment. The input is the visual information captured by the smartphone's camera. This visual information is temporarily stored on the terminal and later sent to the server.

[0518] Step 2:

[0519] The terminal transmits the acquired visual information to the server. The server receives it and prepares it for analysis. The input is the visual information transmitted from the terminal, and the output is the data ready for analysis.

[0520] Step 3:

[0521] The server begins image analysis using the received image data. Specifically, it uses the Azure Computer Vision API to extract features of the user's clothing and background. This analysis process analyzes the color, shape, and decorations of the clothing in the image and outputs them as feature information.

[0522] Step 4:

[0523] The server uses the extracted feature information to estimate user preferences using a generative AI model. The input is the feature information obtained in the previous step, and the output is the estimated preference data. Based on the estimation results, data about activities that the user is likely to be interested in is generated.

[0524] Step 5:

[0525] The server combines estimated preferences with the user's location information and uses the Google Maps API to generate an optimal travel plan. The input is estimated preference data and location information, and the output is optimal sightseeing route information as a travel plan.

[0526] Step 6:

[0527] The server sends the generated travel plan to the terminal, which then presents it to the user. The input is the generated travel plan data, and the output is the sightseeing route information displayed on the user's screen. The user can then plan their trip based on this information.

[0528] Step 7:

[0529] After completing their trip, users input feedback using the application. The device aggregates this feedback data and sends it to the server. The input is user-provided feedback data, and the output is data prepared for learning purposes.

[0530] Step 8:

[0531] The server analyzes the received feedback data and performs machine learning to improve the accuracy of the next travel plan. The input is the feedback data, and the output is an improved preference estimation model. By repeatedly training the generative AI model in this process, the accuracy of the suggestions improves.
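The feedback loop in Steps 7 and 8 can be illustrated with a much simpler mechanism than retraining a generative AI model. The following Python sketch keeps one weight per activity category and nudges it toward each rating. The 1-to-5 rating scale, the learning rate, the neutral starting weight, and the `update_preferences` function are assumptions for illustration only.

```python
def update_preferences(weights, feedback, learning_rate=0.3):
    """Move each category weight toward the user's rating, scaled to 0..1.

    weights:  {category: weight in 0..1}
    feedback: list of (category, rating) pairs with rating on a 1-5 scale
    Returns a new dict; the input dict is left unchanged.
    """
    updated = dict(weights)
    for category, rating in feedback:
        target = (rating - 1) / 4              # rating 1 -> 0.0, rating 5 -> 1.0
        current = updated.get(category, 0.5)   # neutral prior for unseen categories
        updated[category] = current + learning_rate * (target - current)
    return updated


# Hypothetical feedback after one trip.
weights = {"outdoor": 0.5, "museum": 0.5}
weights = update_preferences(weights, [("outdoor", 5), ("museum", 1)])
print(weights)
```

Because each update moves only part of the way toward the latest rating, a single unusual trip does not overwrite preferences learned from earlier feedback.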

[0532] Furthermore, an emotion engine that estimates the user's emotions may be incorporated. That is, the specific processing unit 290 may use the emotion identification model 59 to estimate the user's emotions and perform the specific processing using the user's emotions.

[0533] This invention provides a system that recognizes the user's emotional state and offers a more accurate, personalized travel plan by combining a conventional function of estimating preferences using the user's image information and creating a travel plan with an emotion engine. A specific embodiment of this system is described below.

[0534] First, the user uses their smartphone camera to take a picture of their clothing and saves the image to their device. This image is then sent to the server through the application on the device.

[0535] On the server, the image acquisition means analyzes the image, extracts clothing features, and provides this data to the estimation means. The estimation means uses generative AI to infer the user's preferences from the clothing. At the same time, the emotion engine analyzes the image and any additional voice input to determine the user's emotional state. The user's immediate emotions and long-term preferences are then integrated and used to design the travel plan.

[0536] For example, if the user is relaxed yet curious, the emotion engine can suggest a travel plan that includes a calming natural environment while also offering new experiences. Based on this information, the server creates a travel plan via a generation mechanism and sends it to the device.

[0537] Users can view the travel plan presented on their device and check the details within the application. The user's reaction to the plan is again monitored by the emotion engine, and the plan can be adjusted in real time.

[0538] After a trip, users provide feedback on their destinations and activities. This feedback is sent from the device to a server and analyzed by learning tools. The feedback includes the user's emotional responses, which are used to further improve the accuracy of future plans.

[0539] For example, if the emotion engine determines that the user is excited when a trip is presented, it may indicate that the proposed plan includes high-energy activities or events. In this way, the system of the present invention can dynamically adapt and provide a travel plan based on the user's current emotional state and individual preferences.

[0540] The following describes the processing flow.

[0541] Step 1:

[0542] The user takes a picture of their clothing using their smartphone camera and saves the image data to their device. This image reflects the user's usual style.

[0543] Step 2:

[0544] The device uploads images taken via the application to the server. In addition, if the user wishes, emotional data can also be sent via voice input.

[0545] Step 3:

[0546] The server's image acquisition means analyzes the received image data and extracts the color, pattern, and style of the clothing. This allows for an initial assessment of the user's preferences.

[0547] Step 4:

[0548] The server's estimation means uses a generative AI to predict user preferences based on the feature data extracted from the images. During this process, past data is also referenced to check for consistency in preferences.

[0549] Step 5:

[0550] The server's emotion engine analyzes the additional emotion information it has received (facial expressions in the image and the audio input) to understand the user's emotional state. This information is then integrated with the preference estimation results.

[0551] Step 6:

[0552] The server generates a travel plan tailored to the user based on predictions and emotional states. The generated plan includes suggested destinations, routes, and recommended activities.

[0553] Step 7:

[0554] The device receives the generated travel plan from the server and displays it to the user. The user can review the plan on the app and provide feedback on changes or suggestions as needed.

[0555] Step 8:

[0556] During the trip, the user's device continuously monitors their emotional state and adjusts the travel plan in real time as needed.

[0557] Step 9:

[0558] After their trip, users provide feedback on tourist destinations and activities via their devices and send it to the server. This feedback includes emotional responses, which are then used by the server's learning mechanisms to improve future travel plan suggestions.
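One way to picture the integration of long-term preferences and the current emotional state (Steps 5 and 6) is a simple additive score. The Python sketch below is an assumption-laden illustration, not the disclosed method: the activity records, the `suits` field, the emotion labels, and the fixed bonus are all hypothetical.

```python
def rank_activities(activities, preference_weights, emotion, emotion_boost=0.5):
    """Score each activity by its preference weight plus a bonus when it
    suits the user's current emotion, and return them best first."""
    def score(activity):
        base = sum(preference_weights.get(tag, 0.0) for tag in activity["tags"])
        bonus = emotion_boost if emotion in activity["suits"] else 0.0
        return base + bonus
    return sorted(activities, key=score, reverse=True)


# Hypothetical activities and weights.
activities = [
    {"name": "Forest walk", "tags": ["nature"], "suits": ["relaxed", "curious"]},
    {"name": "Theme park", "tags": ["thrill"], "suits": ["excited"]},
    {"name": "Local craft workshop", "tags": ["culture"], "suits": ["curious"]},
]
preference_weights = {"nature": 0.6, "thrill": 0.7, "culture": 0.4}

relaxed_ranking = rank_activities(activities, preference_weights, emotion="relaxed")
excited_ranking = rank_activities(activities, preference_weights, emotion="excited")
print([a["name"] for a in relaxed_ranking])
print([a["name"] for a in excited_ranking])
```

Re-running the ranking whenever the detected emotion changes is one simple way to realize the real-time adjustment described in Step 8.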

[0559] (Example 2)

[0560] Next, we will describe Example 2. In the following description, the data processing device 12 will be referred to as the "server," and the headset-type terminal 314 will be referred to as the "terminal."

[0561] Traditional travel planning systems lacked the functionality to provide personalized plans that took into account the individual preferences and emotional states of users. This made it difficult to deliver a satisfying travel experience. Furthermore, there was a need for a system that could analyze users' emotional states in real time and flexibly adjust the plan accordingly.

[0562] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 2 is realized by the following means.

[0563] In this invention, the server includes a device for acquiring image information of the user, a device for estimating the user's preferences based on the acquired image information, and a device for analyzing the user's emotional state from information related to the user. This makes it possible to generate and provide travel plans tailored to the individual preferences and emotional state of the user.

[0564] A "device for acquiring user image information" is a device that uses the device's camera or sensors to collect image data, including the user's posture and clothing.

[0565] A "preference estimation device" is a device equipped with an algorithm that calculates the user's preferred style and activity tendencies based on collected data.

[0566] A "device for analyzing emotional states" is a device equipped with the function of analyzing the user's visual and auditory information to determine their mental state and mood.

[0567] A "travel plan generation device" is a device that designs and proposes the optimal travel itinerary and activities based on estimated preferences and analyzed emotional states.

[0568] A "presenting device" is a device that has an interface for conveying generated plans or information to the user visually or audibly.

[0569] A "device that learns information" is a device that analyzes feedback received from users and the data it generates, and accumulates knowledge to improve services in the future.

[0570] The system of this invention primarily provides personalized travel plans, taking into account the user's preferences and emotional state. First, the user uses a device such as a smartphone or tablet to acquire image information of themselves using the camera. The acquired images are stored on the device and sent to the server through the application. The user's image information includes elements such as clothing and facial expressions.

[0571] The server uses image processing software to analyze the received image information and extract features such as clothing and facial expressions. This process also includes using generative AI models to estimate user preferences from the collected data.

[0572] Furthermore, the server incorporates an emotion engine, which has the ability to analyze the user's emotional state from audio data provided along with image information. This function allows it to capture emotions such as relaxation or excitement.

[0573] Based on aggregated preference and emotional information, the server generates a travel plan. In addition to general travel suggestions, it lists activities and places suited to the user's preferences, creating a more personalized plan. The generated travel plan is sent to the terminal and presented to the user.

[0574] Users can review the presented travel plan and adjust details within the application as needed. After completing the trip, users input feedback on the places visited and activities, including emotional responses, and send it from their device to the server. The server uses this feedback to learn and further improve the accuracy of future plans.

[0575] For example, if a user is dressed in "resort style," the AI model can infer that the user prefers beach resorts. An example of a prompt message would be, "Based on your current attire and voice input, it seems you want to relax and enjoy new experiences. Please suggest a travel plan that suits this situation." By giving instructions to the server in this format, the system can generate an appropriate travel plan.

[0576] The flow of the specific processing in Example 2 will be explained using Figure 13.

[0577] Step 1:

[0578] The user uses the device's camera to take an image that includes their clothing and facial expression. This image data is saved on the device. The input is the image captured by the smartphone's camera, and the output is the saved image file. Specifically, the user follows the app's instructions to launch the camera and capture an image.

[0579] Step 2:

[0580] The device sends saved images to the server via the application. The input here is the image file stored on the device, and the output is the image data transferred to the server. Specifically, the application uploads the image data to the server using an internet connection.

[0581] Step 3:

[0582] The server acquires the received image data and uses image analysis software to extract features of clothing and facial expressions. The input here is the image data sent to the server, and the output is the extracted feature data. Specifically, the analysis program on the server executes image processing algorithms to identify colors, patterns, and facial expressions.

[0583] Step 4:

[0584] The server's estimation device uses a generative AI model based on extracted feature data to estimate user preferences. The input is extracted feature data, and the output is estimated data regarding user preferences. Specifically, the AI model utilizes machine learning algorithms to compare with past databases and identify the user's style and preferences.

[0585] Step 5:

[0586] The server's emotion engine analyzes image information and additional audio data to determine the user's emotional state. The input here is image and audio data, and the output is the analysis results regarding the emotional state. Specifically, the emotion analysis software quantifies voice tone and facial expressions, classifying them into states such as calmness or excitement.

[0587] Step 6:

[0588] The server integrates the results of preference and emotion analysis and generates personalized travel plans using a generation mechanism. The input is estimated preference and emotion data, and the output is the generated travel plan. Specifically, the server interacts with the AI using prompt messages to select activities and destinations suitable for each user.

[0589] Step 7:

[0590] The server sends the generated travel plan to the terminal. The terminal then displays the received plan to the user. The input is the travel plan data from the server, and the output is the travel plan displayed on the terminal. Specifically, the application receives the data and visualizes the plan on the user interface.

[0591] Step 8:

[0592] After their trip, the user enters feedback through the application, and the device sends this to the server. The input is the user's feedback data, and the output is the feedback transferred to the server. Specifically, the user enters their visited locations and impressions into an input form and presses the submit button.

[0593] Step 9:

[0594] The server analyzes the received feedback using a learning device and stores the learning results in a database to improve the accuracy of future travel plans. The input is the feedback data, and the output is the updated learning database. Specifically, a machine learning algorithm analyzes the feedback and integrates the most relevant data into the model.
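The quantification and classification described in Step 5 can be sketched as a weighted fusion of two scores. This Python example is only an illustration: it assumes that upstream analysis has already reduced voice tone and facial expression to arousal scores between 0 and 1, and the weight, the threshold, and the `classify_emotion` function are hypothetical.

```python
def classify_emotion(voice_arousal, face_arousal, voice_weight=0.5, threshold=0.6):
    """Fuse two 0..1 arousal scores and label the result.

    Returns (label, fused_score). A fused score at or above `threshold`
    is labeled "excitement"; anything lower is labeled "calmness".
    """
    for value in (voice_arousal, face_arousal):
        if not 0.0 <= value <= 1.0:
            raise ValueError("scores must be in the range 0..1")
    fused = voice_weight * voice_arousal + (1 - voice_weight) * face_arousal
    label = "excitement" if fused >= threshold else "calmness"
    return label, fused


# Hypothetical scores for two users.
print(classify_emotion(0.9, 0.7))
print(classify_emotion(0.2, 0.3))
```

A real emotion engine would likely distinguish more than two states; the two labels here simply mirror the calmness and excitement examples given in the text.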

[0595] (Application Example 2)

[0596] Next, we will explain Application Example 2. In the following explanation, the data processing device 12 will be referred to as the "server," and the headset-type terminal 314 will be referred to as the "terminal."

[0597] Providing personalized viewing content efficiently to diverse users is crucial for general content distribution services. However, conventional systems often rely solely on user preferences for recommendations, failing to consider the user's immediate emotional state. This can lead to decreased user satisfaction and increased churn rates. This invention aims to achieve more suitable content delivery by considering both user preferences and emotions in combination.

[0598] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 2 is realized by the following means.

[0599] In this invention, the server includes an image acquisition means for acquiring image information of the user, an estimation means for estimating the user's preferences based on the acquired image information, and an emotion analysis means for analyzing the user's emotional information. This makes it possible to consider the user's preferences and emotions simultaneously and suggest the most suitable viewing content.

[0600] "Image acquisition means" refers to a mechanism for acquiring image information of the user, and involves acquiring the user's clothing and facial expressions using a camera or other photographic device.

[0601] "Estimation means" refers to a mechanism that analyzes the acquired image information and uses it to infer the user's preferences.

[0602] "Emotional analysis means" refers to a function for analyzing a user's emotional information, recognizing the user's emotions from image and audio information.

[0603] "Generation means" refers to a mechanism for suggesting viewing content based on the estimated preferences and the analyzed emotional information.

[0604] "Presentation means" refers to a function for presenting generated viewing content to the user.

[0605] "Learning means" refers to a system with an algorithm that improves the accuracy of its suggestions based on feedback information received from users.

[0606] A "generative AI model" is an algorithm that takes a prompt as input and generates the necessary information and content, and is a technology used to suggest recommended content.

[0607] A "prompt statement" is a sentence or command given to a generative AI model, used to specify the content to be generated and the direction of the response.

[0608] To implement this invention, a system is needed that collects the user's images and voice and, based on that, suggests the most suitable viewing content for the user. This system mainly consists of a server and terminals.

[0609] First, the device acquires images and audio from the user via its camera and microphone. The images are processed with an image processing library such as OpenCV to extract features of clothing and facial expressions. The audio data is analyzed for emotional state using a deep learning-based emotion recognition model (built with a framework such as TensorFlow or PyTorch).

[0610] The collected information is sent to a server. Based on this data, the server uses a generative AI model to suggest viewing content based on the user's preferences and immediate emotions. For example, the GPT model is used as the generative AI model, and a list of appropriate content is generated by inputting prompts.

[0611] The generated content is presented to the user on their device. The user reviews the presented content and sends their feedback back to the server via their device. The server analyzes this feedback using learning mechanisms to improve the accuracy of its suggestions.

[0612] As a concrete example, when a user is at home in relaxed clothing on the weekend, the system sends a prompt message to its AI model saying, "Please recommend some relaxing comedy movies." Based on this prompt, a list of comedy movies and dramas that suit the user's state is generated and presented on the device.

[0613] This makes it possible to deliver content that matches users' preferences and emotions, and to provide highly satisfying services to individual users.

[0614] The flow of a specific process in Application Example 2 will be explained using Figure 14.

[0615] Step 1:

[0616] The device acquires the user's image and voice. The device captures the user's voice and appearance through the camera and microphone. This input data includes image and audio files.

[0617] Step 2:

[0618] The device uses OpenCV to analyze the acquired images. The image acquisition means extracts features such as color, hue, and clothing style from the image and sends this data to the server. The output is data indicating the user's clothing and appearance.

[0619] Step 3:

[0620] The device passes the audio data to a deep learning model for analysis. The emotion analysis means analyzes the tone and pitch of the voice to estimate the user's emotional state. The output of this step is an index representing the user's emotions.

[0621] Step 4:

[0622] The server performs estimation using the analyzed image data and emotion data. This data is input into a generative AI model to generate prompt messages, which request viewing content that matches the user's preferences and emotions.

[0623] Step 5:

[0624] The server uses the generated prompt text to run a generative AI model such as GPT. The model generates a list of appropriate viewing content based on the input prompt text. The output of this step is a list of viewing content.

[0625] Step 6:

[0626] The server sends the generated content to the device. The device then presents this list to the user, who selects the content they wish to view.

[0627] Step 7:

[0628] The user selects and watches content from a list of presented options. During this process, the device records the user's reactions as feedback and sends this data to the server.

[0629] Step 8:

[0630] The server uses the learning means to analyze the received feedback information. This process improves the system's recommendation accuracy. The output of this step is the improved recommendation algorithm.
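Steps 4 and 5 above can be illustrated with a minimal sketch. It assumes that the clothing style and the emotion have already been reduced to short text labels, and it treats the generative AI model as an arbitrary callable that maps a prompt to a text reply. The names `build_viewing_prompt`, `recommend`, and `fake_model` are hypothetical and do not appear in the disclosure; `fake_model` merely stands in for a real model call.

```python
from typing import Callable, List


def build_viewing_prompt(clothing_style: str, emotion: str, genre_hint: str = "") -> str:
    """Compose a prompt that requests viewing content (Step 4)."""
    parts = [f"The user is wearing {clothing_style} clothing and currently seems {emotion}."]
    if genre_hint:
        parts.append(f"Please recommend some {genre_hint}.")
    else:
        parts.append("Please recommend viewing content that suits this state.")
    return " ".join(parts)


def recommend(prompt: str, model: Callable[[str], str]) -> List[str]:
    """Run the model on the prompt (Step 5) and return one title per reply line."""
    reply = model(prompt)
    return [line.lstrip("- ").strip() for line in reply.splitlines() if line.strip()]


def fake_model(prompt: str) -> str:
    # Assumption: placeholder standing in for a real generative AI call.
    return "- Comedy Movie A\n- Comedy Drama B\n"


prompt = build_viewing_prompt("relaxed", "calm", "relaxing comedy movies")
titles = recommend(prompt, fake_model)
```

Passing the model in as a callable keeps the prompt construction testable without committing to any particular model or vendor API.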

[0631] The specific processing unit 290 transmits the result of the specific processing to the headset terminal 314. In the headset terminal 314, the control unit 46A causes the speaker 240 and display 343 to output the result of the specific processing. The microphone 238 acquires audio indicating user input for the result of the specific processing. The control unit 46A transmits the audio data indicating user input acquired by the microphone 238 to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the audio data.

[0632] The data generation model 58 is a so-called generative AI (Artificial Intelligence). Examples of the data generation model 58 include ChatGPT (Internet search <URL: https://openai.com/blog/chatgpt>) and Gemini (Internet search <URL: https://gemini.google.com/?hl=ja>). The data generation model 58 is obtained by performing deep learning on a neural network. Prompts containing instructions are input to the data generation model 58, together with inference data such as audio data representing speech, text data representing text, and image data representing images. The data generation model 58 performs inference on the input inference data according to the instructions indicated by the prompts, and outputs the inference results in data formats such as audio data and text data. Here, inference refers to, for example, analysis, classification, prediction, and/or summarization.

[0633] In the above embodiment, an example was given in which specific processing is performed by the data processing device 12, but the technology of this disclosure is not limited thereto, and specific processing may also be performed by the headset terminal 314.

[0634] [Fourth Embodiment]

[0635] Figure 7 shows an example of the configuration of the data processing system 410 according to the fourth embodiment.

[0636] As shown in Figure 7, the data processing system 410 includes a data processing device 12 and a robot 414. An example of the data processing device 12 is a server.

[0637] The data processing device 12 comprises a computer 22, a database 24, and a communication interface 26. The computer 22 is an example of a "computer" related to the technology of this disclosure. The computer 22 comprises a processor 28, RAM 30, and storage 32. The processor 28, RAM 30, and storage 32 are connected to a bus 34. The database 24 and the communication interface 26 are also connected to the bus 34. The communication interface 26 is connected to a network 54. An example of the network 54 is a WAN (Wide Area Network) and / or a LAN (Local Area Network).

[0638] The robot 414 includes a computer 36, a microphone 238, a speaker 240, a camera 42, a communication interface 44, and a controlled object 443. The computer 36 includes a processor 46, RAM 48, and storage 50. The processor 46, RAM 48, and storage 50 are connected to a bus 52. The microphone 238, speaker 240, camera 42, and controlled object 443 are also connected to the bus 52.

[0639] The microphone 238 receives voice signals from the user 20 and receives instructions from the user 20. The microphone 238 captures the voice signals from the user 20, converts the captured voice into audio data, and outputs it to the processor 46. The speaker 240 outputs audio according to the instructions from the processor 46.

[0640] Camera 42 is a small digital camera equipped with an optical system including a lens, aperture, and shutter, and an image sensor such as a CMOS (Complementary Metal-Oxide-Semiconductor) image sensor or a CCD (Charge Coupled Device) image sensor, and captures images of the area around the user 20 (for example, an imaging range defined by a field of view equivalent to the width of a typical healthy person's field of vision).

[0641] Communication interface 44 is connected to network 54. Communication interfaces 44 and 26 are responsible for the exchange of various information between processor 46 and processor 28 via network 54. The exchange of various information between processor 46 and processor 28 using communication interfaces 44 and 26 is performed in a secure manner.

[0642] The controlled object 443 includes a display device, LEDs in the eyes, and motors that drive the arms, hands, and feet. The posture and gestures of the robot 414 are controlled by controlling the motors of the arms, hands, and feet. Some of the robot 414's emotions can be expressed by controlling these motors. Furthermore, the robot 414's facial expressions can also be expressed by controlling the illumination state of the LEDs in its eyes.

[0643] Figure 8 shows an example of the main functions of the data processing device 12 and the robot 414. As shown in Figure 8, the data processing device 12 performs specific processing using the processor 28. The storage 32 stores the specific processing program 56.

[0644] The specific processing program 56 is an example of a "program" relating to the technology of this disclosure. The processor 28 reads the specific processing program 56 from the storage 32 and executes the read specific processing program 56 on the RAM 30. The specific processing is realized by the processor 28 operating as a specific processing unit 290 in accordance with the specific processing program 56 executed on the RAM 30.

[0645] The storage 32 stores the data generation model 58 and the emotion identification model 59. The data generation model 58 and the emotion identification model 59 are used by the specific processing unit 290.

[0646] In robot 414, the processor 46 performs the reception output processing. The storage 50 stores the reception output program 60. The processor 46 reads the reception output program 60 from the storage 50 and executes the read reception output program 60 on the RAM 48. The reception output processing is realized by the processor 46 operating as a control unit 46A according to the reception output program 60 executed on the RAM 48.

[0647] Next, the specific processing performed by the specific processing unit 290 of the data processing device 12 will be described. In the following description, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".

[0648] This invention is a system that analyzes a user's image information to estimate their preferences and create an appropriate travel plan. Specific embodiments of this system are described below.

[0649] First, the user uses their smartphone camera to take a picture of their everyday clothes. This image is sent to a server through the application on the device. At this point, the image acquisition device receives the image data.

[0650] The server applies image processing algorithms to analyze the acquired image and extract clothing features. The estimation means passes this feature data to a generative AI to infer the user's hobbies and interests. For example, bright colors and sporty clothing might suggest an interest in the outdoors or sports.

[0651] Next, the server generates a travel plan based on the above prediction results via a generation mechanism. This plan includes a starting point based on the user's location information, as well as suggestions for sightseeing destinations and activities that suit their preferences. Furthermore, the plan is optimized by taking into account the user's pre-registered individual constraints (such as allergies or places they dislike).
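As one illustration of how the pre-registered individual constraints might be applied, the sketch below filters candidate destinations before a plan is assembled. The data layout (`Candidate`, `Constraints`) and the function `apply_constraints` are assumptions made for this example only; the disclosure does not specify how constraints are stored or checked.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Candidate:
    name: str
    tags: Set[str] = field(default_factory=set)  # e.g. ingredients or themes


@dataclass
class Constraints:
    allergens: Set[str] = field(default_factory=set)
    disliked_places: Set[str] = field(default_factory=set)


def apply_constraints(candidates: List[Candidate], constraints: Constraints) -> List[Candidate]:
    """Drop candidates the user dislikes or that carry a registered allergen tag."""
    allowed = []
    for candidate in candidates:
        if candidate.name in constraints.disliked_places:
            continue
        if candidate.tags & constraints.allergens:
            continue
        allowed.append(candidate)
    return allowed


candidates = [
    Candidate("Seafood Market", {"shellfish", "food"}),
    Candidate("City Park", {"outdoor"}),
    Candidate("Crowded Mall", {"shopping"}),
]
constraints = Constraints(allergens={"shellfish"}, disliked_places={"Crowded Mall"})
remaining = [c.name for c in apply_constraints(candidates, constraints)]
```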

[0652] The travel plan generated from the server is sent to the device and presented to the user. The user can review this plan on their smartphone and make adjustments as needed.

[0653] After their trip, users input feedback about the tourist attractions and activities they experienced into the application. The device sends this feedback to the server, where its learning mechanisms analyze it. This allows the server to improve future travel plans, making them more accurate and appealing to the user.

[0654] As a concrete example, suppose a user uploads a photo of themselves wearing a casual jacket, jeans, and sneakers. The generative AI estimates from this outfit that the user prefers a relaxed atmosphere and casual outdoor activities. Based on this, the server suggests a plan that includes a picnic in a park or a nature walk, and the user accepts and carries it out.

[0655] This system allows users to efficiently obtain fresh travel experiences tailored to their individual needs.

[0656] The following describes the processing flow.

[0657] Step 1:

[0658] The user takes a picture of their clothing using their smartphone camera. This image data is saved within the application.

[0659] Step 2:

[0660] The device uploads the saved images to the server via the application. A secure protocol is used for image transmission to ensure safety.

[0661] Step 3:

[0662] The server analyzes the received image data. Using image processing algorithms, it extracts features such as clothing color, design, and accessories, and organizes them as digital data.

[0663] Step 4:

[0664] The server's estimation means inputs the feature data into a generative AI to infer the user's hobbies and preferences. In this process, cultural background and lifestyle preferences are read from clothing style and color palettes.

[0665] Step 5:

[0666] The server generates a travel plan based on inferred preference information. In this process, it selects tourist destinations and activities according to location information and applies a route optimization algorithm to determine an efficient travel order.

[0667] Step 6:

[0668] The device receives the generated travel plan from the server and presents it to the user. The user can view the plan details within the application and, if necessary, modify or customize parts of the plan.

[0669] Step 7:

[0670] Users take a trip and then provide feedback via the application, including evaluations and impressions of the places they visited and the activities they participated in. This information is sent to the server.

[0671] Step 8:

[0672] The server analyzes the feedback it receives and uses a learning algorithm to update the database, which will then be used as a reference for future travel plans. This makes it possible to provide more personalized travel plans.
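The route optimization in Step 5 is not specified further. As one possible stand-in, the sketch below orders the selected spots with a greedy nearest-neighbor heuristic on planar coordinates. This is an assumption chosen for brevity, not the algorithm of the disclosure, and it does not guarantee the shortest overall route. The function name `order_visits` is hypothetical.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def order_visits(start: Point, spots: Dict[str, Point]) -> List[str]:
    """Greedy nearest-neighbor ordering: always go to the closest unvisited spot."""
    remaining = dict(spots)
    current = start
    order: List[str] = []
    while remaining:
        nearest = min(remaining, key=lambda name: math.dist(current, remaining[name]))
        order.append(nearest)
        current = remaining.pop(nearest)
    return order


# Coordinates are arbitrary planar units used only for illustration.
route = order_visits((0.0, 0.0), {"park": (1.0, 0.0), "museum": (5.0, 0.0), "cafe": (2.0, 0.0)})
```

For real-world locations, the planar distance would be replaced by a road or travel-time distance obtained from a mapping service.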

[0673] (Example 1)

[0674] Next, we will describe Example 1. In the following description, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".

[0675] When travelers plan trips that suit their preferences and interests, gathering a large amount of information and making appropriate choices based on that information is a challenging task. Furthermore, it can be difficult to incorporate past experiences and feedback into future plans. Therefore, there is a need to create effective and efficient travel plans tailored to individual preferences.

[0676] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 1 is realized by the following means.

[0677] In this invention, the server includes an information acquisition means for acquiring the user's visual information, an analysis means for analyzing the acquired visual information and estimating the user's preferences, and a generation means for constructing and presenting a travel plan based on the analysis results. This makes it possible to generate a travel plan tailored to the individual preferences of the user and to reflect subsequent feedback in the next plan.

[0678] "Visual information" refers to image data related to the user's clothing and appearance, and is information acquired through the camera function.

[0679] "Information acquisition means" refers to the components of devices or software that have the function of collecting visual information.

[0680] "Analysis means" refers to technical means used to process acquired visual information and infer the user's preferences and interests.

[0681] "Generating means" refers to the function of a device or software that constructs a travel plan based on the analyzed results and presents it to the user.

[0682] "Learning methods" refer to technical systems that accumulate user feedback and use it to improve the accuracy of future travel plans.

[0683] "Travel planning" refers to suggestions for itineraries and destinations that are created based on the user's preferences and location information.

[0684] This invention relates to a system that automatically generates and presents travel plans tailored to the user's preferences. Specific embodiments of the invention are described below.

[0685] First, the user uses a device with a dedicated application installed to take pictures of their clothes and other items using the camera function. This image data is then sent to the server via the application on the device.

[0686] The server uses image processing libraries such as OpenCV to analyze the received image data. Through image analysis, it detects features such as clothing color, material, and style, and uses this information to estimate the user's preferences. This analysis result is then input as a prompt into the server's AI model, which then concretizes the user's preferences. A concrete example of a prompt might be, "Infer the user's preferences from their fashion style and create a travel plan based on that."

[0687] Next, the server automatically generates a travel plan based on the estimated user preferences. This plan includes the optimal route from the starting point to the destination, suggested sightseeing spots, and activities. The suggestions are optimized based on the user's history, including their registered location and rating information.

[0688] Finally, the generated travel plan is sent to the device and presented to the user. After the trip, the user provides feedback to the application with evaluation information about the places visited and activities participated in. The evaluation information sent from the device to the server is analyzed by the server's learning mechanism and reflected in the next travel plan.
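How evaluation information could be "reflected in the next travel plan" is left open. One simple possibility, shown here purely as an assumption, is to keep a score between 0 and 1 per activity tag and move it toward each new rating with an exponential moving average. The function `update_preferences`, the 1-to-5 rating scale, and the default score of 0.5 are all hypothetical choices for this sketch.

```python
from typing import Dict, Iterable, Tuple


def update_preferences(
    scores: Dict[str, float],
    feedback: Iterable[Tuple[str, int]],
    rate: float = 0.3,
) -> Dict[str, float]:
    """Move each tag's score toward the user's rating (1 to 5 stars)."""
    updated = dict(scores)
    for tag, rating in feedback:
        target = (rating - 1) / 4          # map 1..5 stars onto 0..1
        old = updated.get(tag, 0.5)        # unseen tags start neutral
        updated[tag] = old + rate * (target - old)
    return updated


new_scores = update_preferences({"outdoor": 0.5}, [("outdoor", 5), ("museum", 1)], rate=0.5)
```

A larger `rate` makes the system react faster to recent trips, at the cost of forgetting older feedback more quickly.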

[0689] This invention enables users to efficiently obtain personalized travel plans tailored to their individual preferences, thereby enriching their travel experience.

[0690] The flow of the specific processing in Example 1 will be explained using Figure 11.

[0691] Step 1:

[0692] The user takes a photo of their clothing using the device's camera. This photo is saved on the device as a JPEG or PNG file and sent to the server via a dedicated application. The transmitted image data becomes the server's input.

[0693] Step 2:

[0694] The server analyzes the received image data using an image processing library such as OpenCV. This image analysis process detects and extracts features such as clothing color, pattern, and style, obtaining this feature data as output. Specific operations here include data processing such as edge detection and color analysis.

[0695] Step 3:

[0696] The server inputs the extracted feature data as prompts into the generative AI model. Based on these prompts, the AI model estimates the user's preferences. In this estimation process, the AI uses a pre-trained model to output the user's preferences (for example, styles such as sporty or casual).

[0697] Step 4:

[0698] The server generates a travel plan based on estimated preference information. This process takes into account the user's location and selects relevant tourist spots and activities. The server processes this information and outputs a travel plan that includes suggestions for the optimal route and activities for the user.

[0699] Step 5:

[0700] The terminal receives the travel plan sent from the server and presents it to the user. The user can then review the travel details based on this plan and make manual adjustments as needed. The adjusted plan is then output to the terminal as the final plan.

[0701] Step 6:

[0702] After the trip ends, the user enters their evaluation information about the trip on their device and sends feedback from the device to the server. This feedback becomes input to the server.

[0703] Step 7:

[0704] The server analyzes the received feedback and implements a learning process for future travel suggestions. Based on this data, the server updates its model to improve the accuracy of future plan generation. The analysis results are then output to the server's database.
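The color analysis mentioned in Step 2 can be illustrated without any external library. The sketch below assumes the image has already been decoded into a list of RGB tuples (with OpenCV this decoding would be done by the library) and assigns each pixel to a coarse named color bucket. The bucket boundaries and the function names `color_name` and `dominant_colors` are assumptions for illustration only.

```python
import colorsys
from collections import Counter
from typing import Iterable, List, Tuple

RGB = Tuple[int, int, int]


def color_name(rgb: RGB) -> str:
    """Map an RGB pixel to a coarse color bucket using its HSV representation."""
    r, g, b = (channel / 255 for channel in rgb)
    hue, saturation, value = colorsys.rgb_to_hsv(r, g, b)
    if value < 0.2:
        return "black"
    if saturation < 0.15:
        return "white" if value > 0.8 else "gray"
    degrees = hue * 360
    if degrees < 30 or degrees >= 330:
        return "red"
    if degrees < 90:
        return "yellow"
    if degrees < 150:
        return "green"
    if degrees < 270:
        return "blue"
    return "purple"


def dominant_colors(pixels: Iterable[RGB], top: int = 2) -> List[str]:
    """Return the most frequent color buckets, most common first."""
    counts = Counter(color_name(pixel) for pixel in pixels)
    return [name for name, _ in counts.most_common(top)]


pixels = [(255, 0, 0)] * 3 + [(0, 0, 255)] * 2 + [(10, 10, 10)]
palette = dominant_colors(pixels)
```

The resulting labels are the kind of feature data that could be placed into the prompt of Step 3.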

[0705] (Application Example 1)

[0706] Next, we will explain Application Example 1. In the following explanation, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".

[0707] In today's urban environment, it is difficult for tourists and residents alike to effectively discover and experience tourist destinations and activities that suit their individual preferences. Therefore, there is a need for a system that allows users to easily and quickly obtain travel plans that are best suited to their interests and circumstances.

[0708] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 1 is realized by the following means.

[0709] In this invention, the server includes means for acquiring video information of the user, means for estimating the user's preferences based on the acquired video information, and means for generating a travel plan based on the estimated preferences. This makes it possible to efficiently propose the optimal sightseeing route tailored to the individual preferences of the user.

[0710] "Users" refer to individuals who use the system to receive personalized travel plans based on their individual preferences.

[0711] "Visual information" refers to images and visual data acquired by the system, such as the user's clothing and location information.

[0712] "Preferences" refer to characteristics related to the user's interests and preferences, and are elements that the system considers when constructing a travel plan.

[0713] A "travel plan" refers to a plan that includes suggestions for tourist destinations and activities generated based on the user's preferences and location information.

[0714] "Means" refers to the components or methods used to achieve a specific function or process within a system.

[0715] An "optimal sightseeing route" refers to a path that allows users to visit tourist destinations in an efficient and engaging way, based on their preferences and current location.

[0716] This invention can be implemented by a user using a dedicated application installed on a device such as a smartphone. First, the user uses the device's camera to photograph their everyday clothing and surrounding environment and sends the video information to a server. The server receives this video information and uses image analysis software to extract features of the clothing and background. Specifically, it uses image processing technologies such as the Azure Computer Vision API.

[0717] The extracted feature information is used as base data to estimate user preferences using a generative AI model. For example, if a user is wearing a casual jacket, the system might infer that they prefer relaxed outdoor activities. Based on this preference data, the server generates an efficient and interesting travel plan that reflects the user's location. In generating this plan, for example, the Google Maps API can be used to suggest the optimal sightseeing route.

[0718] The generated travel plan is presented to the user via their device, allowing them to plan their trip based on the presented plan. Furthermore, feedback information entered by the user after the trip is sent to the server, analyzed again by AI as unclassified data, and used as learning data to improve the accuracy of future planning. This enables users to obtain more personalized and appealing travel experiences.

[0719] As a concrete example, when a user uploads an image of sneakers and jeans, the server might infer that the user prefers casual and active activities and suggest plans for a picnic in a park or a nature walk.

[0720] An example of a prompt message might be, "This user is wearing a casual jacket. This attire conveys a relaxed atmosphere. What travel activity would be most suitable?" This is the kind of message that would be input to the generative AI model.

[0721] The flow of a specific process in Application Example 1 will be explained using Figure 12.

[0722] Step 1:

[0723] The user uses their smartphone to photograph their clothing and surrounding environment. The input is video information captured by the smartphone's camera. This video information is temporarily stored on the device and later sent to the server.

[0724] Step 2:

[0725] The terminal transmits the acquired video information to the server. The server receives this video information and prepares it for analysis. The input is the image data transmitted from the terminal, and the output is the data ready for analysis.

[0726] Step 3:

[0727] The server begins image analysis using the received image data. Specifically, it uses the Azure Computer Vision API to extract features of the user's clothing and background. This analysis process analyzes the color, shape, and decorations of the clothing in the image and outputs them as feature information.

[0728] Step 4:

[0729] The server uses the extracted feature information to estimate user preferences using a generative AI model. The input is the feature information obtained in the previous step, and the output is the estimated preference data. Based on the estimation results, data about activities that the user is likely to be interested in is generated.

[0730] Step 5:

[0731] The server combines estimated preferences with the user's location information and uses the Google Maps API to generate an optimal travel plan. The input is estimated preference data and location information, and the output is optimal sightseeing route information as a travel plan.

[0732] Step 6:

[0733] The server sends the generated travel plan to the terminal, which then presents it to the user. The input is the generated travel plan data, and the output is the sightseeing route information displayed on the user's screen. The user can then plan their trip based on this information.

[0734] Step 7:

[0735] After completing their trip, users input feedback using the application. The device aggregates this feedback data and sends it to the server. The input is user-provided feedback data, and the output is data prepared for learning purposes.

[0736] Step 8:

[0737] The server analyzes the received feedback data and performs machine learning to improve the accuracy of the next travel plan. The input is the feedback data, and the output is an improved preference estimation model. By repeatedly training the generative AI model in this process, the accuracy of the suggestions improves.
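Step 5 combines estimated preferences with location information. Leaving aside the actual mapping service, the combination itself can be sketched as follows: candidate spots within a radius of the user are ranked first by how many preference tags they match and then by great-circle distance. The haversine formula is standard; the function `rank_spots`, the data layout, and the 10 km radius are assumptions for this example.

```python
import math
from typing import Dict, List, Set, Tuple

LatLon = Tuple[float, float]


def haversine_km(a: LatLon, b: LatLon) -> float:
    """Great-circle distance between two (latitude, longitude) points in km."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    dlat, dlon = lat2 - lat1, lon2 - lon1
    h = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def rank_spots(
    user_pos: LatLon,
    preferences: Set[str],
    spots: Dict[str, Tuple[LatLon, Set[str]]],
    max_km: float = 10.0,
) -> List[str]:
    """Rank nearby spots by preference-tag matches, then by distance."""
    ranked = []
    for name, (pos, tags) in spots.items():
        distance = haversine_km(user_pos, pos)
        if distance > max_km:
            continue  # too far from the user's current location
        ranked.append((name, len(preferences & tags), distance))
    ranked.sort(key=lambda item: (-item[1], item[2]))
    return [name for name, _, _ in ranked]


spots = {
    "park": ((35.01, 139.0), {"outdoor", "relaxed"}),
    "mall": ((35.005, 139.0), {"shopping"}),
    "far_trail": ((36.0, 139.0), {"outdoor"}),
}
ranking = rank_spots((35.0, 139.0), {"outdoor", "relaxed"}, spots)
```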

[0738] Furthermore, an emotion engine that estimates the user's emotions may be incorporated. That is, the specific processing unit 290 may use the emotion identification model 59 to estimate the user's emotions and perform the specific processing using the user's emotions.

[0739] This invention provides a system that combines an emotion engine with the conventional function of estimating preferences from the user's image information and creating a travel plan, so that it recognizes the user's emotional state and offers a more accurate, personalized travel plan. A specific embodiment of this system is described below.

[0740] First, the user uses their smartphone camera to take a picture of their clothing and saves the image to their device. This image is then sent to the server through the application on the device.

[0741] The server analyzes the image: the image acquisition means extracts clothing features and provides this data to the estimation means. The estimation means uses generative AI to infer the user's preferences from the clothing. At the same time, the emotion engine analyzes the image and any additional voice input to determine the user's emotional state. The user's immediate emotions and long-term preferences are then integrated and used to design the travel plan.

[0742] For example, if the user is relaxed yet curious, the emotion engine can suggest a travel plan that includes a calming natural environment while also offering new experiences. Based on this information, the server creates a travel plan via a generation mechanism and sends it to the device.

[0743] Users can view the travel plan presented on their device and check the details within the application. The user's reaction to the plan is again monitored by the emotion engine, and the plan can be adjusted in real time.

[0744] After a trip, users provide feedback on their destinations and activities. This feedback is sent from the device to a server and analyzed by learning tools. The feedback includes the user's emotional responses, which are used to further improve the accuracy of future plans.

[0745] For example, if the emotion engine determines that the user is excited when a plan is presented, the system may propose a plan that includes high-energy activities or events. In this way, the system of the present invention can dynamically adapt and provide a travel plan based on the user's current emotional state and individual preferences.

[0746] The following describes the processing flow.

[0747] Step 1:

[0748] The user takes a picture of their clothing using their smartphone camera and saves the image data to their device. This image reflects the user's usual style.

[0749] Step 2:

[0750] The device uploads images taken via the application to the server. In addition, if the user wishes, emotional data can also be sent via voice input.

[0751] Step 3:

[0752] The server analyzes the received image data using the image acquisition means and extracts features such as the color, pattern, and style of the clothing. This allows for an initial assessment of the user's preferences.

[0753] Step 4:

[0754] The server's estimation means uses a generative AI to predict user preferences based on the feature data extracted from the images. During this process, past data is also referenced to check for consistency in preferences.

[0755] Step 5:

[0756] The server's emotion engine analyzes the additional emotion information it has received (facial expressions in the images and the voice in the audio) to understand the user's emotional state. This information is then integrated with the preference estimation results.

[0757] Step 6:

[0758] The server generates a travel plan tailored to the user based on predictions and emotional states. The generated plan includes suggested destinations, routes, and recommended activities.

[0759] Step 7:

[0760] The device receives the generated travel plan from the server and displays it to the user. The user can review the plan on the app and provide feedback on changes or suggestions as needed.

[0761] Step 8:

[0762] During the trip, the user's device continuously monitors their emotional state and adjusts the travel plan in real time as needed.

[0763] Step 9:

[0764] After their trip, users provide feedback on tourist destinations and activities via their devices and send it to the server. This feedback includes emotional responses, which are then used by the server's learning mechanisms to improve future travel plan suggestions.
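The integration of emotional state and preference estimates in Steps 5 and 6 can be sketched as a weighted score. The example assumes each activity has a preference score and an "energy level" between 0 and 1, and that each emotion label maps to a target energy level. These values, the weighting, and the names `EMOTION_ENERGY`, `score_activity`, and `pick_activity` are all hypothetical; the disclosure does not prescribe a scoring formula.

```python
from typing import Dict, Tuple

# Assumption: illustrative target energy level for each emotion label.
EMOTION_ENERGY: Dict[str, float] = {"relaxed": 0.2, "curious": 0.6, "excited": 0.9}


def score_activity(
    pref_score: float,
    activity_energy: float,
    emotion: str,
    emotion_weight: float = 0.5,
) -> float:
    """Blend the long-term preference score with how well the activity fits the mood."""
    target = EMOTION_ENERGY.get(emotion, 0.5)  # unknown emotions count as neutral
    emotion_fit = 1.0 - abs(target - activity_energy)
    return (1 - emotion_weight) * pref_score + emotion_weight * emotion_fit


def pick_activity(activities: Dict[str, Tuple[float, float]], emotion: str) -> str:
    """Return the activity name with the highest blended score.

    `activities` maps a name to (preference score, energy level).
    """
    return max(activities, key=lambda name: score_activity(*activities[name], emotion))


activities = {"nature walk": (0.7, 0.3), "rafting": (0.7, 0.9)}
calm_choice = pick_activity(activities, "relaxed")
lively_choice = pick_activity(activities, "excited")
```

Re-running `pick_activity` whenever the monitored emotion changes is one way the real-time adjustment of Step 8 could be approximated.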

[0765] (Example 2)

[0766] Next, we will describe Example 2. In the following description, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".

[0767] Traditional travel planning systems lacked the functionality to provide personalized plans that took into account the individual preferences and emotional states of users. This made it difficult to deliver a satisfying travel experience. Furthermore, there was a need for a system that could analyze users' emotional states in real time and flexibly adjust the plan accordingly.

[0768] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 2 is realized by the following means.

[0769] In this invention, the server includes a device for acquiring image information of the user, a device for estimating the user's preferences based on the acquired image information, and a device for analyzing the user's emotional state from information related to the user. This makes it possible to generate and provide travel plans tailored to the individual preferences and emotional state of the user.

[0770] A "device for acquiring user image information" is a device that uses the device's camera or sensors to collect image data, including the user's posture and clothing.

[0771] A "preference estimation device" is a device equipped with an algorithm that calculates the user's preferred style and activity tendencies based on collected data.

[0772] A "device for analyzing emotional states" is a device equipped with the function of analyzing the user's visual and auditory information to determine their mental state and mood.

[0773] A "travel plan generation device" is a device that designs and proposes the optimal travel itinerary and activities based on estimated preferences and analyzed emotional states.

[0774] A "presenting device" is a device that has an interface for conveying generated plans or information to the user visually or audibly.

[0775] A "device that learns information" is a device that analyzes feedback received from users and the data it generates, and accumulates knowledge to improve services in the future.

[0776] The system of this invention primarily provides personalized travel plans that take the user's preferences and emotional state into account. First, the user captures an image of themselves with the camera of a device such as a smartphone or tablet. The captured image is stored on the device and sent to the server through the application. The image information includes elements such as the user's clothing and facial expression.

[0777] The server uses image processing software to analyze the received image information and extract features such as clothing and facial expressions. This process also includes using generative AI models to estimate user preferences from the collected data.

[0778] Furthermore, the server incorporates an emotion engine, which has the ability to analyze the user's emotional state from audio data provided along with image information. This function allows it to capture emotions such as relaxation or excitement.

[0779] Based on aggregated preference and emotional information, the server generates a travel plan. In addition to general travel suggestions, it lists activities and places suited to the user's preferences, creating a more personalized plan. The generated travel plan is sent to the terminal and presented to the user.

[0780] Users can review the presented travel plan and adjust details within the application as needed. After completing the trip, users input feedback on the places visited and activities, including emotional responses, and send it from their device to the server. The server uses this feedback to learn and further improve the accuracy of future plans.

[0781] For example, if a user is dressed in "resort style," the AI model can infer that the user prefers beach resorts. An example of a prompt message would be, "Based on your current attire and voice input, it seems you want to relax and enjoy new experiences. Please suggest a travel plan that suits this situation." By giving instructions to the server in this format, the system can generate an appropriate travel plan.
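As a minimal sketch of how such a prompt message might be assembled from the estimated results, the following assumes that the preference estimation and emotion analysis each yield a short text label. The function name build_travel_prompt and both parameters are hypothetical and do not appear in the specification.

```python
def build_travel_prompt(attire: str, mood: str) -> str:
    """Compose a prompt in the format quoted above from two estimated labels.

    `attire` and `mood` are hypothetical inputs standing in for the outputs
    of the preference estimation and the emotion analysis.
    """
    return (
        f"Based on the user's current attire ({attire}) and voice input, "
        f"the user appears to want to {mood}. "
        "Please suggest a travel plan that suits this situation."
    )


prompt = build_travel_prompt("resort style", "relax and enjoy new experiences")
print(prompt)
```

The resulting string would then be passed to whichever generative AI model the server uses; that call is omitted here because the specification does not name a particular interface.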

[0782] The flow of the specific processing in Example 2 will be explained using Figure 13.

[0783] Step 1:

[0784] The user uses the device's camera to take an image that includes their clothing and facial expression. This image data is saved on the device. The input is the image captured by the smartphone's camera, and the output is the saved image file. Specifically, the user follows the app's instructions to launch the camera and capture an image.

[0785] Step 2:

[0786] The device sends saved images to the server via the application. The input here is the image file stored on the device, and the output is the image data transferred to the server. Specifically, the application uploads the image data to the server using an internet connection.

[0787] Step 3:

[0788] The server acquires the received image data and uses image analysis software to extract features of clothing and facial expressions. The input here is the image data sent to the server, and the output is the extracted feature data. Specifically, the analysis program on the server executes image processing algorithms to identify colors, patterns, and facial expressions.

[0789] Step 4:

[0790] The server's estimation device applies a generative AI model to the extracted feature data to estimate the user's preferences. The input is the extracted feature data, and the output is estimated data regarding the user's preferences. Specifically, the AI model uses machine learning algorithms to compare the features against previously accumulated data and identify the user's style and preferences.

[0791] Step 5:

[0792] The server's emotion engine analyzes image information and additional audio data to determine the user's emotional state. The input here is image and audio data, and the output is the analysis results regarding the emotional state. Specifically, the emotion analysis software quantifies voice tone and facial expressions, classifying them into states such as calmness or excitement.

[0793] Step 6:

[0794] The server integrates the results of preference and emotion analysis and generates personalized travel plans using a generation mechanism. The input is estimated preference and emotion data, and the output is the generated travel plan. Specifically, the server interacts with the AI using prompt messages to select activities and destinations suitable for each user.

[0795] Step 7:

[0796] The server sends the generated travel plan to the terminal. The terminal then displays the received plan to the user. The input is the travel plan data from the server, and the output is the travel plan displayed on the terminal. Specifically, the application receives the data and visualizes the plan on the user interface.

[0797] Step 8:

[0798] After their trip, the user enters feedback through the application, and the device sends this to the server. The input is the user's feedback data, and the output is the feedback transferred to the server. Specifically, the user enters their visited locations and impressions into an input form and presses the submit button.

[0799] Step 9:

[0800] The server analyzes the received feedback using a learning device and stores the learning results in a database to improve the accuracy of future travel plans. The input is the feedback data, and the output is the updated learning database. Specifically, a machine learning algorithm analyzes the feedback and integrates the most relevant data into the model.
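As a rough illustration only, the server-side part of this flow (Steps 3 to 6 and Step 9) can be sketched as a chain of small functions. Every name below (Features, Emotion, estimate_preference, analyze_emotion, generate_plan, LearningStore), the color lookup table, and the 0.5 threshold are hypothetical stand-ins. The specification does not disclose the actual models, and a lookup table replaces the generative AI model so that the sketch runs on its own.

```python
from dataclasses import dataclass, field


@dataclass
class Features:
    """Step 3 output: features extracted from the image."""
    colors: list[str]
    expression: str


@dataclass
class Emotion:
    """Step 5 output: classified emotional state and its score."""
    label: str
    score: float


def estimate_preference(features: Features) -> str:
    """Step 4 stand-in: a lookup table instead of a generative AI model."""
    style_by_color = {"white": "beach resort", "khaki": "outdoor trekking"}
    for color in features.colors:
        if color in style_by_color:
            return style_by_color[color]
    return "city sightseeing"


def analyze_emotion(voice_energy: float, smile: float) -> Emotion:
    """Step 5 stand-in: average two 0..1 scores and classify the result."""
    score = 0.5 * voice_energy + 0.5 * smile
    return Emotion("excitement" if score >= 0.5 else "calmness", score)


def generate_plan(preference: str, emotion: Emotion) -> dict[str, str]:
    """Step 6 stand-in: pair the preference with an emotion-dependent pace."""
    pace = "active" if emotion.label == "excitement" else "leisurely"
    return {"theme": preference, "pace": pace}


@dataclass
class LearningStore:
    """Step 9 stand-in: accumulate feedback ratings per plan theme."""
    ratings: dict[str, list[int]] = field(default_factory=dict)

    def record(self, theme: str, rating: int) -> None:
        self.ratings.setdefault(theme, []).append(rating)

    def average(self, theme: str) -> float:
        values = self.ratings.get(theme, [])
        return sum(values) / len(values) if values else 0.0


features = Features(colors=["white", "blue"], expression="smile")
emotion = analyze_emotion(voice_energy=0.2, smile=0.6)
plan = generate_plan(estimate_preference(features), emotion)

store = LearningStore()
store.record(plan["theme"], 5)  # feedback after the trip (Step 8)
store.record(plan["theme"], 3)
print(plan, store.average(plan["theme"]))
```

Keeping each stage as a function with an explicit input and output mirrors the way the steps above are described, and makes it straightforward to swap a stand-in for a real model later.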

[0801] (Application Example 2)

[0802] Next, we will explain application example 2. In the following explanation, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".

[0803] Providing personalized viewing content efficiently to diverse users is crucial for general content distribution services. However, conventional systems often rely solely on user preferences for recommendations, failing to consider the user's immediate emotional state. This can lead to decreased user satisfaction and increased churn rates. This invention aims to achieve more suitable content delivery by considering both user preferences and emotions in combination.

[0804] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 2 is realized by the following means.

[0805] In this invention, the server includes an image acquisition means for acquiring image information of the user, an estimation means for estimating the user's preferences based on the acquired image information, and an emotion analysis means for analyzing the user's emotional information. This makes it possible to consider the user's preferences and emotions simultaneously and suggest the most suitable viewing content.

[0806] "Image acquisition means" refers to a mechanism for acquiring image information of the user, and involves acquiring the user's clothing and facial expressions using a camera or other photographic device.

[0807] "Estimation means" refers to a mechanism that analyzes the acquired image information and uses it to infer the user's preferences.

[0808] "Emotion analysis means" refers to a function that analyzes the user's emotional information and recognizes the user's emotions from image and audio information.

[0809] "Generation means" refers to a mechanism for suggesting viewing content based on the estimated preferences and the analyzed emotional information.

[0810] "Presentation means" refers to a function for presenting the generated viewing content to the user.

[0811] "Learning means" refers to a mechanism with an algorithm that improves the accuracy of suggestions based on feedback information received from users.

[0812] A "generative AI model" is an algorithm that takes a prompt as input and generates the necessary information and content, and is a technology used to suggest recommended content.

[0813] A "prompt statement" is a sentence or command given to a generative AI model, used to specify the content to be generated and the direction of the response.

[0814] To implement this invention, a system is needed that collects the user's images and voice and, based on that, suggests the most suitable viewing content for the user. This system mainly consists of a server and terminals.

[0815] First, the device acquires images and audio from the user via its camera and microphone. An image processing library such as OpenCV is used to extract features of clothing and facial expressions from the images. The audio data is analyzed for emotional state using a deep learning-based emotion recognition model (for example, one built with a framework such as TensorFlow or PyTorch).
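One simple clothing feature of this kind is the dominant color of the image. The sketch below is an assumption-laden illustration, not the disclosed method: it uses Python's standard colorsys module in place of OpenCV so that it runs without extra dependencies, and the function name dominant_hue, the six hue buckets, and the 0.2 saturation and brightness cutoffs are all hypothetical choices.

```python
import colorsys
from collections import Counter

# Six hue buckets, each 60 degrees wide and centered on the named color.
HUE_NAMES = ["red", "yellow", "green", "cyan", "blue", "magenta"]


def dominant_hue(pixels: list[tuple[int, int, int]]) -> str:
    """Return the most common hue bucket among RGB pixels (0-255 per channel)."""
    counts = Counter()
    for r, g, b in pixels:
        h, s, v = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
        if s < 0.2 or v < 0.2:
            continue  # skip near-gray and near-black pixels, whose hue is unreliable
        bucket = int((h * 6 + 0.5) % 6)  # shift by half a bucket so red wraps correctly
        counts[HUE_NAMES[bucket]] += 1
    return counts.most_common(1)[0][0] if counts else "neutral"


sample = [(0, 0, 255)] * 3 + [(255, 0, 0), (128, 128, 128)]
print(dominant_hue(sample))
```

In practice the pixel list would come from a decoded camera frame, and a real system would likely restrict the analysis to the clothing region rather than the whole image.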

[0816] The collected information is sent to a server. Based on this data, the server uses a generative AI model to suggest viewing content based on the user's preferences and immediate emotions. For example, the GPT model is used as the generative AI model, and a list of appropriate content is generated by inputting prompts.

[0817] The generated content is presented to the user on their device. The user reviews the presented content and sends their feedback back to the server via their device. The server analyzes this feedback using learning mechanisms to improve the accuracy of its suggestions.

[0818] As a concrete example, when a user is at home in relaxed clothing on the weekend, the system sends a prompt message to its AI model saying, "Please recommend some relaxing comedy movies." Based on this prompt, a list of comedy movies and dramas that suit the user's state is generated and presented on the device.

[0819] This makes it possible to deliver content that matches users' preferences and emotions, and to provide highly satisfying services to individual users.

[0820] The flow of a specific process in Application Example 2 will be explained using Figure 14.

[0821] Step 1:

[0822] The device acquires the user's image and voice. The device captures the user's voice and appearance through the camera and microphone. This input data includes image and audio files.

[0823] Step 2:

[0824] The device uses OpenCV to analyze the acquired images. The image acquisition means extracts features such as color, hue, and clothing style from the image and sends this data to the server. The output is data indicating the user's clothing and appearance.

[0825] Step 3:

[0826] The device passes the audio data to a deep learning model for analysis. The emotion analysis means analyzes the tone and pitch of the voice to estimate the user's emotional state. The output of this step is an index representing the user's emotions.

[0827] Step 4:

[0828] The server performs estimation using the analyzed image data and emotion data. This data is input into a generative AI model to generate prompt messages that request viewing content matching the user's preferences and emotions.

[0829] Step 5:

[0830] The server uses the generated prompt text to run a generative AI model such as GPT. The model generates a list of appropriate viewing content based on the input prompt text. The output of this step is a list of viewing content.

[0831] Step 6:

[0832] The server sends the generated content to the device. The device then presents this list to the user, who selects the content they wish to view.

[0833] Step 7:

[0834] The user selects and watches content from a list of presented options. During this process, the device records the user's reactions as feedback and sends this data to the server.

[0835] Step 8:

[0836] The server uses the learning means to analyze the received feedback information. This process improves the system's recommendation accuracy. The output of this step is the improved recommendation algorithm.
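The specification does not state which learning algorithm is used in Step 8. As one possible reading, the sketch below keeps a score per genre and moves it toward each recorded reaction with an exponential moving average. The names update_scores and rank, the learning rate of 0.3, and the neutral starting score of 0.5 are assumptions made for illustration.

```python
def update_scores(
    scores: dict[str, float], genre: str, reaction: float, rate: float = 0.3
) -> dict[str, float]:
    """Move the score for `genre` toward `reaction` (0 = disliked, 1 = liked)."""
    updated = dict(scores)  # leave the caller's dictionary untouched
    old = updated.get(genre, 0.5)  # unseen genres start at a neutral score
    updated[genre] = old + rate * (reaction - old)
    return updated


def rank(scores: dict[str, float]) -> list[str]:
    """Order genres from highest to lowest score."""
    return sorted(scores, key=scores.get, reverse=True)


scores = {"comedy": 0.5, "thriller": 0.6}
scores = update_scores(scores, "comedy", reaction=1.0)    # user enjoyed a comedy
scores = update_scores(scores, "thriller", reaction=0.0)  # user stopped a thriller early
print(rank(scores))
```

A moving average is chosen here only because it lets recent reactions outweigh older ones with a single parameter; any other update rule could be substituted.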

[0837] The specific processing unit 290 transmits the result of the specific processing to the robot 414. In the robot 414, the control unit 46A causes the speaker 240 and the controlled object 443 to output the result of the specific processing. The microphone 238 acquires audio indicating user input for the result of the specific processing. The control unit 46A transmits the audio data indicating user input acquired by the microphone 238 to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the audio data.

[0838] Data generation model 58 is a type of so-called generative AI (Artificial Intelligence). Examples of data generation model 58 include ChatGPT (Internet search <URL: https://openai.com/blog/chatgpt>) and Gemini (Internet search <URL: https://gemini.google.com/?hl=ja>). The data generation model 58 is obtained by performing deep learning on a neural network. The data generation model 58 receives prompts containing instructions, together with inference data such as audio data representing speech, text data representing text, and image data representing images. The data generation model 58 performs inference on the input inference data according to the instructions indicated by the prompts, and outputs the inference results in data formats such as audio data and text data. Here, inference refers to, for example, analysis, classification, prediction, and/or summarization.

[0839] In the above embodiment, an example was given in which the specific processing is performed by the data processing device 12, but the technology of this disclosure is not limited thereto, and the specific processing may also be performed by the robot 414.

[0840] Furthermore, the emotion identification model 59, acting as an emotion engine, may determine the user's emotion according to a specific mapping, namely an emotion map (see Figure 9). Similarly, the emotion identification model 59 may also determine the robot's emotion, and the specific processing unit 290 may perform the specific processing using the robot's emotion.

[0841] Figure 9 shows an emotion map 400 in which multiple emotions are mapped. In the emotion map 400, emotions are arranged in concentric circles radiating from the center. The closer to the center of the concentric circles, the more primitive the emotions are located. Further out of the concentric circles, emotions representing states and actions arising from mental states are located. Emotion is a concept that includes feelings and mental states. On the left side of the concentric circles, emotions that are generally generated from reactions occurring in the brain are located. On the right side of the concentric circles, emotions that are generally induced by situational judgment are located. Above and below the concentric circles, emotions that are generally generated from reactions occurring in the brain and induced by situational judgment are located. In addition, the emotion of "pleasure" is located on the upper side of the concentric circles, and the emotion of "displeasure" is located on the lower side. Thus, in the emotion map 400, multiple emotions are mapped based on the structure in which emotions arise, and emotions that are likely to occur simultaneously are mapped close together.

[0842] These emotions are distributed at the 3 o'clock position on the Emotion Map 400, and usually fluctuate between feelings of security and anxiety. In the right half of the Emotion Map 400, situational awareness takes precedence over internal feelings, resulting in a calm impression.

[0843] The inside of the emotion map 400 represents inner thoughts, while the outside represents actions. Therefore, the further toward the outside of the emotion map 400 an emotion lies, the more visible (expressed in actions) it becomes.

[0844] Here, human emotions are based on various balances, such as posture and blood sugar levels. When these balances deviate from the ideal, it results in discomfort, and when they approach the ideal, it results in pleasure. Similarly, in robots, cars, motorcycles, etc., emotions can be created based on various balances, such as posture and battery level. When these balances deviate from the ideal, it results in discomfort, and when they approach the ideal, it results in pleasure. The emotion map can be generated, for example, based on Dr. Mitsuyoshi's emotion map (Research on a system for analyzing brain physiological signals of speech emotion recognition and emotion, Tokushima University, doctoral dissertation: https://ci.nii.ac.jp/naid/500000375379). The left half of the emotion map contains emotions belonging to a region called "response," where sensation is dominant. The right half of the emotion map contains emotions belonging to a region called "situation," where situational awareness is dominant.
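The layout of the emotion map described so far (left and right halves, upper and lower sides, inside and outside) can be expressed as a small coordinate lookup. This is only a sketch under stated assumptions: the map center is placed at (0, 0), the function name describe_position and the inner_radius boundary of 0.5 are hypothetical, and the labels "mixed" and "neutral" for points lying exactly on an axis are choices made here, not terms from the specification.

```python
import math


def describe_position(x: float, y: float, inner_radius: float = 0.5) -> dict[str, str]:
    """Describe a point on the emotion map, with the center at (0, 0).

    x < 0 is the left half, y > 0 is the upper side. `inner_radius` is a
    hypothetical boundary between the inside (inner thoughts) and the
    outside (actions) of the map.
    """
    if x < 0:
        region = "response"    # left half: sensation is dominant
    elif x > 0:
        region = "situation"   # right half: situational awareness is dominant
    else:
        region = "mixed"       # directly above or below the center

    if y > 0:
        valence = "pleasure"
    elif y < 0:
        valence = "displeasure"
    else:
        valence = "neutral"    # on the horizontal axis, e.g. the 3 o'clock position

    layer = "inner" if math.hypot(x, y) < inner_radius else "outer"
    return {"region": region, "valence": valence, "layer": layer}


print(describe_position(-0.2, 0.1))
print(describe_position(0.9, 0.0))
```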

[0845] The emotion map defines two emotions that promote learning. One is the emotion around the middle of the negative "repentance" and "reflection" on the situation side. In other words, it is when the robot experiences negative emotions such as "I never want to feel this way again" or "I don't want to be scolded again." The other is the emotion around the positive "desire" on the reaction side. In other words, it is when the robot has positive feelings such as "I want more" or "I want to know more."

[0846] The emotion identification model 59 inputs user input into a pre-trained neural network, obtains emotion values representing each emotion shown in the emotion map 400, and determines the user's emotion. This neural network is pre-trained based on multiple training data sets, which are combinations of user input and emotion values representing each emotion shown in the emotion map 400. Furthermore, this neural network is trained so that emotions located close together have similar values, as shown in the emotion map 900 in Figure 10. Figure 10 shows an example where multiple emotions such as "reassured," "calm," and "confident" have similar emotion values.
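How the final emotion is chosen from the emotion values is not spelled out. One plausible reading is to take the emotion with the highest value, as sketched below. The function names determine_emotion and similar_emotions, the tolerance of 0.1, and the numeric values are illustrative assumptions; in the described system the values would come from the pre-trained neural network.

```python
def determine_emotion(values: dict[str, float]) -> str:
    """Return the emotion with the highest emotion value."""
    if not values:
        raise ValueError("no emotion values supplied")
    return max(values, key=values.get)


def similar_emotions(values: dict[str, float], tolerance: float = 0.1) -> list[str]:
    """List emotions whose value is within `tolerance` of the highest value."""
    top = values[determine_emotion(values)]
    return sorted(name for name, value in values.items() if top - value <= tolerance)


# Illustrative values only: nearby emotions on the map receive similar values.
emotion_values = {"reassured": 0.82, "calm": 0.79, "confident": 0.77, "anxious": 0.10}
print(determine_emotion(emotion_values))
print(similar_emotions(emotion_values))
```

The second function reflects the point made about Figure 10: because neighboring emotions are trained to have similar values, several of them may be close to the maximum at once.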

[0847] The above description primarily focuses on the functions of the data processing device 12 in relation to this disclosure. However, the system related to this disclosure is not necessarily implemented on a server. The system related to this disclosure may be implemented as a general information processing system. This disclosure may be implemented, for example, as a software program that runs on a personal computer or as an application that runs on a smartphone. The method related to this disclosure may be provided to users in SaaS (Software as a Service) format.

[0848] In the above embodiment, an example was given in which the specific processing is performed by a single computer 22. However, the technology of this disclosure is not limited thereto, and the specific processing may be distributed across multiple computers, including computer 22. For example, the data generation model 58 may be provided in a device external to the data processing device 12, and the external device may generate data according to the input data.

[0849] In the above embodiment, an example was given in which the specific processing program 56 is stored in the storage 32, but the technology of this disclosure is not limited thereto. For example, the specific processing program 56 may be stored in a portable, computer-readable, non-transitory storage medium such as a USB (Universal Serial Bus) memory. The specific processing program 56 stored in the non-transitory storage medium is installed in the computer 22 of the data processing device 12. The processor 28 executes the specific processing according to the specific processing program 56.

[0850] Alternatively, the specific processing program 56 may be stored in a storage device such as a server connected to the data processing device 12 via the network 54, and the specific processing program 56 may be downloaded and installed on the computer 22 in response to a request from the data processing device 12.

[0851] Furthermore, it is not necessary to store the entirety of the specific processing program 56 in a storage device such as a server connected to the data processing device 12 via the network 54, or to store the entirety of the specific processing program 56 in the storage 32; it is acceptable to store only a portion of the specific processing program 56.

[0852] The following types of processors can be used as hardware resources to perform specific processing. Examples of processors include a CPU, a general-purpose processor that functions as a hardware resource to perform specific processing by executing software, i.e., a program. Other examples of processors include dedicated electrical circuits, such as FPGAs (Field-Programmable Gate Arrays), PLDs (Programmable Logic Devices), or ASICs (Application Specific Integrated Circuits), which have circuit configurations specifically designed to perform specific processing. All of these processors have built-in or connected memory, and all of them perform specific processing by using memory.

[0853] The hardware resource that performs a specific process may consist of one of these various processors, or it may consist of a combination of two or more processors of the same or different types (for example, a combination of multiple FPGAs, or a combination of a CPU and an FPGA). Alternatively, the hardware resource that performs a specific process may consist of a single processor.

[0854] Examples of configurations using a single processor include, firstly, a configuration in which one or more CPUs and software are combined to form a single processor, and this processor functions as a hardware resource that performs a specific process. Secondly, there is a configuration using a processor that realizes the functions of the entire system, including multiple hardware resources that perform a specific process, on a single IC chip, as exemplified by SoCs (System-on-a-chip). In this way, a specific process is realized using one or more of the above types of processors as hardware resources.

[0855] Furthermore, the hardware structure of these various processors can more specifically utilize electrical circuits that combine circuit elements such as semiconductor devices. Also, the specific processing described above is merely an example. Therefore, it goes without saying that unnecessary steps can be deleted, new steps added, or the processing order rearranged, as long as it does not deviate from the main purpose.

[0856] The descriptions and illustrations presented above are detailed explanations of the technical aspects of this disclosure and are merely examples of those aspects. For example, the above descriptions of the structure, function, operation, and effect are examples of the structure, function, operation, and effect of the technical aspects of this disclosure. Therefore, it goes without saying that unnecessary parts may be deleted, new elements added, or elements replaced in the descriptions and illustrations presented above, as long as this does not deviate from the essence of the technical aspects of this disclosure. Furthermore, to avoid confusion and facilitate understanding of the technical aspects of this disclosure, explanations of common technical knowledge that require no special explanation to enable implementation have been omitted from the descriptions and illustrations presented above.

[0857] All documents, patent applications, and technical standards described herein are incorporated by reference to the same extent as if each individual document, patent application, and technical standard were specifically and individually noted to be incorporated by reference.

[0858] The following is further disclosed regarding the embodiments described above.

[0859] (Claim 1)

[0860] An image acquisition means for obtaining user image information,

[0861] An estimation means for estimating user preferences based on acquired image information,

[0862] A generation means for generating travel plans based on estimated preferences,

[0863] A presentation means for presenting the generated travel plan to the user,

[0864] A learning means that learns from feedback information received from users,

[0865] A system that includes this.

[0866] (Claim 2)

[0867] The system according to claim 1, characterized in that the image acquisition means has a function to acquire an image including the user's clothing.

[0868] (Claim 3)

[0869] The system according to claim 1, characterized in that the generation means includes an algorithm that proposes an efficient route based on a starting point and a destination.

[0870] "Example 1"

[0871] (Claim 1)

[0872] Information acquisition means for acquiring the user's visual information,

[0873] An analytical means for analyzing acquired visual information and estimating the user's preferences,

[0874] A generation means that constructs and presents a travel plan based on the analyzed results,

[0875] A learning means that accumulates and analyzes evaluation information provided by users and utilizes it for future proposals,

[0876] A system that includes this.

[0877] (Claim 2)

[0878] The system according to claim 1, characterized in that the information acquisition means has the function of acquiring visual information, including the user's clothing.

[0879] (Claim 3)

[0880] The system according to claim 1, characterized in that the generation means includes a calculation method for proposing an optimal route based on a starting position and a destination position.

[0881] "Application Example 1"

[0882] (Claim 1)

[0883] Means for acquiring user video information,

[0884] A means for estimating user preferences based on acquired video information,

[0885] A means for generating a travel plan based on estimated preferences,

[0886] A means for presenting the generated travel plan to the user,

[0887] A means of learning from the opinion information received from users,

[0888] A means of suggesting the optimal sightseeing route based on the location information of the user,

[0889] A system that includes this.

[0890] (Claim 2)

[0891] The system according to claim 1, characterized in that the video acquisition means has a function to acquire video including the user's clothing.

[0892] (Claim 3)

[0893] The system according to claim 1, characterized in that the generation means includes an algorithm that proposes an optimal tourist route.

[0894] "Example 2 of combining an emotion engine"

[0895] (Claim 1)

[0896] A device that acquires the user's image information,

[0897] A device that estimates user preferences based on acquired image information,

[0898] A device that analyzes the emotional state of a user based on information related to the user,

[0899] A device that generates travel plans based on the user's preferences and emotional state,

[0900] A device for presenting the generated travel plan to the user,

[0901] A device that learns information received from users,

[0902] A system that includes this.

[0903] (Claim 2)

[0904] The system according to claim 1, characterized in that the device has a function to acquire an image including the user's decorations.

[0905] (Claim 3)

[0906] The system according to claim 1, characterized in that the generating device includes a mechanism for proposing an efficient route based on the starting position and the destination position.

[0907] "Application example 2 when combining with an emotional engine"

[0908] (Claim 1)

[0909] An image acquisition means for obtaining user image information,

[0910] An estimation means for estimating user preferences based on acquired image information,

[0911] A means of analyzing user emotional information,

[0912] A generation means that proposes viewing content based on estimated preferences and analyzed emotional information,

[0913] A presentation means for presenting the generated viewing content to the user,

[0914] A learning means that learns from feedback information received from users,

[0915] A system that includes this.

[0916] (Claim 2)

[0917] The system according to claim 1, wherein the image acquisition means has a function to acquire an image including the user's clothing, and further analyzes the user's voice to determine their emotions.

[0918] (Claim 3)

[0919] The system according to claim 1, characterized in that the generation means includes an algorithm that proposes viewing content based on prompt sentences using a generative AI model.

[Explanation of symbols]

[0920] 10, 210, 310, 410 Data processing system; 12 Data processing device; 14 Smart device; 214 Smart glasses; 314 Headset-type terminal; 414 Robot

Claims

1. A system comprising: an image acquisition means for obtaining image information of a user; an estimation means for estimating the user's preferences based on the acquired image information; a generation means for generating a travel plan based on the estimated preferences; a presentation means for presenting the generated travel plan to the user; and a learning means for learning from feedback information received from the user.

2. The system according to claim 1, characterized in that the image acquisition means has a function to acquire an image including the user's clothing.

3. The system according to claim 1, characterized in that the generation means includes an algorithm that proposes an efficient route based on a starting point and a destination.