system
The system addresses the challenge of selecting and trying on special clothing by generating images based on user images, checking inventory, and making reservations, ensuring efficient and convenient clothing selection and fitting.
Patent Information
- Authority / Receiving Office
- JP · JP
- Patent Type
- Applications
- Current Assignee / Owner
- SOFTBANK GROUP CORP
- Filing Date
- 2024-12-09
- Publication Date
- 2026-06-19
Figure 2026100702000001_ABST
Abstract
Description
Technical Field
[0001] The technology of the present disclosure relates to a system.
Background Art
[0002] Patent Document 1 discloses a persona chatbot control method performed by at least one processor, including steps of receiving a user utterance, adding the user utterance to a prompt including an instruction sentence related to an explanation of a chatbot character, encoding the prompt, and inputting the encoded prompt into a language model to generate a chatbot utterance as a response to the user utterance.
Prior Art Documents
Patent Documents
[0003]
Patent Document 1
Summary of the Invention
Problems to be Solved by the Invention
[0004] Selecting special clothing such as a kimono or a wedding dress requires a lot of time and effort, and it is particularly burdensome to visit stores to try on clothes. Furthermore, since the inventory status of each store is unknown, it is difficult for users to determine which store to go to. There is a need for a system that supports users in efficiently selecting clothing and trying it on at an optimal store.
Means for Solving the Problems
[0005] The present invention provides a system that includes means for registering a user's image, means for generating images of suitable clothing based on the registered user's image, means for obtaining information on clothing selected by the user from the generated clothing images, means for checking the inventory status of the selected clothing from multiple stores, and means for making a fitting reservation at the most suitable store based on the inventory information. Furthermore, by combining means for providing the generated clothing images to the user in catalog format and means for identifying the most suitable store considering the user's location information and the store's location information, the system enables the user to efficiently choose clothing.
[0006] A "user" is an individual who utilizes the system, providing their own image and carrying out the process from selecting an outfit to making a fitting reservation.
[0007] "Means for registering images" refers to the part of the system that has a data input function for users to upload their own photos and have them processed within the system.
[0008] "Means for generating images of clothing" refers to the part that uses a generative AI model to create virtual images of the user wearing various items of clothing based on the registered image.
[0009] "Means for obtaining selected clothing information" refers to the part of the system that records the user's selection of a clothing image from the generated images and aggregates the information for use within the system.
[0010] "Means for checking inventory status" refers to the function of querying multiple stores about the selected clothing, investigating each store's stock, and aggregating the information.
[0011] "Means for making a fitting reservation" refers to the function that allows the user to complete, within the system, a fitting reservation at the most suitable store, taking into account the user's selection and the inventory status.
[0012] "Means for providing in catalog format" refers to the part that lists multiple generated clothing images and displays them to the user in a visually easy-to-understand manner.
[0013] "Means of identification considering location information" refers to the function that uses the user's location information and the store's geographical information to select the store that is most easily accessible to the user.
Brief Description of the Drawings
[0014]
[Figure 1] This is a conceptual diagram showing an example of the configuration of a data processing system according to the first embodiment.
[Figure 2] This is a conceptual diagram showing an example of the main functions of a data processing device and a smart device according to the first embodiment.
[Figure 3] This is a conceptual diagram showing an example of the configuration of a data processing system according to the second embodiment.
[Figure 4] This is a conceptual diagram showing an example of the main functions of a data processing device and smart glasses according to the second embodiment.
[Figure 5] This is a conceptual diagram showing an example of the configuration of a data processing system according to the third embodiment.
[Figure 6] This is a conceptual diagram showing an example of the main functions of a data processing device and a headset-type terminal according to the third embodiment.
[Figure 7] This is a conceptual diagram showing an example of the configuration of a data processing system according to the fourth embodiment.
[Figure 8] This is a conceptual diagram showing an example of the main functions of a data processing device and a robot according to the fourth embodiment.
[Figure 9] This shows an emotion map where multiple emotions are mapped.
[Figure 10] This shows an emotion map where multiple emotions are mapped.
[Figure 11] This is a sequence diagram showing the processing flow of the data processing system in Example 1.
[Figure 12] This is a sequence diagram showing the processing flow of the data processing system in Application Example 1.
[Figure 13] This is a sequence diagram showing the processing flow of the data processing system in Embodiment 2 when combined with an emotion engine.
[Figure 14] This is a sequence diagram showing the processing flow of the data processing system in Application Example 2 when combined with an emotion engine.
Modes for Carrying Out the Invention
[0015] Hereinafter, an example of an embodiment of the system according to the technology of the present disclosure will be described with reference to the accompanying drawings.
[0016] First, the terms used in the following description will be explained.
[0017] In the following embodiments, a processor with a reference number (hereinafter simply referred to as "processor") may be one arithmetic unit or a combination of multiple arithmetic units. Also, the processor may be one type of arithmetic unit or a combination of multiple types of arithmetic units. Examples of arithmetic units include a CPU (Central Processing Unit), a GPU (Graphics Processing Unit), a GPGPU (General-Purpose computing on Graphics Processing Units), an APU (Accelerated Processing Unit), and the like.
[0018] In the following embodiments, a RAM (Random Access Memory) with a reference number is a memory in which information is temporarily stored and is used as a work memory by the processor.
[0019] In the following embodiments, a storage with a reference number is one or more non-volatile storage devices that store various programs and various parameters, etc. Examples of non-volatile storage devices include flash memory (SSD (Solid State Drive)), magnetic disks (e.g., hard disks), or magnetic tapes, and the like.
[0020] In the following embodiments, a communication interface (I/F) with a reference number is an interface that includes a communication processor, an antenna, and the like. The communication interface manages communication between multiple computers. Examples of communication standards applicable to the communication interface include wireless communication standards such as 5G (5th Generation Mobile Communication System), Wi-Fi (registered trademark), or Bluetooth (registered trademark).
[0021] In the following embodiments, "A and/or B" is synonymous with "at least one of A and B." That is, "A and/or B" means that it may be A alone, B alone, or a combination of A and B. Furthermore, in this specification, the same concept as "A and/or B" applies when three or more items are linked by "and/or."
[0022] [First Embodiment]
[0023] Figure 1 shows an example of the configuration of the data processing system 10 according to the first embodiment.
[0024] As shown in Figure 1, the data processing system 10 includes a data processing device 12 and a smart device 14. An example of the data processing device 12 is a server.
[0025] The data processing device 12 comprises a computer 22, a database 24, and a communication interface 26. The computer 22 is an example of a "computer" related to the technology of this disclosure. The computer 22 comprises a processor 28, RAM 30, and storage 32. The processor 28, RAM 30, and storage 32 are connected to a bus 34. The database 24 and the communication interface 26 are also connected to the bus 34. The communication interface 26 is connected to a network 54. An example of the network 54 is a WAN (Wide Area Network) and/or a LAN (Local Area Network).
[0026] The smart device 14 comprises a computer 36, a reception device 38, an output device 40, a camera 42, and a communication interface 44. The computer 36 comprises a processor 46, RAM 48, and storage 50. The processor 46, RAM 48, and storage 50 are connected to a bus 52. The reception device 38, output device 40, and camera 42 are also connected to the bus 52.
[0027] The reception device 38 is equipped with a touch panel 38A and a microphone 38B, etc., and receives user input. The touch panel 38A receives user input by detecting contact with an object (e.g., a pen or finger). The microphone 38B receives user input by detecting the user's voice. The control unit 46A transmits data indicating the user input received by the touch panel 38A and microphone 38B to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the data indicating the user input.
[0028] The output device 40 includes a display 40A and a speaker 40B, and presents data to the user 20 by outputting the data in a form perceptible to the user 20 (e.g., audio and/or text). The display 40A displays visible information such as text and images according to instructions from the processor 46. The speaker 40B outputs audio according to instructions from the processor 46. The camera 42 is a small digital camera equipped with an optical system such as a lens, aperture, and shutter, and an image sensor such as a CMOS (Complementary Metal-Oxide-Semiconductor) image sensor or a CCD (Charge Coupled Device) image sensor.
[0029] Communication interface 44 is connected to network 54. Communication interfaces 44 and 26 are responsible for the exchange of various types of information between processor 46 and processor 28 via network 54.
[0030] Figure 2 shows an example of the main functions of the data processing device 12 and the smart device 14.
[0031] As shown in Figure 2, in the data processing device 12, a specific processing is performed by the processor 28. A specific processing program 56 is stored in the storage 32. The specific processing program 56 is an example of a "program" related to the technology of this disclosure. The processor 28 reads the specific processing program 56 from the storage 32 and executes the read specific processing program 56 on the RAM 30. The specific processing is realized by the processor 28 operating as a specific processing unit 290 according to the specific processing program 56 executed on the RAM 30.
[0032] The storage 32 stores the data generation model 58 and the emotion identification model 59. The data generation model 58 and the emotion identification model 59 are used by the specific processing unit 290.
[0033] In the smart device 14, the processor 46 performs the reception output processing. The storage 50 stores the reception output program 60. The reception output program 60 is used in conjunction with the specific processing program 56 by the data processing system 10. The processor 46 reads the reception output program 60 from the storage 50 and executes the read reception output program 60 on the RAM 48. The reception output processing is realized by the processor 46 operating as a control unit 46A according to the reception output program 60 executed on the RAM 48.
[0034] Next, the specific processing performed by the specific processing unit 290 of the data processing device 12 will be described. In the following description, the data processing device 12 will be referred to as the "server" and the smart device 14 as the "terminal".
[0035] To implement this invention, it is necessary to build a system that allows users to register their own images, select clothing from those images, and efficiently try them on at the most suitable store. A specific embodiment of this system is shown below.
[0036] First, the user uploads a photo of their face to the system using a smartphone or computer. The device retrieves this image and sends it to the server in the appropriate format.
[0037] The server inputs the received user image into a generative AI model, which generates images simulating various outfits based on the user's face. These images are organized in a catalog format and sent from the server to the user's terminal.
[0038] The user browses this catalog on their device and selects an outfit they are interested in. The device then sends information about the selected outfit to the server.
[0039] Based on the received selection information, the server uses an AI agent to check inventory at multiple stores. It queries the inventory status of each store and analyzes the data to find the best option. This analysis takes into account the user's location information and the store's location information to identify the store that will be most beneficial to the user.
[0040] The server then presents the user with a list of suitable stores and available fitting dates. Once the user selects a store and date, the server uses that information to contact the store to confirm the reservation. After the reservation is complete, the server notifies the user of the fitting details.
[0041] As a concrete example, consider a user preparing for their coming-of-age ceremony using this system. The user registers their photo in the system and browses a catalog of generated kimono images. Once the user selects a design they like, the server checks the inventory of nearby rental shops and presents several shop options. The user then chooses one shop and date, and the server automatically completes the reservation necessary for a successful fitting and sale. In this way, the system lets users choose their attire efficiently.
[0042] The following describes the processing flow.
[0043] Step 1:
[0044] The user prepares a photo of their face and uploads it through the photo registration function of the app or web platform. The device receives this image, converts the format as needed, and transfers it to the server.
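As a non-limiting illustration of the "converts the format as needed" part of Step 1, the terminal could inspect the leading bytes of the uploaded file to decide whether any conversion is required. The sketch below is only one possible approach; the function names and the choice of JPEG as the default target format are assumptions made for illustration and are not specified for this embodiment.

```python
def detect_image_format(data: bytes) -> str:
    """Return 'jpeg', 'png', or 'unknown' based on the file's leading bytes."""
    if data.startswith(b"\xff\xd8\xff"):        # JPEG start-of-image marker
        return "jpeg"
    if data.startswith(b"\x89PNG\r\n\x1a\n"):   # PNG file signature
        return "png"
    return "unknown"


def needs_conversion(data: bytes, target: str = "jpeg") -> bool:
    """True when the uploaded bytes are not already in the (assumed) target format."""
    return detect_image_format(data) != target
```

The actual re-encoding step is deliberately left out, since it would depend on whichever imaging library the terminal uses.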
[0045] Step 2:
[0046] The server inputs the received image data into a generative AI model, which then generates images of the user wearing various outfits based on their face. These outfit images are then organized in a catalog format.
[0047] Step 3:
[0048] The server sends the generated catalog to the user's device. The user then browses the catalog on their device and selects the outfits they are interested in.
[0049] Step 4:
[0050] The device sends the ID and related information of the costume selected by the user to the server. The server receives this information and uses it for the next step.
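The "ID and related information" sent in Step 4 could, for example, be carried as a small structured message. The following is a minimal sketch under assumed field names (`user_id`, `outfit_id`, `size`); the disclosure does not define the message layout, so every field here is hypothetical.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class OutfitSelection:
    """Hypothetical selection message sent from the terminal to the server."""
    user_id: str
    outfit_id: str
    size: Optional[str] = None  # example of "related information" (assumed)


def encode_selection(selection: OutfitSelection) -> str:
    """Serialize the selection to JSON on the terminal side."""
    return json.dumps(asdict(selection), sort_keys=True)


def decode_selection(payload: str) -> OutfitSelection:
    """Rebuild the selection on the server side."""
    return OutfitSelection(**json.loads(payload))
```

A round trip through `encode_selection` and `decode_selection` yields an equal `OutfitSelection`, which the server can then pass to the inventory check.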
[0051] Step 5:
[0052] The server checks the inventory of the costume selected by the user across multiple registered stores. This includes querying the inventory database of each store.
[0053] Step 6:
[0054] The server identifies the most suitable store for the user by considering each store's inventory status, the user's location, and the store's geographical information. It then creates a list of the identified stores and presents it to the user, including the available dates for each candidate store.
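One way to picture Steps 5 and 6 together is to discard stores without stock and order the remainder by distance from the user. The sketch below uses the haversine great-circle formula as a stand-in for "considering location information"; the `Store` type, its fields, and the distance-only ordering are assumptions for illustration, not the method fixed by this embodiment.

```python
import math
from dataclasses import dataclass


@dataclass
class Store:
    name: str
    lat: float
    lon: float
    stock: int  # units of the selected outfit currently available


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometres between two latitude/longitude points."""
    earth_radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * earth_radius_km * math.asin(math.sqrt(a))


def rank_stores(stores: list, user_lat: float, user_lon: float) -> list:
    """Keep only stores with stock, nearest first."""
    in_stock = [s for s in stores if s.stock > 0]
    return sorted(in_stock, key=lambda s: haversine_km(user_lat, user_lon, s.lat, s.lon))
```

Straight-line distance is only a rough proxy for accessibility; travel time from a routing service could be substituted for `haversine_km` without changing the rest of the flow.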
[0055] Step 7:
[0056] The user selects their preferred store and date from the options provided. The selection information is then sent back to the server.
[0057] Step 8:
[0058] The server processes the request to confirm the fitting appointment at the store based on the user's selection. Once the reservation is confirmed, the server notifies the user of the reservation details.
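The confirmation in Step 8 can be thought of as claiming an open slot and returning the details only if the claim succeeds. The following sketch models the store's reservation system as an in-memory calendar; `StoreCalendar`, `confirm_fitting`, and the returned fields are hypothetical, and a real deployment would call the store's own reservation system instead.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class StoreCalendar:
    """In-memory stand-in for a store's reservation system (assumption)."""
    store: str
    open_slots: set = field(default_factory=set)

    def reserve(self, slot: str) -> bool:
        """Claim the slot if it is still open; report whether it was claimed."""
        if slot in self.open_slots:
            self.open_slots.remove(slot)
            return True
        return False


def confirm_fitting(calendar: StoreCalendar, user_id: str,
                    outfit_id: str, slot: str) -> Optional[dict]:
    """Return the reservation details to notify the user, or None if the slot is gone."""
    if not calendar.reserve(slot):
        return None
    return {"store": calendar.store, "user_id": user_id,
            "outfit_id": outfit_id, "slot": slot, "status": "confirmed"}
```

Returning `None` for an unavailable slot lets the server fall back to presenting the remaining candidates to the user rather than reporting a reservation that was never secured.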
[0059] This flow allows users to find their desired outfit efficiently and to reserve a fitting smoothly.
[0060] (Example 1)
[0061] Next, we will describe Example 1. In the following description, the data processing device 12 will be referred to as the "server," and the smart device 14 will be referred to as the "terminal."
[0062] When users select an outfit and schedule a fitting, it is often difficult to efficiently find a store that has the desired design in stock. Furthermore, the distance to the store and scheduling fittings can be cumbersome. An effective system is needed to address these challenges.
[0063] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 1 is realized by the following means.
[0064] In this invention, the server includes a device for receiving and recording the user's image, a device for generating an image of the corresponding costume based on the recorded user image, and a device for checking the inventory status of the selected costume from multiple sales facilities. This allows the user to efficiently select their desired costume and make a fitting reservation.
[0065] "User images" refer to visual data that can identify an individual, such as a photograph of the user's face, which is entered into the system.
[0066] A "recording device" refers to electronic equipment that has the function of saving data received in digital format.
[0067] A "device that generates images of corresponding costumes" refers to a device that uses AI technology to create virtual images of costumes based on input data.
[0068] A "device for acquiring selected costume information" refers to a device that reads the data of the costume selected by the user and has the function of managing that information.
[0069] A "device for checking inventory status" refers to a system that acquires inventory data from multiple sales facilities and obtains the latest inventory information.
[0070] "Sales facilities" refers to stores or commercial facilities in general that handle clothing and related products.
[0071] A "device for executing fitting reservations" refers to a device that works in conjunction with the reservation system of a selected sales facility to secure a fitting schedule for the user.
[0072] "Geographic information" refers to data indicating the location of users and facilities, including location coordinates and address information.
[0073] To implement this invention, it is necessary to build a system in which a server, terminal, and user cooperate to efficiently proceed with costume selection through a series of processes. The overall operation will be described in detail here.
[0074] First, the user uses a device such as a smartphone or computer. The user activates the camera function on the device and takes a picture of their face. The captured image is converted to JPEG format on the device. Then, the device sends this to the server using a secure protocol.
[0075] The server processes the received images using existing generative AI models such as "StyleGAN" and "DALL-E". During this process, it prompts the AI models with the instruction, "Generate clothing designs based on the user's image." This process generates composite images of various clothing designs that fit the user's face.
[0076] The generated costume images are organized into a catalog format on the server side. The catalog includes variations of the costumes and is designed with user-friendly display in mind. The catalog data is compressed for efficient transmission and sent from the server to the user's terminal.
[0077] The user browses a catalog received on their device and selects their preferred outfit. This selection information is then sent back to the server. The server uses this information to check the inventory of multiple retail locations. This involves data communication via APIs with each retail location's inventory management system.
[0078] To enhance user convenience, the server obtains the user's location information using the Google® Maps API and other methods, and cross-references it with the location information of the sales facility. This analysis selects the most suitable sales facility for the user. The server then processes the reservation using the fitting reservation system of the selected facility. Finally, detailed information about the fitting is notified to the user.
[0079] As a concrete example, let's consider a user who is preparing for their coming-of-age ceremony. The user uploads a photo of their face to the system and browses the generated catalog images of kimonos. They select a furisode (long-sleeved kimono) they particularly like and use the system to find the best rental facility with available stock. The user specifies their desired date and time, and the server confirms the reservation based on that information. In this way, the entire process of choosing an outfit proceeds smoothly.
[0080] The flow of the specific processing in Example 1 will be explained using Figure 11.
[0081] Step 1:
[0082] The user activates the camera on their smartphone or computer and takes a picture of their face. The captured photo is converted to JPEG format on the device. This converted image data becomes the input information, and the device sends this image to the server using the secure HTTPS protocol.
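The HTTPS transfer in Step 1 might be prepared as follows. This is a sketch using Python's standard `urllib.request`; the endpoint URL is a placeholder, and sending the raw JPEG bytes as the request body is an assumption, since the disclosure specifies only that the image is sent over HTTPS.

```python
import urllib.request

# Placeholder endpoint; the real server address is not given in the disclosure.
UPLOAD_URL = "https://example.com/api/user-images"


def build_upload_request(jpeg_bytes: bytes, url: str = UPLOAD_URL) -> urllib.request.Request:
    """Prepare (but do not send) a POST request carrying the JPEG image."""
    if not url.lower().startswith("https://"):
        raise ValueError("the user image must be sent over HTTPS")
    return urllib.request.Request(
        url,
        data=jpeg_bytes,
        headers={"Content-Type": "image/jpeg"},
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` would perform the actual transfer; the sketch stops short of that so it can be read without a live server.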
[0083] Step 2:
[0084] The server receives the image data sent from the terminal. To process the received image data, it inputs the prompt "Generate clothing designs based on the user's image" to the generative AI models "StyleGAN" and "DALL-E". Through this process, the AI models generate composite images by combining various clothing items with the user's face image. This generated set of clothing images becomes the output.
[0085] Step 3:
[0086] The server organizes the generated costume images into a catalog format. The organized catalog data is then compressed to reduce file size and prepared for efficient distribution. This is the output of the catalog data. The server then sends this compressed catalog data to the user's terminal.
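The compression in Step 3 and the matching decompression on the terminal in Step 4 could look like the sketch below. It assumes the catalog is a list of entries serialized as JSON and compressed with gzip; the disclosure names neither a serialization format nor a compression algorithm, so both are illustrative choices, as are the entry fields used in the usage note.

```python
import gzip
import json


def pack_catalog(entries: list) -> bytes:
    """Server side: serialize catalog entries and compress them for transmission."""
    return gzip.compress(json.dumps(entries).encode("utf-8"))


def unpack_catalog(blob: bytes) -> list:
    """Terminal side: decompress and restore the catalog entries."""
    return json.loads(gzip.decompress(blob).decode("utf-8"))
```

For example, a catalog of entries such as `{"id": "furisode-01", "image_url": "..."}` survives a `pack_catalog` / `unpack_catalog` round trip unchanged. Note that this mainly benefits the textual metadata; image files in formats such as JPEG are already compressed and would gain little.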
[0087] Step 4:
[0088] The device decompresses the catalog data received from the server and displays it on the user interface. At this point, the user can browse multiple costume options through swiping and tapping. If the user selects a costume they like from the catalog, this selection becomes input information, which is then resent to the server.
[0089] Step 5:
[0090] The server receives costume selection information from the user and retrieves inventory data from multiple sales facilities via an API connection to the inventory management system. Here, data calculations are performed to check the inventory status of the selected costume, and inventory information for each sales facility is output. Based on this information, the server performs geographical analysis to identify the most suitable sales facility.
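Because Step 5 contacts several inventory management systems, the queries could be issued concurrently so that one slow or failing facility does not block the others. The sketch below represents each facility's API as a plain callable; `query_all_stores` and the convention of recording `None` for a failed query are assumptions for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def query_all_stores(fetchers: dict, outfit_id: str) -> dict:
    """Query every facility in parallel.

    `fetchers` maps a facility name to a callable that takes the outfit ID and
    returns the stock count (a stand-in for each facility's inventory API).
    A facility whose query raises is recorded as None (stock unknown).
    """
    def safe_call(fetch):
        try:
            return fetch(outfit_id)
        except Exception:
            return None

    with ThreadPoolExecutor(max_workers=8) as pool:
        # pool.map preserves input order, so results line up with the keys.
        results = list(pool.map(safe_call, fetchers.values()))
    return dict(zip(fetchers.keys(), results))
```

Distinguishing "unknown" (`None`) from "out of stock" (`0`) leaves room for the server to retry or simply exclude that facility from the subsequent geographical analysis.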
[0091] Step 6:
[0092] The server takes into account the user's location and the acquired geographical information of the stores to create a list of the most suitable sales locations and the dates and times when they are available for fitting. The created list is output and sent to the user's device for display.
[0093] Step 7:
[0094] The user selects their preferred facility and fitting date and time from the presented list of stores and available fitting dates and times, and sends this selection information from their terminal to the server. This selection information becomes the input information.
[0095] Step 8:
[0096] The server verifies and confirms the reservation information for the store and the date and time selected by the user by linking with the reservation system of the relevant sales facility. This confirmed reservation information becomes the final output, and the server notifies the user. Upon receiving this notification, the user can check the details of the fitting.
[0097] (Application Example 1)
[0098] Next, we will explain Application Example 1. In the following explanation, the data processing device 12 will be referred to as the "server," and the smart device 14 will be referred to as the "terminal."
[0099] Modern users want an environment where they can easily select an appearance that suits them and experience it in a suitable facility. However, there is a lack of efficient systems for finding a suitable appearance and booking a facility to try it out. Therefore, the challenge is to provide a service that allows users to make their ideal choice without wasting time and effort, and to have a comfortable experience.
[0100] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 1 is realized by the following means.
[0101] In this invention, the server includes means for registering user characteristics, means for generating a suitable appearance representation based on the registered user characteristics, and means for checking the availability of the selected appearance from multiple facilities. This makes it possible for users to easily select an appearance that suits them and quickly reserve the most suitable facility to try out that appearance.
[0102] "User characteristics" refer to physical or digital attributes unique to each individual user, and are the information necessary for generating visual representations.
[0103] "Means of registration" refers to methods or technologies for incorporating user characteristics into a system as digital data.
[0104] "Appearance representation" refers to visual or other forms of representation generated based on the user's characteristics, and represents the choices offered to the user.
[0105] "Generative means" refers to techniques that use AI or other algorithms to create a visual representation from the user's characteristics.
[0106] "Availability" refers to information about the inventory or availability status of facilities that can provide the experience or goods related to the selected appearance representation.
[0107] "Means of verification" refers to a method or process of obtaining information related to the appearance of multiple facilities and verifying that information.
[0108] "Means of making an experience reservation" refers to a technology or method for securing a date and time for an experience or use at a related facility based on the visual representation selected by the user.
[0109] This system begins with users registering their characteristics on a server using devices such as smartphones or personal computers. The user's device then uses captured or saved image data to send the registered characteristic information to the server in an appropriate format.
[0110] The server uses a generative AI model to create a suitable appearance based on the received user feature information. This generation process utilizes StyleGAN and other AI algorithms to provide the user with visual clothing options. The generated appearances are organized and sent to the user's device in a list format.
[0111] Users can view the available appearance representations on their device and select the one they like. After making a selection, the device sends that information back to the server. The server then initiates communication with multiple facilities to confirm the availability of the selected appearance representation. By analyzing the metadata, the server utilizes the user's current location information and the geographical information of the facilities to identify the most suitable facility.
[0112] Once the most suitable facility is identified, the server makes a reservation for the experience at that facility and notifies the user of the details. This process simplifies both choosing an appearance and costume that suits the user and the steps required to actually try it at the facility.
[0113] A concrete example would be a user planning to attend a friend's wedding using this system to find a stylish suit. The user registers their face using their smartphone and selects several suits from the catalog provided by the application. The server then automatically suggests the most suitable store and completes the reservation at the user's most convenient date and time.
[0114] An example of a prompt message would be, "Upload a photo of yourself and choose a suit that suits you for your next wedding. The AI will suggest stores where you can try on the suit." By using such a system, users can have an efficient and effective experience in selecting their attire.
[0115] The flow of a specific process in Application Example 1 will be explained using Figure 12.
[0116] Step 1:
[0117] Users use devices such as smartphones or computers to acquire images of their own faces and upload those images to the system. Input is either a camera device or pre-stored image data. Output is data sent to the server after the image data has been converted to an appropriate format. This conversion process involves image compression and format conversion.
[0118] Step 2:
[0119] The server receives a facial image sent by the user and uses a generative AI model to generate an appearance representation based on that data. The input is the facial image data, and the output is a set of multiple appearance images provided to the user as candidates. The AI model (e.g., StyleGAN) generates various styles of appearance from the received facial image and organizes them as visual choices. This process involves image generation by the model and subsequent shaping.
[0120] Step 3:
[0121] The server sends the generated appearance images to the user's terminal in catalog format. The input is a set of images generated by AI, and the output is the images arranged in a layout viewable by the user. The server compresses the data and arranges it in the optimal display format.
[0122] Step 4:
[0123] The user browses a catalog on their device and selects an appearance they are interested in. The input is a catalog image, and the output is the selected appearance information. The selected item is confirmed based on the user's interaction.
[0124] Step 5:
[0125] The terminal sends the selected appearance information to the server. The input is the user's selection information, and the output is the transmission of data to the server. The terminal confirms the selected information and sends it in a format that the server can process.
[0126] Step 6:
[0127] The server queries multiple facilities to confirm the availability of the selected appearance. The input is the selected appearance information, and the output is the facilities' availability data. The server queries facilities via an API and analyzes the retrieved data.
[0128] Step 7:
[0129] The server analyzes the acquired availability data in combination with the user's location information to identify the most suitable facility. The input is the availability data and location information, and the output is the optimal facility information suggested to the user. A data analysis algorithm determines the best option based on travel distance and facility availability conditions.
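The data analysis algorithm in Step 7 is not specified further. As one hedged possibility, a weighted score could trade travel distance against the number of available units, as sketched below. The linear form and the weights `w_distance` and `w_stock` are purely illustrative assumptions.

```python
def facility_score(distance_km: float, available_units: int,
                   w_distance: float = 1.0, w_stock: float = 0.5) -> float:
    """Higher is better; facilities with nothing available are ruled out."""
    if available_units <= 0:
        return float("-inf")
    return w_stock * available_units - w_distance * distance_km


def best_facility(candidates: list):
    """candidates: list of (name, distance_km, available_units) tuples.

    Returns the name of the best-scoring facility, or None if no facility
    has the selected item available.
    """
    if not candidates:
        return None
    score, name = max((facility_score(d, u), n) for n, d, u in candidates)
    return name if score != float("-inf") else None
```

With the default weights, a facility 2 km away holding one unit scores -1.5 and is preferred over one 5 km away holding three units (score -3.5); different weights would shift that balance.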
[0130] Step 8:
[0131] The server makes a reservation for the selected facility and notifies the user of the reservation details. The input is the selected facility information, and the output is reservation confirmation information. The server works in conjunction with the facility's reservation system to confirm the reservation and notifies the user's device.
[0132] Furthermore, an emotion engine that estimates the user's emotions may be incorporated. That is, the specific processing unit 290 may use the emotion identification model 59 to estimate the user's emotions and perform the specific processing using the estimated emotions.
[0133] To implement this invention, a system is needed that selects clothing based on the user's image, recognizes the user's emotions using an emotion engine, and selects the most suitable fitting room. Specific embodiments are described below.
[0134] First, the user uploads a photo of their face to the system using their smartphone or computer. The device then sends this image data to the server. The server uses a generative AI model to generate images of the user wearing various outfits based on their face photo, and organizes them in a catalog format.
[0135] Next, the server activates the emotion engine to analyze the user's emotions from their reactions and facial images. The emotion engine can identify emotions from facial expression data captured while the user is viewing the catalog images, including real-time facial capture using the camera. This allows the catalog to reflect recommended outfits according to the user's emotional state while they are browsing.
[0136] The user selects their preferred outfit from a catalog that has been adjusted to take this emotion recognition into account. The selection information is sent from the terminal to the server. The server checks the inventory of the selected outfit at multiple stores. In this process, along with the inventory check results, the server considers the user's emotion data and suggests the most suitable store and reservation date and time.
[0137] As a concrete example, consider a user choosing a wedding dress using this system. Suppose the user uploads a photo of themselves and, while the user views the generated dress catalog, the emotion engine captures emotions such as joy and surprise. Based on this emotion data, the system gives priority to presenting dresses similar in style to those that evoked these emotions. When the user selects a specific dress, the server retrieves inventory information from stores and, taking the emotion data into consideration, makes the most suitable fitting reservation.
[0138] Incorporating this emotion engine into the system makes it possible to provide a more user-friendly and personalized outfit selection experience.
[0139] The following describes the processing flow.
[0140] Step 1:
[0141] The user prepares a photo of their face and uploads it to the system using their device. The image data is processed on the device and sent to the server.
[0142] Step 2:
[0143] The server uses an AI model based on the received image data to generate virtual images of the user wearing various outfits, using the user's face as a basis. These images are organized in a catalog format and sent to the user's device.
[0144] Step 3:
[0145] The user views the received catalog via the terminal. During this time, the terminal uses the user's webcam to capture facial expressions in real time and sends that data to the server.
[0146] Step 4:
[0147] The server analyzes this facial expression data using an emotion engine. The analysis identifies which image the user is reacting to and what emotion (joy, surprise, etc.) they are experiencing. Based on this emotional state, recommended outfits are then prioritized and reflected in the user's catalog.
[0148] Step 5:
[0149] The user selects their favorite outfit from a curated catalog. This selection information is sent from the device to the server and used in the next step.
[0150] Step 6:
[0151] The server queries multiple stores for the availability of the selected outfit. During this process, it also considers the user's sentiment analysis results, prioritizing stores with high recommendation ratings.
[0152] Step 7:
[0153] Based on inventory information and sentiment data, the server identifies the most suitable fitting store and date for the user and sends a list of candidates to the device.
[0154] Step 8:
[0155] The user selects their preferred store and date from the options presented. This selection information is then sent to the server via the device.
[0156] Step 9:
[0157] The server confirms the reservation based on the selected fitting store and date / time. It sends a reservation confirmation notification to the device, informing the user of the fitting reservation details.
[0158] In this way, by incorporating emotion recognition, it becomes possible to select clothing based on the user's emotions and preferences, and to smoothly book fitting appointments.
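As a non-limiting illustration, the prioritization of Step 4 can be sketched as a re-ranking of the catalog by accumulated emotion scores. The emotion labels, the numeric weights, and the function name are assumptions made for this sketch; the disclosure does not specify how emotions are converted into a ranking.

```python
from collections import defaultdict

# Hypothetical weights: how strongly each detected emotion counts as interest.
EMOTION_WEIGHTS = {"joy": 1.0, "surprise": 0.5, "neutral": 0.0, "dislike": -1.0}


def rerank_catalog(catalog_ids, observations):
    """Order outfit IDs by accumulated emotion score, highest first.

    observations: iterable of (outfit_id, emotion_label) pairs, one per
    facial-expression sample captured while that outfit was displayed.
    """
    scores = defaultdict(float)
    for outfit_id, emotion in observations:
        scores[outfit_id] += EMOTION_WEIGHTS.get(emotion, 0.0)
    # sorted() is stable, so outfits with equal scores keep their catalog order.
    return sorted(catalog_ids, key=lambda oid: -scores[oid])


catalog = ["dress-1", "dress-2", "dress-3"]
observations = [
    ("dress-3", "joy"),
    ("dress-3", "surprise"),
    ("dress-1", "dislike"),
    ("dress-2", "neutral"),
]
ranked = rerank_catalog(catalog, observations)
print(ranked)
```

Because unknown labels contribute a score of zero, the ordering degrades gracefully if the emotion engine reports a label that is not in the weight table.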
[0159] (Example 2)
[0160] Next, we will describe Example 2. In the following description, the data processing device 12 will be referred to as the "server" and the smart device 14 as the "terminal".
[0161] Traditional costume selection systems struggled to provide personalized suggestions that reflected user emotions, resulting in poorly suited choices. Furthermore, checking the availability of selected costumes and suggesting reservations were challenging, as these systems often failed to consider user emotions and geographical convenience.
[0162] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 2 is realized by the following means.
[0163] In this invention, the server includes means for acquiring a user's image, means for creating images of clothing using a generative AI model, and means for analyzing the user's emotions and making recommendations using an emotion recognition device. This enables clothing selection that reflects the user's emotions, providing a personalized experience. It also enables the suggestion of appropriate fitting reservations based on inventory information and emotion analysis.
[0164] "Means of acquiring user images" refers to a function that collects user facial photographs and image data through the electronic devices used by the user.
[0165] "A means of creating images of costumes using a generative AI model" refers to a function that takes user image data as input and uses a generative artificial intelligence algorithm to generate images of the wearer wearing various costumes.
[0166] "A means of analyzing and recommending user emotions using an emotion recognition device" refers to a function that evaluates the user's facial expressions and reactions using emotion analysis technology and suggests the most suitable outfit based on the user's emotional state.
[0167] "Means for obtaining user-selected costume data" refers to a function that collects specific costume information selected based on the user's preferences.
[0168] "A means of checking inventory information from multiple sales locations" refers to a function that queries the inventory status of selected costumes from multiple partner stores and sales locations via a database.
[0169] "Means for suggesting fitting appointments" refers to a function that recommends the optimal date, time, and location for trying on clothing, based on the results of user sentiment analysis and inventory status.
[0170] To implement this invention, the user, terminal, and server must each fulfill their respective roles and work together in an integrated manner. First, the user uploads a photograph of their face to the system using a terminal such as a smartphone or personal computer. This terminal requires an internet connection and an application or web browser for transmitting image data.
[0171] The terminal sends the uploaded image data to the server. Upon receiving this data, the server runs a generative AI model. The generative AI model generates clothing images using the user's image as input. This process utilizes GPUs and cloud computing services for high-performance computation. The generative AI model operates according to prompts and suggests clothing that matches the user's image.
[0172] Next, the server uses an emotion recognition device to analyze the user's emotions. This device analyzes the user's facial expressions and reactions in real time and generates emotion data. This emotion data is used to optimize the costume catalog the user is browsing, prioritizing the display of costumes that match their emotions, thereby providing the user with a personalized experience.
[0173] As a concrete example, consider a scenario where a user wants to choose a wedding dress. The user uploads a photo of their face to the system, and the server uses a generative AI model to generate various dress images. The prompt message would be, "Analyze the user's image and generate images of styles suitable for a wedding dress." The server then uses an emotion recognition device to analyze the user's reactions and highlights dresses in the catalog that the user expressed joy or interest in.
[0174] The user then sends the selected clothing information to the server via their device, and the server checks the relevant inventory information at multiple stores. Finally, based on the emotion data and inventory information, the server suggests to the user a fitting appointment at the most suitable store. This system allows users to choose clothing in a manner that is both rational and attentive to their emotions.
[0175] The flow of the specific processing in Example 2 will be explained using Figure 13.
[0176] Step 1:
[0177] Users upload their facial photos to the system using their smartphones or computers. The input data is an image file, which the device collects and prepares to send to the server. Specifically, this involves the user clicking an "upload image" button on a dedicated application or web interface and selecting a photo from their file system.
[0178] Step 2:
[0179] The device sends the uploaded image data to the server. This process uses the user's image file as input and transfers the data to the server via the HTTPS protocol as output. Specifically, this means sending the selected image file to the specified API endpoint on the server.
[0180] Step 3:
[0181] The server executes a generative AI model based on the received image data. The input is a photo of the user's face, and the output is images of the user wearing various outfits. In this process, the generative AI model is given a prompt such as, "Analyze the user's image and generate an appropriate outfit style." Specifically, the image generation process is performed using the GPU.
[0182] Step 4:
[0183] The server organizes the generated costume images and provides them to the user in catalog format. The input is the costume images output by the generative AI model, and the output is the organized catalog data. Specifically, these images are displayed as thumbnails in the user interface so that the user can easily select them.
[0184] Step 5:
[0185] The server analyzes the user's emotions using an emotion recognition device. The input data is the user's facial expression data, and the output is data on their emotional state. This process involves capturing the user's facial expressions in real time with a camera and analyzing them using an emotion recognition algorithm.
[0186] Step 6:
[0187] The server reflects recommended outfits in the catalog based on the user's emotional state. The input is the results of the emotion analysis, and the output is an adjusted display order for the catalog. Specifically, the server uses the emotion data to move outfits in which the user has shown interest to the front of the catalog.
[0188] Step 7:
[0189] The user selects an outfit from a catalog that has been adjusted based on emotion recognition. The selected outfit information is collected by the terminal and sent to the server. The input data is the outfit ID selected by the user, and the output is the selection information transferred to the server. Specifically, the user confirms the selection by clicking on the outfit.
[0190] Step 8:
[0191] The server checks inventory information for the selected costume at multiple sales locations and suggests the most suitable fitting appointment. The input is the selected costume data and sentiment information, and the output is the recommended fitting date and time and store information. Specifically, the server queries each store's inventory via API and presents the user with appropriate booking options based on the matching results.
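As a non-limiting illustration, the transfer of Steps 1 and 2 can be sketched as the construction of an HTTPS POST request carrying the image file. The endpoint URL, the JPEG content type, and the function name are assumptions made for this sketch; the disclosure states only that the image file is sent to a specified API endpoint via HTTPS. The request is built but not sent, so the sketch runs without network access.

```python
import urllib.request

# Hypothetical endpoint; the disclosure only refers to "the specified API endpoint".
UPLOAD_URL = "https://example.com/api/user-images"


def build_upload_request(image_bytes, url=UPLOAD_URL):
    """Build (but do not send) a POST request carrying a face photo."""
    if not url.startswith("https://"):
        raise ValueError("face photos must be sent over HTTPS")
    return urllib.request.Request(
        url,
        data=image_bytes,
        method="POST",
        headers={"Content-Type": "image/jpeg"},  # JPEG is assumed here
    )


req = build_upload_request(b"\xff\xd8\xff\xe0 placeholder image bytes")
print(req.get_method(), req.full_url)
# Sending would be: urllib.request.urlopen(req, timeout=10)
```

Rejecting non-HTTPS URLs before the request is built keeps the photo from being sent over an unencrypted connection by mistake.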
[0192] (Application Example 2)
[0193] Next, we will explain application example 2. In the following explanation, the data processing device 12 will be referred to as a "server" and the smart device 14 as a "terminal".
[0194] Traditional clothing selection systems present the challenge of requiring users to spend time and effort finding the optimal garment from a large number of options. Furthermore, in-store try-on experiences are limited to specific environments, making effective clothing selection difficult. Additionally, the lack of personalized suggestions that consider user emotions contributes to low user satisfaction.
[0195] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 2 is realized by the following means.
[0196] In this invention, the server includes means for registering the user's video, means for generating video of suitable clothing based on the registered user's video, and emotion analysis means for analyzing the user's emotions and suggesting the most suitable clothing. This makes it possible to provide personalized clothing suggestions according to the user's emotional state and an optimal trying-on experience.
[0197] "Means for registering user videos" refers to a function that allows users to input their own video information into the system and save it in a database.
[0198] "Means for generating images of suitable clothing" refers to a function that generates and presents visual information of clothing suitable for the user's preferences and characteristics, based on the registered user's video data.
[0199] "Emotion analysis means for analyzing user emotions and suggesting the most suitable clothing" refers to a function that analyzes user emotions from their reactions and facial expressions, and then selects and suggests clothing that is appropriate for the user based on the results.
[0200] "Means for booking a trial session at the optimal location" refers to a function that allows users to book the most suitable trial location and time based on their selection and inventory information.
[0201] "Means for optimizing generated clothing suggestions" refers to a function that analyzes user selections and past behavioral data to select the most suitable clothing from the suggested options.
[0202] This invention is a system that uses a cloud-based server, a user's smartphone or PC, and smart devices installed in physical stores to optimize user clothing selection and trial reservation. The server first registers video data uploaded by the user from their terminal. This data is analyzed by a generative AI model, and videos of clothing suitable for the user are generated. This allows the user to virtually try on various styles of clothing.
[0203] The terminal sends the clothing information selected by the user to the server, which uses an associated sentiment analysis engine to evaluate the user's response. The sentiment analysis identifies emotions from the user's facial expressions as captured in images and real-time camera footage, and its results are used to personalize the generated clothing suggestions. Furthermore, the server collects inventory information for the selected clothing from multiple stores and makes the optimal trial reservation.
[0204] As a concrete example, when a user selects a specific piece of clothing, the server checks the inventory status of that clothing at each store and, based on the sentiment analysis results, suggests the most suitable store and reservation time. For instance, by inputting a prompt such as, "This user prefers simple designs, but please suggest a new T-shirt with distinctive sleeves," into the generative AI model, the system can provide suggestions tailored to the user. In this way, users can obtain clothing selection and trial experiences based on their emotions.
[0205] The flow of a specific process in Application Example 2 will be explained using Figure 14.
[0206] Step 1:
[0207] Users take a photo of their face using a device such as a smartphone or computer and upload it to the system. This input data, the facial photograph, is sent to the server as image data that captures the user's features.
[0208] Step 2:
[0209] The server inputs the received user's facial image into a generative AI model, which then generates images of various clothing items suitable for the user. The generated images are derived from the user's image data and are output as images of the user virtually wearing diverse styles of clothing.
[0210] Step 3:
[0211] The server sends a catalog of generated clothing images to the user's terminal, which the user then browses. The user selects their favorite outfit from the provided catalog. This selection information is then sent back to the server as the user's selection data.
[0212] Step 4:
[0213] The server runs the emotion analysis engine on the user selection data to analyze the user's emotions regarding their choices. The analysis evaluates the user's facial expressions and reactions and computes data used to optimize clothing suggestions based on those emotions.
[0214] Step 5:
[0215] The server checks the inventory of the clothing selected by the user across multiple physical stores. The input is information about the selected clothing; the server queries the inventory status at each store, collects the information needed to identify the most suitable store, and outputs the results.
[0216] Step 6:
[0217] Based on sentiment analysis results and inventory information, the server suggests the most suitable trial location and reservation date / time for the user. To achieve this, it generates prompt messages that reflect the user's interests (e.g., "This user prefers simple designs, but please suggest new T-shirts with distinctive sleeves") to optimize the trial experience.
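As a non-limiting illustration, the prompt generation of Step 6 can be sketched as filling a template with interests inferred for the user. The function name and its three parameters are hypothetical; only the example wording of the prompt comes from the disclosure.

```python
def build_suggestion_prompt(preferred_style, item, twist):
    """Compose a suggestion prompt from interests inferred for the user."""
    return (
        f"This user prefers {preferred_style} designs, "
        f"but please suggest new {item} with {twist}"
    )


prompt = build_suggestion_prompt("simple", "T-shirts", "distinctive sleeves")
print(prompt)
```

With the arguments shown, the function reproduces the example prompt given in Step 6; other preference values would yield correspondingly different prompts for the generative AI model.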
[0218] The specific processing unit 290 transmits the result of the specific processing to the smart device 14. In the smart device 14, the control unit 46A causes the output device 40 to output the result of the specific processing. The microphone 38B acquires audio indicating user input for the result of the specific processing. The control unit 46A transmits the audio data indicating user input acquired by the microphone 38B to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the audio data.
[0219] The data generation model 58 is a so-called generative AI (Artificial Intelligence). Examples of the data generation model 58 include generative AI such as ChatGPT (registered trademark) (Internet search <URL: https://openai.com/blog/chatgpt>) and Gemini (registered trademark) (Internet search <URL: https://gemini.google.com/?hl=ja>). The data generation model 58 is obtained by performing deep learning on a neural network. The data generation model 58 receives as input prompts containing instructions and inference data such as audio data representing speech, text data representing text, and image data representing images. The data generation model 58 performs inference on the input inference data according to the instructions indicated by the prompts, and outputs the inference results in data formats such as audio data and text data. Here, inference refers to, for example, analysis, classification, prediction, and/or summarization.
[0220] In the above embodiment, an example was given in which specific processing is performed by the data processing device 12, but the technology of this disclosure is not limited thereto, and the specific processing may also be performed by the smart device 14.
[0221] [Second Embodiment]
[0222] Figure 3 shows an example of the configuration of the data processing system 210 according to the second embodiment.
[0223] As shown in Figure 3, the data processing system 210 includes a data processing device 12 and smart glasses 214. An example of the data processing device 12 is a server.
[0224] The data processing device 12 comprises a computer 22, a database 24, and a communication interface 26. The computer 22 is an example of a "computer" related to the technology of this disclosure. The computer 22 comprises a processor 28, RAM 30, and storage 32. The processor 28, RAM 30, and storage 32 are connected to a bus 34. The database 24 and the communication interface 26 are also connected to the bus 34. The communication interface 26 is connected to a network 54. An example of the network 54 is a WAN (Wide Area Network) and / or a LAN (Local Area Network).
[0225] The smart glasses 214 include a computer 36, a microphone 238, a speaker 240, a camera 42, and a communication interface 44. The computer 36 includes a processor 46, RAM 48, and storage 50. The processor 46, RAM 48, and storage 50 are connected to a bus 52. The microphone 238, speaker 240, and camera 42 are also connected to the bus 52.
[0226] The microphone 238 receives voice signals from the user 20 and receives instructions from the user 20. The microphone 238 captures the voice signals from the user 20, converts the captured voice into audio data, and outputs it to the processor 46. The speaker 240 outputs audio according to the instructions from the processor 46.
[0227] Camera 42 is a small digital camera equipped with an optical system including a lens, aperture, and shutter, and an image sensor such as a CMOS (Complementary Metal-Oxide-Semiconductor) image sensor or a CCD (Charge Coupled Device) image sensor, and captures images of the area around the user 20 (for example, an imaging range defined by a field of view equivalent to the width of a typical healthy person's field of vision).
[0228] Communication interface 44 is connected to network 54. Communication interfaces 44 and 26 are responsible for the exchange of various information between processor 46 and processor 28 via network 54. The exchange of various information between processor 46 and processor 28 using communication interfaces 44 and 26 is performed in a secure manner.
[0229] Figure 4 shows an example of the main functions of the data processing device 12 and the smart glasses 214. As shown in Figure 4, the data processing device 12 performs specific processing using the processor 28. The storage 32 stores the specific processing program 56.
[0230] The specific processing program 56 is an example of a "program" relating to the technology of this disclosure. The processor 28 reads the specific processing program 56 from the storage 32 and executes the read specific processing program 56 on the RAM 30. The specific processing is realized by the processor 28 operating as a specific processing unit 290 in accordance with the specific processing program 56 executed on the RAM 30.
[0231] The storage 32 stores the data generation model 58 and the emotion identification model 59. The data generation model 58 and the emotion identification model 59 are used by the specific processing unit 290.
[0232] In the smart glasses 214, the processor 46 performs the reception output processing. The storage 50 stores the reception output program 60. The processor 46 reads the reception output program 60 from the storage 50 and executes the read reception output program 60 on the RAM 48. The reception output processing is realized by the processor 46 operating as a control unit 46A according to the reception output program 60 executed on the RAM 48.
[0233] Next, the specific processing performed by the specific processing unit 290 of the data processing device 12 will be described. In the following description, the data processing device 12 will be referred to as the "server" and the smart glasses 214 will be referred to as the "terminal".
[0234] To implement this invention, it is necessary to build a system that allows users to register their own images, select clothing from those images, and efficiently try them on at the most suitable store. A specific embodiment of this system is shown below.
[0235] First, the user uploads a photo of their face to the system using a smartphone or computer. The device retrieves this image and sends it to the server in the appropriate format.
[0236] The server inputs the received user image into a generative AI model, which generates images simulating various outfits based on the user's face. These images are organized in a catalog format and sent from the server to the user's terminal.
[0237] The user browses this catalog on their device and selects an outfit they are interested in. The device then sends information about the selected outfit to the server.
[0238] Based on the received selection information, the server uses an AI agent to check inventory at multiple stores. It queries the inventory status of each store and analyzes the data to find the best option. This analysis takes into account the user's location information and the store's location information to identify the store that will be most beneficial to the user.
[0239] The server then presents the user with a list of suitable stores and available fitting dates. Once the user selects a store and date, the server uses that information to contact the store to confirm the reservation. After the reservation is complete, the server notifies the user of the fitting details.
[0240] As a concrete example, consider a user preparing for their coming-of-age ceremony using this system. The user registers their photo in the system and browses the generated catalog of kimonos. Once the user selects a design they like, the server checks the inventory of nearby rental shops and presents several shop options. The user then chooses a shop and date, and the server automatically completes the reservation needed for the fitting. In this way, the system allows users to choose their attire efficiently.
[0241] The following describes the processing flow.
[0242] Step 1:
[0243] The user prepares a photo of their face and uploads it through the photo registration function of the app or web platform. The device receives this image, converts the format as needed, and transfers it to the server.
[0244] Step 2:
[0245] The server inputs the received image data into a generative AI model, which then generates images of the user wearing various outfits based on their face. These outfit images are then organized in a catalog format.
[0246] Step 3:
[0247] The server sends the generated catalog to the user's device. The user then browses the catalog on their device and selects the outfits they are interested in.
[0248] Step 4:
[0249] The device sends the ID and related information of the costume selected by the user to the server. The server receives this information and uses it for the next step.
[0250] Step 5:
[0251] The server checks the inventory of the costume selected by the user across multiple registered stores. This includes querying the inventory database of each store.
[0252] Step 6:
[0253] The server identifies the most suitable store for the user by considering each store's inventory status, the user's location, and the store's geographical information. It then creates a list of the identified stores and presents it to the user, including the available dates for each candidate store.
[0254] Step 7:
[0255] The user selects their preferred store and date from the options provided. The selection information is then sent back to the server.
[0256] Step 8:
[0257] The server processes the request to confirm the fitting appointment at the store based on the user's selection. Once the reservation is confirmed, the server notifies the user of the reservation details.
[0258] This will allow users to efficiently find their desired outfit and experience a smooth fitting reservation process.
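As a non-limiting illustration, Steps 6 through 8 can be sketched as building a candidate list and then validating the user's choice against it. The `StoreOffer` fields, the limit of three candidates, and the assumption that distances are precomputed from the location information are hypothetical; the actual exchange with a store's reservation system is not specified in the disclosure and is omitted.

```python
from dataclasses import dataclass, field


@dataclass
class StoreOffer:
    store: str
    distance_km: float  # assumed to be precomputed from location information
    stock: int          # units of the selected outfit on hand
    open_dates: list = field(default_factory=list)  # ISO date strings


def build_candidates(offers, limit=3):
    """Step 6: keep stores with stock and open dates, nearest first."""
    usable = [o for o in offers if o.stock > 0 and o.open_dates]
    return sorted(usable, key=lambda o: o.distance_km)[:limit]


def confirm_reservation(candidates, store, date):
    """Steps 7 and 8: accept the user's choice only if it was actually offered."""
    for offer in candidates:
        if offer.store == store and date in offer.open_dates:
            return {"store": store, "date": date, "status": "confirmed"}
    raise ValueError("selection does not match any offered store and date")


offers = [
    StoreOffer("Shop A", 2.0, 0, ["2025-01-10"]),                # no stock
    StoreOffer("Shop B", 5.5, 1, ["2025-01-11", "2025-01-12"]),
    StoreOffer("Shop C", 3.2, 2, ["2025-01-12"]),
    StoreOffer("Shop D", 1.0, 4, []),                            # no open dates
]
candidates = build_candidates(offers)
booking = confirm_reservation(candidates, "Shop B", "2025-01-12")
print([c.store for c in candidates], booking)
```

Validating the selection against the offered list guards against a stale or tampered request from the terminal before any reservation is made.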
[0259] (Example 1)
[0260] Next, we will describe Example 1. In the following description, the data processing device 12 will be referred to as the "server," and the smart glasses 214 will be referred to as the "terminal."
[0261] When users select an outfit and schedule a fitting, it is often difficult to efficiently find a store that has the desired design in stock. Furthermore, the distance to the store and scheduling fittings can be cumbersome. An effective system is needed to address these challenges.
[0262] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 1 is realized by the following means.
[0263] In this invention, the server includes a device for receiving and recording the user's image, a device for generating an image of the corresponding costume based on the recorded user image, and a device for checking the inventory status of the selected costume from multiple sales facilities. This allows the user to efficiently select their desired costume and make a fitting reservation.
[0264] "User images" refer to visual data that can identify an individual, such as a photograph of the user's face, which is entered into the system.
[0265] A "recording device" refers to electronic equipment that has the function of saving data received in digital format.
[0266] A "device that generates images of corresponding costumes" refers to a device that uses AI technology to create virtual images of costumes based on input data.
[0267] A "device for acquiring selected costume information" refers to a device that reads the data of the costume selected by the user and has the function of managing that information.
[0268] A "device for checking inventory status" refers to a system that acquires inventory data from multiple sales facilities and obtains the latest inventory information.
[0269] "Sales facilities" refers to stores or commercial facilities in general that handle clothing and related products.
[0270] A "device for executing fitting reservations" refers to a device that works in conjunction with the reservation system of a selected sales facility to secure a fitting schedule for the user.
[0271] "Geographic information" refers to data indicating the location of users and facilities, including location coordinates and address information.
[0272] To implement this invention, it is necessary to build a system in which a server, terminal, and user cooperate to efficiently proceed with costume selection through a series of processes. The overall operation will be described in detail here.
[0273] First, the user uses a device such as a smartphone or computer. The user activates the camera function on the device and takes a picture of their face. The captured image is converted to JPEG format on the device. Then, the device sends this to the server using a secure protocol.
[0274] The server processes the received images using existing generative AI models such as "StyleGAN" and "DALL-E". During this process, it prompts the AI models with the instruction, "Generate clothing designs based on the user's image." This process generates composite images of various clothing designs that fit the user's face.
[0275] The generated costume images are organized into a catalog format on the server side. The catalog includes variations of the costumes and is designed with user-friendly display in mind. The catalog data is compressed for efficient transmission and sent from the server to the user's terminal.
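As a non-limiting illustration, the compression and later decompression of the catalog data can be sketched as a gzip-compressed JSON payload. The choice of JSON and gzip, the entry fields, and the function names are assumptions made for this sketch; the disclosure states only that the catalog data is compressed for transmission.

```python
import gzip
import json


def pack_catalog(entries):
    """Server side: serialise catalog entries to JSON and gzip them."""
    raw = json.dumps(entries, ensure_ascii=False).encode("utf-8")
    return gzip.compress(raw)


def unpack_catalog(blob):
    """Terminal side: reverse of pack_catalog."""
    return json.loads(gzip.decompress(blob).decode("utf-8"))


# Hypothetical catalog entries pointing at generated costume images.
catalog = [
    {"outfit_id": f"furisode-{i:03d}", "image_url": f"https://example.com/img/{i}.jpg"}
    for i in range(50)
]
blob = pack_catalog(catalog)
restored = unpack_catalog(blob)
print(len(json.dumps(catalog).encode("utf-8")), "->", len(blob), "bytes")
```

In this sketch only the catalog metadata is compressed; image files in formats such as JPEG are already compressed and would gain little from a second pass.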
[0276] The user browses a catalog received on their device and selects their preferred outfit. This selection information is then sent back to the server. The server uses this information to check the inventory of multiple retail locations. This involves data communication via APIs with each retail location's inventory management system.
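As a non-limiting illustration, the inventory check across multiple retail locations can be sketched as a loop over per-store query functions. Each callable stands in for a store's inventory management API, which the disclosure does not specify; the store names, the return convention (a stock count), and the error handling are assumptions made for this sketch.

```python
def collect_inventory(outfit_id, store_clients):
    """Ask every store for its stock of outfit_id.

    store_clients maps a store name to a callable that takes an outfit ID
    and returns a stock count (a stand-in for that store's inventory API).
    """
    stock, failed = {}, []
    for store, query in store_clients.items():
        try:
            stock[store] = int(query(outfit_id))
        except Exception:
            # One unreachable store should not block the others.
            failed.append(store)
    return stock, failed


def offline(_outfit_id):
    raise ConnectionError("store system unreachable")


clients = {
    "Shop A": lambda oid: 2,
    "Shop B": lambda oid: 0,
    "Shop C": offline,
}
stock, failed = collect_inventory("furisode-012", clients)
in_stock = [store for store, count in stock.items() if count > 0]
print(in_stock, failed)
```

Passing the query functions in as arguments keeps the sketch independent of any particular inventory system and makes the failure path easy to exercise.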
[0277] To enhance user convenience, the server obtains the user's location information using the Google Maps API and other methods, and cross-references it with the location information of the sales facility. This analysis selects the most suitable sales facility for the user. The server then processes the reservation using the fitting reservation system of the selected facility. Finally, detailed information about the fitting is notified to the user.
[0278] As a concrete example, consider a user who is preparing for their coming-of-age ceremony. The user uploads a photo of their face to the system and browses the generated catalog images of kimonos. They select a furisode (long-sleeved kimono) they particularly like and use the system to find the best rental facility with available stock. The user specifies their desired date and time, and the server confirms the reservation based on that information. In this way, the entire process of choosing an outfit proceeds smoothly.
[0279] The flow of the specific processing in Example 1 will be explained using Figure 11.
[0280] Step 1:
[0281] The user activates the camera using a terminal such as a smartphone or a personal computer and takes a photo of their face. The captured photo is converted into the JPEG format on the terminal. This converted image data becomes the input information, and the terminal sends it to the server via the secure HTTPS protocol.
[0282] Step 2:
[0283] The server receives the image data sent from the terminal. To process the received image data, the prompt sentence "Please generate a clothing design based on the user's image" is input into the generative AI models "StyleGAN" and "DALL-E". Through this process, the AI model generates a synthetic image combining various clothing items based on the user's face image. This generated group of clothing images is the output.
[0284] Step 3:
[0285] The server organizes the generated group of clothing images in a catalog format. The organized catalog data is compressed to reduce its file size for efficient distribution, and this compressed catalog data is the output. The server then sends the compressed catalog data to the user's terminal.
[0286] Step 4:
[0287] The terminal decompresses the catalog data received from the server and displays it on the user interface. At this time, the user can browse through multiple clothing options through swipe or tap operations. If the user selects a clothing item they like from the catalog, that selection information becomes the input information, and this information is resent to the server.
[0288] Step 5:
[0289] The server receives costume selection information from the user and retrieves inventory data from multiple sales facilities via an API connection to the inventory management system. Here, data calculations are performed to check the inventory status of the selected costume, and inventory information for each sales facility is output. Based on this information, the server performs geographical analysis to identify the most suitable sales facility.
[0290] Step 6:
[0291] The server takes into account the user's location and the acquired geographical information of the stores to create a list of the most suitable sales locations and the dates and times when they are available for fitting. The created list is output and sent to the user's device for display.
[0292] Step 7:
[0293] The user selects their preferred facility and fitting date and time from the presented list of stores and available fitting dates and times, and sends this selection information from their terminal to the server. This selection information becomes the input information.
[0294] Step 8:
[0295] The server verifies and confirms the reservation information for the store and date / time selected by the user by linking with the reservation system of the relevant sales facility. This confirmed reservation information becomes the final output, and the server notifies the user. Upon receiving this notification, the user can check the details of the fitting.
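Steps 6 to 8 above can be sketched as follows. The in-memory `FittingCalendar` stands in for each facility's reservation system, which in practice would be reached over that facility's own interface; the class, the function names, and the slot strings are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class FittingCalendar:
    """In-memory stand-in for one facility's reservation system."""
    open_slots: set[str] = field(default_factory=set)

    def reserve(self, slot: str) -> bool:
        """Book the slot if it is still open; return whether booking succeeded."""
        if slot in self.open_slots:
            self.open_slots.remove(slot)
            return True
        return False


def list_candidates(calendars: dict[str, FittingCalendar],
                    distance_km: dict[str, float]) -> list[tuple[str, str]]:
    """Step 6: (store, slot) pairs ordered by distance, then by date and time."""
    return [
        (store, slot)
        for store in sorted(calendars, key=lambda name: distance_km[name])
        for slot in sorted(calendars[store].open_slots)
    ]


def confirm_reservation(calendars: dict[str, FittingCalendar],
                        store: str, slot: str) -> dict:
    """Step 8: confirm the user's choice against the facility's calendar."""
    ok = store in calendars and calendars[store].reserve(slot)
    return {"store": store, "slot": slot, "status": "confirmed" if ok else "unavailable"}


# Hypothetical facilities, distances, and slots.
calendars = {
    "Store A": FittingCalendar({"2025-01-10T10:00"}),
    "Store B": FittingCalendar({"2025-01-11T09:00", "2025-01-09T14:00"}),
}
distance_km = {"Store A": 5.0, "Store B": 2.0}
candidates = list_candidates(calendars, distance_km)
result = confirm_reservation(calendars, *candidates[0])  # Step 7: user picks the first entry
```

Removing a slot at confirmation time means a second request for the same slot is reported as unavailable rather than double-booked.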
[0296] (Application Example 1)
[0297] Next, we will explain Application Example 1. In the following explanation, the data processing device 12 will be referred to as the "server," and the smart glasses 214 will be referred to as the "terminal."
[0298] Modern users want an environment where they can easily select an appearance that suits them and experience it in a suitable facility. However, there is a lack of efficient systems for finding a suitable appearance and booking a facility to try it out. Therefore, the challenge is to provide a service that allows users to make their ideal choice without wasting time and effort, and to have a comfortable experience.
[0299] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 1 is realized by the following means.
[0300] In this invention, the server includes means for registering user characteristics, means for generating a suitable appearance representation based on the registered user characteristics, and means for checking the availability of the selected appearance from multiple facilities. This makes it possible for users to easily select an appearance that suits them and quickly reserve the most suitable facility to try out that appearance.
[0301] "User characteristics" refer to physical or digital attributes unique to each individual user, and are the information necessary for generating visual representations.
[0302] "Means of registration" refers to methods or technologies for incorporating user characteristics into a system as digital data.
[0303] "Appearance representation" refers to visual or other forms of representation generated based on the user's characteristics, and represents the choices offered to the user.
[0304] "Generative means" refers to techniques that use AI or other algorithms to create a visual representation from the user's characteristics.
[0305] "Availability" refers to information about the inventory or availability status of facilities that can provide the experience or goods related to the selected appearance representation.
[0306] "Means of verification" refers to a method or process of obtaining information related to the appearance of multiple facilities and verifying that information.
[0307] The "means for making an experience reservation" is a technology or method for securing the date and time of an experience or use at a related facility for the appearance expression selected by the user.
[0308] This system begins with the user registering their characteristics with the server using a device such as a smartphone or a personal computer. The user's terminal uses the captured or saved image data to transmit the registered characteristic information to the server in an appropriate format.
[0309] Based on the received characteristic information of the user, the server creates a matching appearance expression by utilizing a generative AI model. In this generation process, StyleGAN or other AI algorithms are used to provide the user with visual clothing options. The generated appearance expressions are organized and transmitted to the user's terminal in a list format.
[0310] The user can view the appearance expressions provided on the terminal and select the ones they like. After the selection is made, the terminal returns the information to the server. The server starts communicating with multiple facilities to check the availability status of the selected appearance expression. By analyzing the metadata, the server utilizes the user's current location information and the geographical information of the facilities to identify the most suitable facility.
[0311] Once the optimal facility is identified, the server makes an experience reservation at that facility and notifies the user of the detailed information. Through this process, the user can easily select an appearance and clothing that suit them, and the procedures for actually experiencing the selection are simplified.
[0312] As a specific example, a user who plans to attend a friend's wedding uses this system to look for a fashionable suit. The user registers their face using a smartphone and selects several from the suit catalog provided by the application. Then, the server automatically proposes the most suitable store and completes the reservation at the most convenient time for the user.
[0313] An example of a prompt message would be, "Upload a photo of yourself and choose a suit that suits you for your next wedding. The AI will suggest stores where you can try on the suit." By using such a system, users can have an efficient and effective experience in selecting their attire.
[0314] The flow of a specific process in Application Example 1 will be explained using Figure 12.
[0315] Step 1:
[0316] Users use devices such as smartphones or computers to acquire images of their own faces and upload those images to the system. Input is either a camera device or pre-stored image data. Output is data sent to the server after the image data has been converted to an appropriate format. This conversion process involves image compression and format conversion.
[0317] Step 2:
[0318] The server receives a facial image sent by the user and uses a generative AI model to generate an appearance representation based on that data. The input is the facial image data, and the output is a set of multiple appearance images provided to the user as candidates. The AI model (e.g., StyleGAN) generates various styles of appearance from the received facial image and organizes them as visual choices. This process involves image generation by the model and subsequent shaping.
[0319] Step 3:
[0320] The server sends the generated appearance images to the user's terminal in catalog format. The input is a set of images generated by AI, and the output is the images arranged in a layout viewable by the user. The server compresses the data and arranges it in the optimal display format.
[0321] Step 4:
[0322] The user browses a catalog on their device and selects an appearance they are interested in. The input is a catalog image, and the output is the selected appearance information. The selected item is confirmed based on the user's interaction.
[0323] Step 5:
[0324] The terminal sends the selected appearance information to the server. The input is the user's selection information, and the output is the transmission of data to the server. The terminal confirms the selected information and sends it in a format that the server can process.
[0325] Step 6:
[0326] The server queries multiple facilities to confirm the availability status of the selected appearance. The input is the selected appearance information, and the output is the facilities' availability data. The server queries facilities via an API and analyzes the retrieved data.
[0327] Step 7:
[0328] The server analyzes the acquired availability data in combination with the user's location information to identify the most suitable facility. The input is availability data and location information, and the output is optimal facility information suggested to the user. A data analysis algorithm determines the best option based on travel distance and facility availability conditions.
[0329] Step 8:
[0330] The server makes a reservation for the selected facility and notifies the user of the reservation details. The input is the selected facility information, and the output is reservation confirmation information. The server works in conjunction with the facility's reservation system to confirm the reservation and notifies the user's device.
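Steps 6 and 7 above query several facilities and then choose among them. The sketch below issues the queries concurrently and picks the closest facility that reports availability. Each facility's API is represented by a plain callable; real facility APIs, their names, and the "closest with availability" rule are assumptions for illustration.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Optional


def query_all(facilities: dict[str, Callable[[str], int]], item_id: str) -> dict[str, int]:
    """Step 6: ask every facility how many units of item_id it holds, in parallel."""
    names = list(facilities)
    with ThreadPoolExecutor(max_workers=max(1, len(names))) as pool:
        counts = pool.map(lambda name: facilities[name](item_id), names)
        return dict(zip(names, counts))


def pick_facility(availability: dict[str, int],
                  distance_km: dict[str, float]) -> Optional[str]:
    """Step 7: the closest facility with at least one unit, or None."""
    candidates = [name for name, count in availability.items() if count > 0]
    return min(candidates, key=lambda name: distance_km[name], default=None)


# Hypothetical stubs standing in for each facility's inventory API.
facilities = {
    "Facility A": lambda item_id: 0,
    "Facility B": lambda item_id: 3,
    "Facility C": lambda item_id: 1,
}
availability = query_all(facilities, "suit-042")
chosen = pick_facility(availability, {"Facility A": 1.0, "Facility B": 8.0, "Facility C": 4.5})
```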
[0331] Furthermore, an emotion engine that estimates the user's emotions may be incorporated. That is, the specific processing unit 290 may use the emotion identification model 59 to estimate the user's emotions and perform the specific processing using the estimated emotions.
[0332] To implement this invention, a system is needed that selects clothing based on the user's image, recognizes the user's emotions using an emotion engine, and selects the most suitable fitting room. Specific embodiments are described below.
[0333] First, the user uploads a photo of their face to the system using their smartphone or computer. The device then sends this image data to the server. The server uses a generative AI model to generate images of the user wearing various outfits based on their face photo, and organizes them in a catalog format.
[0334] Next, the server activates the emotion engine to analyze the user's emotions from their reactions and facial photos. The emotion engine can identify emotions based on the user's facial expression data while they are viewing images, as well as real-time facial capture using the camera. This allows the catalog to reflect recommended outfits based on the user's emotional state while they are browsing.
[0335] The user selects their preferred outfit from a catalog that has been adjusted to take this emotion recognition into account. The selection information is sent from the terminal to the server. The server checks the inventory of the selected outfit at multiple stores. In this process, along with the inventory check results, the server considers the user's emotion data and suggests the most suitable store and reservation date and time.
[0336] As a concrete example, consider a user choosing a wedding dress using this system. Suppose the user uploads a photo of themselves and, while viewing the generated dress catalog, the emotion engine captures emotions such as joy and surprise. Based on these emotions, the system prioritizes presenting the user with dresses of similar styles related to the emotion data. If the user selects a specific dress, the server retrieves inventory information from stores and, taking the emotion data into consideration, makes the most optimal fitting reservation.
[0337] By incorporating this emotion engine into the system, we can provide a more user-friendly and personalized outfit selection experience.
[0338] The following describes the processing flow.
[0339] Step 1:
[0340] The user prepares a photo of their face and uploads it to the system using their device. The image data is processed on the device and sent to the server.
[0341] Step 2:
[0342] The server uses an AI model based on the received image data to generate virtual images of the user wearing various outfits, using the user's face as a basis. These images are organized in a catalog format and sent to the user's device.
[0343] Step 3:
[0344] The user views the received catalog via the terminal. During this time, the terminal uses the user's webcam to capture facial expressions in real time and sends that data to the server.
[0345] Step 4:
[0346] The server analyzes this facial expression data using an emotion engine. The analysis identifies which image the user is reacting to and what emotion (joy, surprise, etc.) they are experiencing. Based on this emotional state, recommended outfits are then prioritized and reflected in the user's catalog.
[0347] Step 5:
[0348] The user selects their favorite outfit from a curated catalog. This selection information is sent from the device to the server and used in the next step.
[0349] Step 6:
[0350] The server queries multiple stores for the availability of the selected outfit. During this process, it also considers the user's sentiment analysis results, prioritizing stores with high recommendation ratings.
[0351] Step 7:
[0352] Based on inventory information and sentiment data, the server identifies the most suitable fitting store and date for the user and sends a list of candidates to the device.
[0353] Step 8:
[0354] The user selects their preferred store and date from the options presented. This selection information is then sent to the server via the device.
[0355] Step 9:
[0356] The server confirms the reservation based on the selected fitting store and date / time. It sends a reservation confirmation notification to the device, informing the user of the fitting reservation details.
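The prioritization in Step 4 above can be sketched as a simple re-ranking. The emotion labels, the weights, and the intensity scale are assumptions introduced for this example; the emotion engine's actual output format is not specified in this disclosure.

```python
from collections import defaultdict

# Hypothetical weights: how strongly each detected emotion promotes an outfit.
EMOTION_WEIGHTS = {"joy": 1.0, "surprise": 0.5}


def rerank_catalog(catalog: list[str],
                   reactions: list[tuple[str, str, float]]) -> list[str]:
    """Order outfit IDs by accumulated emotion score, highest first.

    reactions holds (outfit_id, emotion, intensity) with intensity in [0, 1].
    Outfits with equal scores keep their original catalog order.
    """
    scores: dict[str, float] = defaultdict(float)
    for outfit_id, emotion, intensity in reactions:
        scores[outfit_id] += EMOTION_WEIGHTS.get(emotion, 0.0) * intensity
    return sorted(catalog, key=lambda outfit_id: -scores.get(outfit_id, 0.0))


catalog = ["dress-1", "dress-2", "dress-3"]
reactions = [
    ("dress-3", "joy", 0.9),
    ("dress-2", "surprise", 0.8),
    ("dress-3", "surprise", 0.2),
]
ranked = rerank_catalog(catalog, reactions)
```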
[0357] In this way, by incorporating emotion recognition, it becomes possible to select clothing based on the user's emotions and preferences, and to smoothly book fitting appointments.
[0358] (Example 2)
[0359] Next, we will describe Example 2. In the following description, the data processing device 12 will be referred to as the "server" and the smart glasses 214 will be referred to as the "terminal".
[0360] Traditional costume selection systems struggled to provide personalized suggestions that reflected user emotions, resulting in poorly suited choices. Furthermore, checking the availability of selected costumes and suggesting reservations were challenging, as these systems often failed to consider user emotions and geographical convenience.
[0361] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 2 is realized by the following means.
[0362] In this invention, the server includes means for acquiring a user's image, means for creating images of clothing using a generative AI model, and means for analyzing the user's emotions and making recommendations using an emotion recognition device. This enables clothing selection that reflects the user's emotions, providing a personalized experience. It also enables the suggestion of appropriate fitting reservations based on inventory information and emotion analysis.
[0363] "Means of acquiring user images" refers to a function that collects user facial photographs and image data through the electronic devices used by the user.
[0364] "A means of creating images of costumes using a generative AI model" refers to a function that takes user image data as input and uses a generative artificial intelligence algorithm to generate images of the wearer wearing various costumes.
[0365] "A means of analyzing and recommending user emotions using an emotion recognition device" refers to a function that evaluates the user's facial expressions and reactions using emotion analysis technology and suggests the most suitable outfit based on the user's emotional state.
[0366] "Means for obtaining user-selected costume data" refers to a function that collects specific costume information selected based on the user's preferences.
[0367] "A means of checking inventory information from multiple sales locations" refers to a function that queries the inventory status of selected costumes from multiple partner stores and sales locations via a database.
[0368] The "method for suggesting fitting appointments" is a function that recommends the optimal date, time, and location for trying on clothing, based on the results of user sentiment analysis and inventory status.
[0369] To implement this invention, the user, terminal, and server must each fulfill their respective roles and work together in an integrated manner. First, the user uploads a photograph of their face to the system using a terminal such as a smartphone or personal computer. This terminal requires an internet connection and an application or web browser for transmitting image data.
[0370] The terminal sends the uploaded image data to the server. Upon receiving this data, the server runs a generative AI model. The generative AI model generates clothing images using the user's image as input. This process utilizes GPUs and cloud computing services for high-performance computation. The generative AI model is directed by prompts and suggests clothing that matches the user's image.
[0371] Next, the server uses an emotion recognition device to analyze the user's emotions. This device analyzes the user's facial expressions and reactions in real time and generates emotion data. This emotion data is used to optimize the costume catalog the user is browsing, prioritizing the display of costumes that match their emotions, thereby providing the user with a personalized experience.
[0372] As a concrete example, consider a scenario where a user wants to choose a wedding dress. The user uploads a photo of their face to the system, and the server uses a generative AI model to generate various dress images. The prompt message would be, "Analyze the user's image and generate images of styles suitable for a wedding dress." The server then uses an emotion recognition device to analyze the user's reactions and highlights dresses in the catalog that the user expressed joy or interest in.
[0373] In this way, the user sends their selected clothing information to the server via their device, and the server checks the relevant inventory information from multiple stores. Finally, the server suggests a fitting appointment at the most suitable store to the user based on emotional data and inventory information. This system allows users to choose clothing in a rational and emotionally considerate manner.
[0374] The flow of the specific processing in Example 2 will be explained using Figure 13.
[0375] Step 1:
[0376] Users upload their facial photos to the system using their smartphones or computers. The input data is an image file, which the device collects and prepares to send to the server. Specifically, this involves the user clicking an "upload image" button on a dedicated application or web interface and selecting a photo from their file system.
[0377] Step 2:
[0378] The device sends the uploaded image data to the server. This process uses the user's image file as input and transfers the data to the server via the HTTPS protocol as output. Specifically, this means sending the selected image file to the specified API endpoint on the server.
[0379] Step 3:
[0380] The server executes a generative AI model based on the received image data. The input is a photo of the user's face, and the output is images of the user wearing various outfits. This process includes a prompt in the generative AI model that says, "Analyze the user's image and generate an appropriate outfit style." Specifically, the image generation process is performed using the GPU.
[0381] Step 4:
[0382] The server organizes the generated costume images and provides them to the user in catalog format. The input is the costume images output by the generative AI model, and the output is the organized catalog data. The specific operation includes displaying these images as thumbnails in the user interface so that the user can easily select them.
[0383] Step 5:
[0384] The server analyzes the user's emotions using an emotion recognition device. The input data is the user's facial expression data, and the output is data on their emotional state. This process involves capturing the user's facial expressions in real time with a camera and analyzing them using an emotion recognition algorithm.
[0385] Step 6:
[0386] The server reflects recommended outfits in the catalog based on the user's emotional state. Here, the results of the emotional analysis are taken as input, and the output is an adjusted display order for the catalog. Specifically, the server uses the emotional data to move outfits that the user has shown interest in to the front of the catalog.
[0387] Step 7:
[0388] The user selects an outfit from a catalog that has been adjusted based on emotion recognition. The selected outfit information is collected by the terminal and sent to the server. The input data is the outfit ID selected by the user, and the output is the selection information transferred to the server. The specific action is for the user to confirm the information by clicking on the outfit.
[0389] Step 8:
[0390] The server checks inventory information for the selected costume from multiple sales locations and suggests the best fitting appointment. Input is the selected costume data and sentiment information, and output is the recommended fitting date and time and store information. Specific operations include querying inventory from each store via API and presenting the user with appropriate booking options based on matching results.
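Steps 5 and 6 above require knowing which outfit the user was looking at when an expression was captured. One way to do this, sketched below, is to match expression timestamps against a log of which outfit was on screen during each interval. The view log, the timestamp units, and the set of "positive" labels are assumptions for illustration.

```python
# Hypothetical labels treated as signs of interest.
POSITIVE = {"joy", "interest"}


def attribute_reactions(view_log: list[tuple[float, float, str]],
                        samples: list[tuple[float, str]]) -> dict[str, int]:
    """Count positive expression samples per outfit shown at the sample time.

    view_log holds (start, end, outfit_id) display intervals, end exclusive.
    samples holds (timestamp, emotion) pairs from the emotion recognition step.
    """
    counts: dict[str, int] = {}
    for timestamp, emotion in samples:
        if emotion not in POSITIVE:
            continue
        for start, end, outfit_id in view_log:
            if start <= timestamp < end:
                counts[outfit_id] = counts.get(outfit_id, 0) + 1
                break
    return counts


def bring_to_front(catalog: list[str], counts: dict[str, int]) -> list[str]:
    """Move outfits with more positive reactions toward the front (stable sort)."""
    return sorted(catalog, key=lambda outfit_id: -counts.get(outfit_id, 0))


view_log = [(0.0, 5.0, "outfit-a"), (5.0, 10.0, "outfit-b"), (10.0, 15.0, "outfit-c")]
samples = [(1.0, "neutral"), (6.0, "joy"), (7.0, "interest"), (12.0, "joy"), (20.0, "joy")]
counts = attribute_reactions(view_log, samples)
ordered = bring_to_front(["outfit-a", "outfit-b", "outfit-c"], counts)
```

Samples that fall outside every display interval, such as the one at 20.0 here, are ignored rather than attributed to the last outfit shown.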
[0391] (Application Example 2)
[0392] Next, we will explain application example 2. In the following explanation, the data processing device 12 will be referred to as the "server," and the smart glasses 214 will be referred to as the "terminal."
[0393] Traditional clothing selection systems present the challenge of requiring users to spend time and effort finding the optimal garment from a large number of options. Furthermore, in-store try-on experiences are limited to specific environments, making effective clothing selection difficult. Additionally, the lack of personalized suggestions that consider user emotions contributes to low user satisfaction.
[0394] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 2 is realized by the following means.
[0395] In this invention, the server includes means for registering the user's video, means for generating video of suitable clothing based on the registered user's video, and emotion analysis means for analyzing the user's emotions and suggesting the most suitable clothing. This makes it possible to provide personalized clothing suggestions according to the user's emotional state and an optimal trying-on experience.
[0396] "Means for registering user videos" refers to a function that allows users to input their own video information into the system and save it in a database.
[0397] "Means for generating images of suitable clothing" refers to a function that generates and presents visual information of clothing suitable for the user's preferences and characteristics, based on the registered user's video data.
[0398] "An emotional analysis method for analyzing user emotions and suggesting the most suitable clothing" refers to a function that analyzes user emotions from their reactions and facial expressions, and then selects and suggests clothing that is appropriate for the user based on the results.
[0399] "The means of booking a trial session at the optimal location" refers to a function that allows users to book the most suitable trial location and time based on their selection and inventory information.
[0400] "Means for optimizing generated clothing suggestions" refers to a function that analyzes user selections and past behavioral data to select the most suitable clothing from the suggested options.
[0401] This invention is a system that uses a cloud-based server, a user's smartphone or PC, and smart devices installed in physical stores to optimize user clothing selection and trial reservation. The server first registers video data uploaded by the user from their terminal. This data is analyzed by a generative AI model, and videos of clothing suitable for the user are generated. This allows the user to virtually try on various styles of clothing.
[0402] The terminal sends the clothing information selected by the user to the server, which uses a related sentiment analysis engine to evaluate the user's response. Sentiment analysis identifies emotions from the user's facial expressions captured by images and real-time camera footage, and is used to personalize the generated clothing suggestions. Furthermore, the server collects inventory information for the selected clothing from multiple stores and makes the optimal trial reservation.
[0403] As a concrete example, when a user selects a specific piece of clothing, the server checks the inventory status of that clothing at each store and, based on the sentiment analysis results, suggests the most suitable store and reservation time. For instance, by inputting a prompt such as, "This user prefers simple designs, but please suggest a new T-shirt with distinctive sleeves," into the generative AI model, the system can provide suggestions tailored to the user. In this way, users can obtain clothing selection and trial experiences based on their emotions.
[0404] The flow of a specific process in Application Example 2 will be explained using Figure 14.
[0405] Step 1:
[0406] Users take a photo of their face using a device such as a smartphone or computer and upload the resulting image data to the system. This input data, the facial photograph, is sent to the server as image data that captures the user's features.
[0407] Step 2:
[0408] The server inputs the received user's facial image into a generative AI model, which then generates images of various clothing items suitable for the user. The generated images are derived from the user's image data and are output as images of the user virtually wearing diverse styles of clothing.
[0409] Step 3:
[0410] The server sends a catalog of generated clothing images to the user's terminal, which the user then browses. The user selects their favorite outfit from the provided catalog. This selection information is then sent back to the server as the user's selection data.
[0411] Step 4:
[0412] The server uses user selection data to run an emotion analysis engine and analyze the user's emotions regarding their choices. This analysis objectively evaluates the user's facial expressions and reactions and performs data calculations to optimize clothing suggestions based on those emotions.
[0413] Step 5:
[0414] The server checks the inventory of the clothing selected by the user across multiple physical stores. The input here is information about the selected clothing, and the server queries the inventory status to collect information on the most suitable store and outputs the results.
[0415] Step 6:
[0416] Based on sentiment analysis results and inventory information, the server suggests the most suitable trial location and reservation date / time for the user. To achieve this, it generates prompt messages that reflect the user's interests (e.g., "This user prefers simple designs, but please suggest new T-shirts with distinctive sleeves") to optimize the trial experience.
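The prompt generation mentioned in Step 6 above can be sketched as simple template filling. The style tags, the helper names, and the rule "most frequent tag among liked outfits" are assumptions for illustration; only the wording of the example prompt comes from the text.

```python
from collections import Counter


def preferred_style(liked_tags: list[str], default: str = "simple") -> str:
    """Most frequent style tag among outfits the user reacted well to."""
    if not liked_tags:
        return default
    return Counter(liked_tags).most_common(1)[0][0]


def build_prompt(style: str, item: str, twist: str) -> str:
    """Fill the example prompt wording with the user's inferred interests."""
    return (
        f"This user prefers {style} designs, "
        f"but please suggest a new {item} with {twist}."
    )


# Hypothetical tags taken from outfits that drew positive reactions.
style = preferred_style(["simple", "bold", "simple"])
prompt = build_prompt(style, "T-shirt", "distinctive sleeves")
```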
[0417] The specific processing unit 290 transmits the result of the specific processing to the smart glasses 214. In the smart glasses 214, the control unit 46A causes the speaker 240 to output the result of the specific processing. The microphone 238 acquires audio indicating user input for the result of the specific processing. The control unit 46A transmits the audio data indicating user input acquired by the microphone 238 to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the audio data.
[0418] The data generation model 58 is a so-called generative AI (Artificial Intelligence). Examples of the data generation model 58 include generative AI such as ChatGPT (Internet search <URL: https://openai.com/blog/chatgpt>) and Gemini (Internet search <URL: https://gemini.google.com/?hl=ja>). The data generation model 58 is obtained by performing deep learning on a neural network. Prompts containing instructions, together with inference data such as audio data representing speech, text data representing text, and image data representing images, are input to the data generation model 58. The data generation model 58 performs inference on the input inference data according to the instructions indicated by the prompts, and outputs the inference results in data formats such as audio data and text data. Here, inference refers to, for example, analysis, classification, prediction, and / or summarization.
[0419] In the above embodiment, an example was given in which specific processing is performed by the data processing device 12, but the technology of this disclosure is not limited thereto, and the specific processing may also be performed by the smart glasses 214.
[0420] [Third Embodiment]
[0421] Figure 5 shows an example of the configuration of the data processing system 310 according to the third embodiment.
[0422] As shown in Figure 5, the data processing system 310 includes a data processing device 12 and a headset terminal 314. An example of the data processing device 12 is a server.
[0423] The data processing device 12 comprises a computer 22, a database 24, and a communication interface 26. The computer 22 is an example of a "computer" related to the technology of this disclosure. The computer 22 comprises a processor 28, RAM 30, and storage 32. The processor 28, RAM 30, and storage 32 are connected to a bus 34. The database 24 and the communication interface 26 are also connected to the bus 34. The communication interface 26 is connected to a network 54. An example of the network 54 is a WAN (Wide Area Network) and / or a LAN (Local Area Network).
[0424] The headset terminal 314 includes a computer 36, a microphone 238, a speaker 240, a camera 42, a communication interface 44, and a display 343. The computer 36 includes a processor 46, RAM 48, and storage 50. The processor 46, RAM 48, and storage 50 are connected to a bus 52. The microphone 238, speaker 240, camera 42, and display 343 are also connected to the bus 52.
[0425] The microphone 238 receives instructions from the user 20 by picking up the user's voice. The microphone 238 captures the voice of the user 20, converts the captured voice into audio data, and outputs the audio data to the processor 46. The speaker 240 outputs audio according to instructions from the processor 46.
[0426] Camera 42 is a small digital camera equipped with an optical system including a lens, aperture, and shutter, and an image sensor such as a CMOS (Complementary Metal-Oxide-Semiconductor) image sensor or a CCD (Charge Coupled Device) image sensor, and captures images of the area around the user 20 (for example, an imaging range defined by a field of view equivalent to the width of a typical healthy person's field of vision).
[0427] Communication interface 44 is connected to network 54. Communication interfaces 44 and 26 are responsible for the exchange of various information between processor 46 and processor 28 via network 54. The exchange of various information between processor 46 and processor 28 using communication interfaces 44 and 26 is performed in a secure manner.
[0428] Figure 6 shows an example of the main functions of the data processing device 12 and the headset terminal 314. As shown in Figure 6, the data processing device 12 performs specific processing using the processor 28. The storage 32 stores the specific processing program 56.
[0429] The specific processing program 56 is an example of a "program" relating to the technology of this disclosure. The processor 28 reads the specific processing program 56 from the storage 32 and executes the read specific processing program 56 on the RAM 30. The specific processing is realized by the processor 28 operating as a specific processing unit 290 in accordance with the specific processing program 56 executed on the RAM 30.
[0430] The storage 32 stores the data generation model 58 and the emotion identification model 59. The data generation model 58 and the emotion identification model 59 are used by the specific processing unit 290.
[0431] In the headset terminal 314, the processor 46 performs the reception output processing. The storage 50 stores the reception output program 60. The processor 46 reads the reception output program 60 from the storage 50 and executes the read reception output program 60 on the RAM 48. The reception output processing is realized by the processor 46 operating as a control unit 46A according to the reception output program 60 executed on the RAM 48.
[0432] Next, the specific processing performed by the specific processing unit 290 of the data processing device 12 will be described. In the following description, the data processing device 12 will be referred to as the "server" and the headset terminal 314 will be referred to as the "terminal".
[0433] To implement this invention, it is necessary to build a system that allows users to register their own images, select clothing from those images, and efficiently try them on at the most suitable store. A specific embodiment of this system is shown below.
[0434] First, the user uploads a photo of their face to the system using a smartphone or computer. The device retrieves this image and sends it to the server in the appropriate format.
[0435] The server inputs the received user image into a generative AI model, which generates images simulating various outfits based on the user's face. These images are organized in a catalog format and sent from the server to the user's terminal.
[0436] The user browses this catalog on their device and selects an outfit they are interested in. The device then sends information about the selected outfit to the server.
[0437] Based on the received selection information, the server uses an AI agent to check inventory at multiple stores. It queries the inventory status of each store and analyzes the data to find the best option. This analysis takes into account the user's location information and the store's location information to identify the store that will be most beneficial to the user.
[0438] The server then presents the user with a list of suitable stores and available fitting dates. Once the user selects a store and date, the server uses that information to contact the store to confirm the reservation. After the reservation is complete, the server notifies the user of the fitting details.
[0439] As a concrete example, consider a user preparing for their coming-of-age ceremony using this system. The user registers their photo in the system and browses the generated catalog of kimonos. Once the user selects a design they like, the server checks the inventory of nearby rental shops and presents several shop options. The user then chooses one shop and date, and the server automatically completes the reservation required for the fitting and sale. In this way, the system lets users choose their attire efficiently.
[0440] The following describes the processing flow.
[0441] Step 1:
[0442] The user prepares a photo of their face and uploads it through the photo registration function of the app or web platform. The device receives this image, converts the format as needed, and transfers it to the server.
[0443] Step 2:
[0444] The server inputs the received image data into a generative AI model, which then generates images of the user wearing various outfits based on their face. These outfit images are then organized in a catalog format.
[0445] Step 3:
[0446] The server sends the generated catalog to the user's device. The user then browses the catalog on their device and selects the outfits they are interested in.
[0447] Step 4:
[0448] The device sends the ID and related information of the outfit selected by the user to the server. The server receives this information and uses it in the next step.
[0449] Step 5:
[0450] The server checks the inventory of the outfit selected by the user across multiple registered stores. This includes querying the inventory database of each store.
[0451] Step 6:
[0452] The server identifies the most suitable store for the user by considering each store's inventory status, the user's location, and the store's geographical information. It then creates a list of the identified stores and presents it to the user, including the available dates for each candidate store.
[0453] Step 7:
[0454] The user selects their preferred store and date from the options provided. The selection information is then sent back to the server.
[0455] Step 8:
[0456] The server processes the request to confirm the fitting appointment at the store based on the user's selection. Once the reservation is confirmed, the server notifies the user of the reservation details.
[0457] This will allow users to efficiently find their desired outfit and experience a smooth fitting reservation process.
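Steps 5 and 6 above can be illustrated with a minimal sketch. It assumes each store exposes a per-outfit stock count and a list of dates with a free fitting slot; the `Store` record, its field names, and the sample IDs are hypothetical and do not appear in the disclosure.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Store:
    """Hypothetical store record; field names are illustrative only."""
    store_id: str
    stock: dict[str, int]  # outfit ID -> units in stock
    open_dates: list[date] = field(default_factory=list)  # dates with a free fitting slot


def candidate_stores(outfit_id: str, stores: list[Store]) -> list[tuple[str, list[date]]]:
    """Steps 5-6: keep stores that stock the outfit and have at least one free date."""
    candidates = []
    for store in stores:
        in_stock = store.stock.get(outfit_id, 0) > 0
        if in_stock and store.open_dates:
            # Present the available dates in chronological order.
            candidates.append((store.store_id, sorted(store.open_dates)))
    return candidates


stores = [
    Store("shop-a", {"kimono-001": 2}, [date(2025, 1, 12), date(2025, 1, 10)]),
    Store("shop-b", {"kimono-001": 0}, [date(2025, 1, 11)]),  # out of stock
    Store("shop-c", {"kimono-002": 1}, [date(2025, 1, 11)]),  # different outfit
]
result = candidate_stores("kimono-001", stores)
```

Here only `shop-a` remains as a candidate, with its two dates sorted. A real deployment would replace the in-memory `stock` dictionary with queries to each store's inventory database.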
[0458] (Example 1)
[0459] Next, we will describe Example 1. In the following description, the data processing device 12 will be referred to as the "server," and the headset-type terminal 314 will be referred to as the "terminal."
[0460] When users select an outfit and schedule a fitting, it is often difficult to efficiently find a store that has the desired design in stock. Furthermore, the distance to the store and scheduling fittings can be cumbersome. An effective system is needed to address these challenges.
[0461] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 1 is realized by the following means.
[0462] In this invention, the server includes a device for receiving and recording the user's image, a device for generating an image of the corresponding costume based on the recorded user image, and a device for checking the inventory status of the selected costume from multiple sales facilities. This allows the user to efficiently select their desired costume and make a fitting reservation.
[0463] "User images" refer to visual data that can identify an individual, such as a photograph of the user's face, which is entered into the system.
[0464] A "recording device" refers to electronic equipment that has the function of saving data received in digital format.
[0465] A "device that generates images of corresponding costumes" refers to a device that uses AI technology to create virtual images of costumes based on input data.
[0466] A "device for acquiring selected costume information" refers to a device that reads the data of the costume selected by the user and has the function of managing that information.
[0467] A "device for checking inventory status" refers to a system that acquires inventory data from multiple sales facilities and obtains the latest inventory information.
[0468] "Sales facilities" refers to stores or commercial facilities in general that handle clothing and related products.
[0469] A "device for executing fitting reservations" refers to a device that works in conjunction with the reservation system of a selected sales facility to secure a fitting schedule for the user.
[0470] "Geographic information" refers to data indicating the location of users and facilities, including location coordinates and address information.
[0471] To implement this invention, it is necessary to build a system in which a server, terminal, and user cooperate to efficiently proceed with costume selection through a series of processes. The overall operation will be described in detail here.
[0472] First, the user uses a device such as a smartphone or computer. The user activates the camera function on the device and takes a picture of their face. The captured image is converted to JPEG format on the device. Then, the device sends this to the server using a secure protocol.
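The upload described above can be sketched as follows, assuming the terminal already holds JPEG bytes and the server accepts a plain HTTPS POST. The endpoint URL and the function name `build_upload_request` are hypothetical; the sketch only builds the request and does not send it.

```python
import urllib.request

# JPEG files begin with the start-of-image marker FF D8, followed by a marker byte FF.
JPEG_SOI = b"\xff\xd8\xff"


def build_upload_request(image_bytes: bytes, url: str) -> urllib.request.Request:
    """Build (but do not send) an HTTPS POST carrying a JPEG face image."""
    if not url.startswith("https://"):
        raise ValueError("image uploads must use HTTPS")
    if not image_bytes.startswith(JPEG_SOI):
        raise ValueError("payload is not JPEG data")
    return urllib.request.Request(
        url,
        data=image_bytes,
        headers={"Content-Type": "image/jpeg"},
        method="POST",
    )


# Placeholder bytes that only imitate a JPEG header; a real client would read the camera file.
fake_jpeg = JPEG_SOI + b"\xe0" + b"\x00" * 16
request = build_upload_request(fake_jpeg, "https://server.example/api/user-images")
# Passing `request` to urllib.request.urlopen would perform the actual transfer.
```

Rejecting non-HTTPS URLs on the client is one simple way to enforce the "secure protocol" requirement before any image data leaves the device.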
[0473] The server processes the received images using existing generative AI models such as "StyleGAN" and "DALL-E". During this process, it prompts the AI models with the instruction, "Generate clothing designs based on the user's image." This process generates composite images of various clothing designs that fit the user's face.
[0474] The generated costume images are organized into a catalog format on the server side. The catalog includes variations of the costumes and is designed with user-friendly display in mind. The catalog data is compressed for efficient transmission and sent from the server to the user's terminal.
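One possible form of the catalog compression is shown below, assuming the catalog is sent as JSON metadata (outfit IDs and image URLs) compressed with gzip. The entry fields and URLs are hypothetical. Image files such as JPEGs are already compressed, so this sketch targets only the catalog metadata.

```python
import gzip
import json


def pack_catalog(entries: list[dict]) -> bytes:
    """Serialize catalog entries to JSON and gzip-compress them for transfer."""
    raw = json.dumps(entries, ensure_ascii=False).encode("utf-8")
    return gzip.compress(raw)


def unpack_catalog(payload: bytes) -> list[dict]:
    """Reverse pack_catalog on the terminal side."""
    return json.loads(gzip.decompress(payload).decode("utf-8"))


# Hypothetical catalog: 50 generated furisode variations.
catalog = [
    {"outfit_id": f"furisode-{i:03d}", "image_url": f"https://server.example/img/{i}.jpg"}
    for i in range(50)
]
payload = pack_catalog(catalog)   # server side
restored = unpack_catalog(payload)  # terminal side
```

The round trip is lossless, and for repetitive metadata like this the compressed payload is smaller than the raw JSON.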
[0475] The user browses a catalog received on their device and selects their preferred outfit. This selection information is then sent back to the server. The server uses this information to check the inventory of multiple retail locations. This involves data communication via APIs with each retail location's inventory management system.
[0476] To enhance user convenience, the server obtains the user's location information using the Google Maps API and other methods, and cross-references it with the location information of the sales facility. This analysis selects the most suitable sales facility for the user. The server then processes the reservation using the fitting reservation system of the selected facility. Finally, detailed information about the fitting is notified to the user.
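The cross-referencing of user and facility locations can be sketched with a great-circle (haversine) distance. This is an assumed ranking rule, not one stated in the disclosure, and it uses straight-line distance rather than a routing service; the facility names and approximate coordinates are illustrative.

```python
import math


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    r = 6371.0  # mean Earth radius in km
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def nearest_in_stock(user: tuple[float, float], facilities: list[dict]) -> list[dict]:
    """Return the facilities that have stock, nearest to the user first."""
    stocked = [f for f in facilities if f["in_stock"]]
    return sorted(stocked, key=lambda f: haversine_km(*user, f["lat"], f["lon"]))


user_pos = (35.6812, 139.7671)  # approximately Tokyo Station
facilities = [
    {"name": "Yokohama shop", "lat": 35.4437, "lon": 139.6380, "in_stock": True},
    {"name": "Shinjuku shop", "lat": 35.6896, "lon": 139.7006, "in_stock": True},
    {"name": "Ginza shop", "lat": 35.6717, "lon": 139.7650, "in_stock": False},
]
ranked = nearest_in_stock(user_pos, facilities)
```

The Ginza shop is closest but is excluded because it has no stock, so the Shinjuku shop ranks first and the Yokohama shop second.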
[0477] As a concrete example, let's consider a user who is preparing for their coming-of-age ceremony. The user uploads a photo of their face to the system and browses the generated catalog images of kimonos. They select a furisode (long-sleeved kimono) they particularly like and use the system to find the best rental facility with available stock. The user specifies their desired date and time, and the server confirms the reservation based on that information. The system thus streamlines the entire process of choosing an outfit.
[0478] The flow of the specific processing in Example 1 will be explained using Figure 11.
[0479] Step 1:
[0480] The user activates the camera on their smartphone or computer and takes a picture of their face. The captured photo is converted to JPEG format on the device. This converted image data becomes the input information, and the device sends this image to the server using the secure HTTPS protocol.
[0481] Step 2:
[0482] The server receives the image data sent from the terminal. To process the received image data, it inputs the prompt "Generate clothing designs based on the user's image" to generative AI models such as "StyleGAN" and "DALL-E". Through this process, the AI models generate composite images by combining various clothing items with the user's face image. This generated set of clothing images becomes the output.
[0483] Step 3:
[0484] The server organizes the generated costume images into a catalog format. The organized catalog data is then compressed to reduce file size and prepared for efficient distribution. This is the output of the catalog data. The server then sends this compressed catalog data to the user's terminal.
[0485] Step 4:
[0486] The device decompresses the catalog data received from the server and displays it on the user interface. At this point, the user can browse multiple costume options by swiping and tapping. When the user selects a costume they like from the catalog, this selection becomes input information, which is then sent back to the server.
[0487] Step 5:
[0488] The server receives costume selection information from the user and retrieves inventory data from multiple sales facilities via an API connection to the inventory management system. Here, data calculations are performed to check the inventory status of the selected costume, and inventory information for each sales facility is output. Based on this information, the server performs geographical analysis to identify the most suitable sales facility.
[0489] Step 6:
[0490] The server takes into account the user's location and the acquired geographical information of the stores to create a list of the most suitable sales locations and the dates and times when they are available for fitting. The created list is output and sent to the user's device for display.
[0491] Step 7:
[0492] The user selects their preferred facility and fitting date and time from the presented list of stores and available fitting dates and times, and sends this selection information from their terminal to the server. This selection information becomes the input information.
[0493] Step 8:
[0494] The server verifies and confirms the reservation information for the store and date / time selected by the user by linking with the reservation system of the relevant sales facility. This confirmed reservation information becomes the final output, and the server notifies the user. Upon receiving this notification, the user can check the details of the fitting.
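Steps 7 and 8 can be sketched as a slot booking that fails cleanly when the slot has already been taken. `FacilityCalendar` is a hypothetical in-memory stand-in for a sales facility's reservation system; the actual interface to such a system is not specified in the disclosure.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class FacilityCalendar:
    """Hypothetical stand-in for one facility's reservation system."""
    open_slots: set[datetime]
    bookings: dict[datetime, str] = field(default_factory=dict)  # slot -> user ID

    def reserve(self, user_id: str, slot: datetime) -> bool:
        """Book the slot if it is still free; return whether the booking succeeded."""
        if slot not in self.open_slots:
            return False
        self.open_slots.remove(slot)
        self.bookings[slot] = user_id
        return True


def confirm_fitting(calendar: FacilityCalendar, user_id: str, slot: datetime) -> dict:
    """Step 8: try to confirm the reservation and build the notification payload."""
    status = "confirmed" if calendar.reserve(user_id, slot) else "unavailable"
    return {"status": status, "user_id": user_id, "slot": slot.isoformat()}


slot = datetime(2025, 1, 10, 14, 0)
calendar = FacilityCalendar(open_slots={slot})
first = confirm_fitting(calendar, "user-20", slot)
second = confirm_fitting(calendar, "user-21", slot)  # same slot, already taken
```

The first request is confirmed and the second is reported as unavailable, so the notification sent to the terminal always reflects the facility's actual booking state.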
[0495] (Application Example 1)
[0496] Next, we will explain Application Example 1. In the following explanation, the data processing device 12 will be referred to as the "server," and the headset-type terminal 314 will be referred to as the "terminal."
[0497] Modern users want an environment where they can easily select an appearance that suits them and experience it in a suitable facility. However, there is a lack of efficient systems for finding a suitable appearance and booking a facility to try it out. Therefore, the challenge is to provide a service that allows users to make their ideal choice without wasting time and effort, and to have a comfortable experience.
[0498] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 1 is realized by the following means.
[0499] In this invention, the server includes means for registering user characteristics, means for generating a suitable appearance representation based on the registered user characteristics, and means for checking the availability of the selected appearance from multiple facilities. This makes it possible for users to easily select an appearance that suits them and quickly reserve the most suitable facility to try out that appearance.
[0500] "User characteristics" refer to physical or digital attributes unique to each individual user, and are the information necessary for generating visual representations.
[0501] "Means of registration" refers to methods or technologies for incorporating user characteristics into a system as digital data.
[0502] "Appearance representation" refers to a visual or other form of representation generated based on the user's characteristics, and represents the choices offered to the user.
[0503] "Means for generating" refers to techniques that use AI or other algorithms to create an appearance representation from the user's characteristics.
[0504] "Availability" refers to information about the inventory or availability status of facilities that can provide the experience or goods related to the selected appearance representation.
[0505] "Means for checking" refers to a method or process of obtaining availability information for the selected appearance representation from multiple facilities and verifying that information.
[0506] "Means of making an experience reservation" refers to a technology or method for securing a date and time for an experience or use at a related facility based on the visual representation selected by the user.
[0507] This system begins with users registering their characteristics on a server using devices such as smartphones or personal computers. The user's device then uses captured or saved image data to send the registered characteristic information to the server in an appropriate format.
[0508] The server uses a generative AI model to create a suitable appearance based on the received user feature information. This generation process utilizes StyleGAN and other AI algorithms to provide the user with visual clothing options. The generated appearances are organized and sent to the user's device in a list format.
[0509] Users can view the available appearance representations on their device and select the one they like. After making a selection, the device sends that information back to the server. The server then initiates communication with multiple facilities to confirm the availability of the selected appearance representation. By analyzing the metadata, the server utilizes the user's current location information and the geographical information of the facilities to identify the most suitable facility.
[0510] Once the most suitable facility is identified, the server makes a reservation for the experience at that facility and notifies the user of the details. This process simplifies both choosing an appearance or costume that suits the user and the steps required to actually try it at the facility.
[0511] A concrete example would be a user planning to attend a friend's wedding using this system to find a stylish suit. The user registers their face using their smartphone and selects several suits from the catalog provided by the application. The server then automatically suggests the most suitable store and completes the reservation at the user's most convenient date and time.
[0512] An example of a prompt message would be, "Upload a photo of yourself and choose a suit that suits you for your next wedding. The AI will suggest stores where you can try on the suit." By using such a system, users can have an efficient and effective experience in selecting their attire.
[0513] The flow of a specific process in Application Example 1 will be explained using Figure 12.
[0514] Step 1:
[0515] Users use devices such as smartphones or computers to acquire images of their own faces and upload those images to the system. Input is either a camera device or pre-stored image data. Output is data sent to the server after the image data has been converted to an appropriate format. This conversion process involves image compression and format conversion.
[0516] Step 2:
[0517] The server receives a facial image sent by the user and uses a generative AI model to generate an appearance representation based on that data. The input is the facial image data, and the output is a set of multiple appearance images provided to the user as candidates. The AI model (e.g., StyleGAN) generates various styles of appearance from the received facial image and organizes them as visual choices. This process involves image generation by the model followed by post-processing.
[0518] Step 3:
[0519] The server sends the generated appearance images to the user's terminal in catalog format. The input is a set of images generated by AI, and the output is the images arranged in a layout viewable by the user. The server compresses the data and arranges it in the optimal display format.
[0520] Step 4:
[0521] The user browses a catalog on their device and selects an appearance they are interested in. The input is a catalog image, and the output is the selected appearance information. The selected item is confirmed based on the user's interaction.
[0522] Step 5:
[0523] The terminal sends the selected appearance information to the server. The input is the user's selection information, and the output is the transmission of data to the server. The terminal confirms the selected information and sends it in a format that the server can process.
[0524] Step 6:
[0525] The server queries multiple facilities to confirm the availability of the selected appearance. The input is the selected appearance information, and the output is each facility's availability data. The server queries facilities via an API and analyzes the retrieved data.
[0526] Step 7:
[0527] The server analyzes the acquired availability data in combination with the user's location information to identify the most suitable facility. The input is availability data and location information, and the output is optimal facility information suggested to the user. A data analysis algorithm determines the best option based on travel distance and facility availability conditions.
[0528] Step 8:
[0529] The server makes a reservation for the selected facility and notifies the user of the reservation details. The input is the selected facility information, and the output is reservation confirmation information. The server works in conjunction with the facility's reservation system to confirm the reservation and notifies the user's device.
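The Step 7 decision based on travel distance and facility availability conditions could take many forms. The sketch below assumes one simple option: a weighted sum of distance and waiting time, where a lower score is better. The weights, field names, and sample data are hypothetical and are not taken from the disclosure.

```python
from datetime import date
from typing import Optional


def facility_score(distance_km: float, next_free: date, today: date,
                   km_weight: float = 1.0, day_weight: float = 2.0) -> float:
    """Lower is better: weighted sum of travel distance and days until a free slot."""
    wait_days = (next_free - today).days
    return km_weight * distance_km + day_weight * wait_days


def best_facility(facilities: list[dict], today: date) -> Optional[dict]:
    """Pick the lowest-scoring facility among those that hold the selected item."""
    available = [f for f in facilities if f["has_item"]]
    if not available:
        return None
    return min(available, key=lambda f: facility_score(f["distance_km"], f["next_free"], today))


today = date(2025, 3, 1)
facilities = [
    {"name": "A", "distance_km": 3.0, "next_free": date(2025, 3, 8), "has_item": True},
    {"name": "B", "distance_km": 12.0, "next_free": date(2025, 3, 2), "has_item": True},
    {"name": "C", "distance_km": 1.0, "next_free": date(2025, 3, 1), "has_item": False},
]
choice = best_facility(facilities, today)
```

With these assumed weights, facility B (12 km away, free tomorrow, score 14) is preferred over facility A (3 km away, free in a week, score 17), and facility C is excluded because it does not hold the item. Adjusting `km_weight` and `day_weight` shifts the balance between proximity and early availability.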
[0530] Furthermore, an emotion engine that estimates the user's emotions may be incorporated. That is, the specific processing unit 290 may use the emotion identification model 59 to estimate the user's emotions and perform the specific processing using the user's emotions.
[0531] To implement this invention, a system is needed that selects clothing based on the user's image, recognizes the user's emotions using an emotion engine, and selects the most suitable store for the fitting. Specific embodiments are described below.
[0532] First, the user uploads a photo of their face to the system using their smartphone or computer. The device then sends this image data to the server. The server uses a generative AI model to generate images of the user wearing various outfits based on their face photo, and organizes them in a catalog format.
[0533] Next, the server activates the emotion engine to analyze the user's emotions from their reactions and facial photos. The emotion engine can identify emotions based on the user's facial expression data while they are viewing images, as well as real-time facial capture using the camera. This allows the catalog to reflect recommended outfits based on the user's emotional state while they are browsing.
[0534] The user selects their preferred outfit from a catalog that has been adjusted to take this emotion recognition into account. The selection information is sent from the terminal to the server. The server checks the inventory of the selected outfit at multiple stores. In this process, along with the inventory check results, the server considers the user's emotion data and suggests the most suitable store and reservation date and time.
[0535] As a concrete example, consider a user choosing a wedding dress using this system. Suppose the user uploads a photo of themselves and, while viewing the generated dress catalog, the emotion engine captures emotions such as joy and surprise. Based on these emotions, the system prioritizes dresses whose style is similar to those that evoked the positive reactions. If the user selects a specific dress, the server retrieves inventory information from stores and, taking the emotion data into consideration, makes the optimal fitting reservation.
[0536] By incorporating this emotion engine into the system, we can provide a more user-friendly and personalized outfit selection experience.
[0537] The following describes the processing flow.
[0538] Step 1:
[0539] The user prepares a photo of their face and uploads it to the system using their device. The image data is processed on the device and sent to the server.
[0540] Step 2:
[0541] The server uses an AI model based on the received image data to generate virtual images of the user wearing various outfits, using the user's face as a basis. These images are organized in a catalog format and sent to the user's device.
[0542] Step 3:
[0543] The user views the received catalog via the terminal. During this time, the terminal uses the user's webcam to capture facial expressions in real time and sends that data to the server.
[0544] Step 4:
[0545] The server analyzes this facial expression data using an emotion engine. The analysis identifies which image the user is reacting to and what emotion (joy, surprise, etc.) they are experiencing. Based on this emotional state, recommended outfits are then prioritized and reflected in the user's catalog.
[0546] Step 5:
[0547] The user selects their favorite outfit from a curated catalog. This selection information is sent from the device to the server and used in the next step.
[0548] Step 6:
[0549] The server queries multiple stores for the availability of the selected outfit. During this process, it also considers the user's emotion analysis results, prioritizing stores with high recommendation ratings.
[0550] Step 7:
[0551] Based on inventory information and emotion data, the server identifies the most suitable fitting store and date for the user and sends a list of candidates to the device.
[0552] Step 8:
[0553] The user selects their preferred store and date from the options presented. This selection information is then sent to the server via the device.
[0554] Step 9:
[0555] The server confirms the reservation based on the selected fitting store and date / time. It sends a reservation confirmation notification to the device, informing the user of the fitting reservation details.
[0556] In this way, by incorporating emotion recognition, it becomes possible to select clothing based on the user's emotions and preferences, and to smoothly book fitting appointments.
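Step 4 of this flow, which identifies the image the user is reacting to and reorders the catalog, can be sketched as follows. The sketch assumes the emotion engine emits timestamped emotion labels and that the terminal logs when each catalog image was on screen. The label set, the weights in `EMOTION_WEIGHTS`, and both function names are hypothetical.

```python
from collections import defaultdict

# Assumed weights: how strongly each detected emotion counts as interest.
EMOTION_WEIGHTS = {"joy": 1.0, "surprise": 0.5, "neutral": 0.0, "dislike": -1.0}


def attribute_reactions(views: list[tuple[str, float, float]],
                        samples: list[tuple[float, str]]) -> dict[str, float]:
    """Sum emotion weights per outfit.

    views:   (outfit_id, start_s, end_s) intervals during which an image was on screen
    samples: (timestamp_s, emotion_label) pairs produced by the emotion engine
    """
    scores: dict[str, float] = defaultdict(float)
    for ts, emotion in samples:
        for outfit_id, start, end in views:
            if start <= ts < end:
                scores[outfit_id] += EMOTION_WEIGHTS.get(emotion, 0.0)
                break  # a sample belongs to at most one viewing interval
    return dict(scores)


def reorder_catalog(catalog: list[str], scores: dict[str, float]) -> list[str]:
    """Stable sort: higher-scoring outfits first, original order kept on ties."""
    return sorted(catalog, key=lambda outfit_id: -scores.get(outfit_id, 0.0))


views = [("dress-1", 0, 5), ("dress-2", 5, 10), ("dress-3", 10, 15)]
samples = [(1, "neutral"), (6, "joy"), (7, "surprise"), (12, "dislike"), (20, "joy")]
scores = attribute_reactions(views, samples)
ordered = reorder_catalog(["dress-1", "dress-2", "dress-3"], scores)
```

The sample at 20 seconds falls outside every viewing interval and is ignored. `dress-2`, which drew joy and surprise, moves to the front of the catalog, and `dress-3`, which drew a negative reaction, moves to the back.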
[0557] (Example 2)
[0558] Next, we will describe Example 2. In the following description, the data processing device 12 will be referred to as the "server," and the headset-type terminal 314 will be referred to as the "terminal."
[0559] Traditional costume selection systems struggled to provide personalized suggestions that reflected user emotions, resulting in poorly suited choices. Furthermore, checking the availability of selected costumes and suggesting reservations were challenging, as these systems often failed to consider user emotions and geographical convenience.
[0560] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 2 is realized by the following means.
[0561] In this invention, the server includes means for acquiring a user's image, means for creating images of clothing using a generative AI model, and means for analyzing the user's emotions and making recommendations using an emotion recognition device. This enables clothing selection that reflects the user's emotions, providing a personalized experience. It also enables the suggestion of appropriate fitting reservations based on inventory information and emotion analysis.
[0562] "Means of acquiring user images" refers to a function that collects user facial photographs and image data through the electronic devices used by the user.
[0563] "A means of creating images of costumes using a generative AI model" refers to a function that takes user image data as input and uses a generative artificial intelligence algorithm to generate images of the wearer wearing various costumes.
[0564] "A means of analyzing and recommending user emotions using an emotion recognition device" refers to a function that evaluates the user's facial expressions and reactions using emotion analysis technology and suggests the most suitable outfit based on the user's emotional state.
[0565] "Means for obtaining user-selected costume data" refers to a function that collects specific costume information selected based on the user's preferences.
[0566] "A means of checking inventory information from multiple sales locations" refers to a function that queries the inventory status of selected costumes from multiple partner stores and sales locations via a database.
[0567] "A means of suggesting fitting appointments" refers to a function that recommends the optimal date, time, and location for trying on clothing, based on the results of the user's emotion analysis and inventory status.
[0568] To implement this invention, the user, terminal, and server must each fulfill their respective roles and work together in an integrated manner. First, the user uploads a photograph of their face to the system using a terminal such as a smartphone or personal computer. This terminal requires an internet connection and an application or web browser for transmitting image data.
[0569] The terminal sends the uploaded image data to the server. Upon receiving this data, the server runs a generative AI model. The generative AI model generates clothing images using the user's image as input. This process utilizes GPUs and cloud computing services for high-performance computation. The generative AI model is instructed through prompts and suggests clothing that matches the user's image.
[0570] Next, the server uses an emotion recognition device to analyze the user's emotions. This device analyzes the user's facial expressions and reactions in real time and generates emotion data. This emotion data is used to optimize the costume catalog the user is browsing, prioritizing the display of costumes that match their emotions, thereby providing the user with a personalized experience.
[0571] As a concrete example, consider a scenario where a user wants to choose a wedding dress. The user uploads a photo of their face to the system, and the server uses a generative AI model to generate various dress images. The prompt message would be, "Analyze the user's image and generate images of styles suitable for a wedding dress." The server then uses an emotion recognition device to analyze the user's reactions and highlights dresses in the catalog that the user expressed joy or interest in.
[0572] In this way, the user sends their selected clothing information to the server via their device, and the server checks the relevant inventory information from multiple stores. Finally, the server suggests a fitting appointment at the most suitable store to the user based on emotional data and inventory information. This system allows users to choose clothing in a rational and emotionally considerate manner.
[0573] The flow of the specific processing in Example 2 will be explained using Figure 13.
[0574] Step 1:
[0575] Users upload their facial photos to the system using their smartphones or computers. The input data is an image file, which the device collects and prepares to send to the server. Specifically, this involves the user clicking an "upload image" button on a dedicated application or web interface and selecting a photo from their file system.
[0576] Step 2:
[0577] The device sends the uploaded image data to the server. This process uses the user's image file as input and transfers the data to the server via the HTTPS protocol as output. Specifically, this means sending the selected image file to the specified API endpoint on the server.
[0578] Step 3:
[0579] The server executes a generative AI model based on the received image data. The input is a photo of the user's face, and the output is images of the user wearing various outfits. The generative AI model is given a prompt such as, "Analyze the user's image and generate an appropriate outfit style." Specifically, the image generation process is performed using the GPU.
[0580] Step 4:
[0581] The server organizes the generated costume images and provides them to the user in catalog format. The input is the costume images output by the generative AI model, and the output is the organized catalog data. The specific operation includes displaying these images as thumbnails in the user interface so that the user can easily select them.
[0582] Step 5:
[0583] The server analyzes the user's emotions using an emotion recognition device. The input data is the user's facial expression data, and the output is data on their emotional state. This process involves capturing the user's facial expressions in real time with a camera and analyzing them using an emotion recognition algorithm.
[0584] Step 6:
[0585] The server reflects recommended outfits in the catalog based on the user's emotional state. The input is the result of the emotion analysis, and the output is an adjusted display order for the catalog. Specifically, the server uses the emotion data to move outfits in which the user has shown interest to the front of the catalog.
[0586] Step 7:
[0587] The user selects an outfit from a catalog that has been adjusted based on emotion recognition. The selected outfit information is collected by the terminal and sent to the server. The input data is the outfit ID selected by the user, and the output is the selection information transferred to the server. The specific action is for the user to confirm the information by clicking on the outfit.
[0588] Step 8:
[0589] The server checks inventory information for the selected costume from multiple sales locations and suggests the best fitting appointment. The input is the selected costume data and emotion information, and the output is the recommended fitting date and time and store information. Specific operations include querying inventory from each store via API and presenting the user with appropriate booking options based on the matching results.
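Step 8 can be sketched as follows. The text does not specify how the emotion information is weighed at this stage, so this sketch leaves it out: it only drops stores without stock and orders the remaining fitting slots by time. The per-store API is stood in for by plain callables, and all store names, the outfit ID, and the dates are hypothetical.

```python
from datetime import datetime
from typing import Callable, Dict, List, NamedTuple


class StoreStock(NamedTuple):
    store: str
    quantity: int
    slots: List[datetime]


class BookingOption(NamedTuple):
    store: str
    slot: datetime


def suggest_bookings(outfit_id: str,
                     store_apis: Dict[str, Callable[[str], StoreStock]],
                     limit: int = 3) -> List[BookingOption]:
    """Query every store and return the earliest slots at stores with stock."""
    options: List[BookingOption] = []
    for query in store_apis.values():
        stock = query(outfit_id)  # stands in for one API call per store
        if stock.quantity > 0:
            options.extend(BookingOption(stock.store, s) for s in stock.slots)
    options.sort(key=lambda o: o.slot)
    return options[:limit]


def _stub(store: str, quantity: int, slots: List[datetime]):
    """Build a fake store API that always returns the same answer."""
    return lambda outfit_id: StoreStock(store, quantity, slots)


apis = {
    "Shibuya": _stub("Shibuya", 1, [datetime(2025, 1, 12, 14, 0)]),
    "Ginza": _stub("Ginza", 0, [datetime(2025, 1, 10, 10, 0)]),
    "Yokohama": _stub("Yokohama", 2, [datetime(2025, 1, 11, 11, 0),
                                       datetime(2025, 1, 13, 16, 0)]),
}

options = suggest_bookings("dress-42", apis)
print([o.store for o in options])  # ['Yokohama', 'Shibuya', 'Yokohama']
```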
[0590] (Application Example 2)
[0591] Next, we will explain Application Example 2. In the following explanation, the data processing device 12 will be referred to as the "server," and the headset-type terminal 314 will be referred to as the "terminal."
[0592] Traditional clothing selection systems present the challenge of requiring users to spend time and effort finding the optimal garment from a large number of options. Furthermore, in-store try-on experiences are limited to specific environments, making effective clothing selection difficult. Additionally, the lack of personalized suggestions that consider user emotions contributes to low user satisfaction.
[0593] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 2 is realized by the following means.
[0594] In this invention, the server includes means for registering the user's video, means for generating video of suitable clothing based on the registered user's video, and emotion analysis means for analyzing the user's emotions and suggesting the most suitable clothing. This makes it possible to provide personalized clothing suggestions according to the user's emotional state and an optimal trying-on experience.
[0595] "Means for registering user videos" refers to a function that allows users to input their own video information into the system and save it in a database.
[0596] "Means for generating video of suitable clothing" refers to a function that generates and presents visual information of clothing suitable for the user's preferences and characteristics, based on the registered user's video data.
[0597] "Emotion analysis means for analyzing the user's emotions and suggesting the most suitable clothing" refers to a function that analyzes the user's emotions from their reactions and facial expressions, and then selects and suggests clothing that is appropriate for the user based on the results.
[0598] "Means for booking a trial session at the optimal location" refers to a function that allows users to book the most suitable trial location and time based on their selection and inventory information.
[0599] "Means for optimizing generated clothing suggestions" refers to a function that analyzes user selections and past behavioral data to select the most suitable clothing from the suggested options.
[0600] This invention is a system that uses a cloud-based server, a user's smartphone or PC, and smart devices installed in physical stores to optimize the user's clothing selection and trial reservation. The server first registers video data uploaded by the user from their terminal. This data is analyzed by a generative AI model, and videos of clothing suitable for the user are generated. This allows the user to virtually try on various styles of clothing.
[0601] The terminal sends the clothing information selected by the user to the server, which uses an associated sentiment analysis engine to evaluate the user's response. Sentiment analysis identifies emotions from the user's facial expressions captured in still images and real-time camera footage, and the results are used to personalize the generated clothing suggestions. Furthermore, the server collects inventory information for the selected clothing from multiple stores and makes the optimal trial reservation.
[0602] As a concrete example, when a user selects a specific piece of clothing, the server checks the inventory status of that clothing at each store and, based on the sentiment analysis results, suggests the most suitable store and reservation time. For instance, by inputting a prompt such as, "This user prefers simple designs, but please suggest a new T-shirt with distinctive sleeves," into the generative AI model, the system can provide suggestions tailored to the user. In this way, users can obtain clothing selection and trial experiences based on their emotions.
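One way such a prompt could be assembled from sentiment analysis results is sketched below. The sketch assumes that the analysis yields a list of reaction scores per style tag and that the style with the highest mean score is treated as the user's preference. The function names and the sample scores are hypothetical; only the wording of the prompt comes from the text.

```python
from typing import Dict, List


def dominant_style(reactions: Dict[str, List[float]]) -> str:
    """Return the style tag with the highest mean reaction score.

    Assumes every tag has at least one recorded score.
    """
    return max(reactions, key=lambda style: sum(reactions[style]) / len(reactions[style]))


def build_suggestion_prompt(preferred_style: str, item: str, twist: str) -> str:
    """Fill the prompt wording used in the text with concrete values."""
    return (f"This user prefers {preferred_style} designs, "
            f"but please suggest a new {item} with {twist}.")


# Hypothetical reaction scores per style tag.
reactions = {"simple": [0.8, 0.7], "ornate": [0.3]}

prompt = build_suggestion_prompt(dominant_style(reactions), "T-shirt", "distinctive sleeves")
print(prompt)
```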
[0603] The flow of a specific process in Application Example 2 will be explained using Figure 14.
[0604] Step 1:
[0605] Users take a photo of their face using a device such as a smartphone or computer and upload it to the system. This input data, the facial photograph, is sent to the server as image data that captures the user's features.
[0606] Step 2:
[0607] The server inputs the received facial image of the user into a generative AI model, which then generates images of various clothing items suitable for the user. The generated images are derived from the user's image data and are output as images of the user virtually wearing diverse styles of clothing.
[0608] Step 3:
[0609] The server sends a catalog of generated clothing images to the user's terminal, which the user then browses. The user selects their favorite outfit from the provided catalog. This selection information is then sent back to the server as the user's selection data.
[0610] Step 4:
[0611] The server uses user selection data to run an emotion analysis engine and analyze the user's emotions regarding their choices. This analysis objectively evaluates the user's facial expressions and reactions and performs data calculations to optimize clothing suggestions based on those emotions.
[0612] Step 5:
[0613] The server checks the inventory of the clothing selected by the user across multiple physical stores. The input here is information about the selected clothing, and the server queries the inventory status to collect information on the most suitable store and outputs the results.
[0614] Step 6:
[0615] Based on sentiment analysis results and inventory information, the server suggests the most suitable trial location and reservation date / time for the user. To achieve this, it generates prompt messages that reflect the user's interests (e.g., "This user prefers simple designs, but please suggest new T-shirts with distinctive sleeves") to optimize the trial experience.
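The emotion analysis in Step 4 could, for example, aggregate per-frame expression estimates into one score per outfit. The sketch below assumes that each camera frame yields an outfit ID, an emotion label, and a confidence value, and that only "joy" and "interest" count as positive signals. The weights, labels, and sample frames are assumptions made for illustration and are not taken from the text.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Assumed weights for positive emotions; all other labels contribute 0.
POSITIVE_WEIGHTS = {"joy": 1.0, "interest": 0.7}


def score_outfits(frames: Iterable[Tuple[str, str, float]]) -> Dict[str, float]:
    """Average the weighted confidence of positive emotions per outfit.

    Each frame is (outfit_id, emotion_label, confidence in [0, 1]).
    """
    totals: Dict[str, float] = defaultdict(float)
    counts: Dict[str, int] = defaultdict(int)
    for outfit_id, label, confidence in frames:
        totals[outfit_id] += POSITIVE_WEIGHTS.get(label, 0.0) * confidence
        counts[outfit_id] += 1
    return {outfit_id: totals[outfit_id] / counts[outfit_id] for outfit_id in totals}


# Hypothetical frames captured while the user viewed two outfits.
frames = [
    ("k1", "joy", 0.9),
    ("k1", "neutral", 0.8),
    ("k2", "interest", 0.5),
]

scores = score_outfits(frames)
best = max(scores, key=scores.get)
print(best)  # k1
```

Averaging over frames rather than summing keeps an outfit from ranking higher merely because it stayed on screen longer.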
[0616] The specific processing unit 290 transmits the result of the specific processing to the headset terminal 314. In the headset terminal 314, the control unit 46A causes the speaker 240 and display 343 to output the result of the specific processing. The microphone 238 acquires audio indicating user input for the result of the specific processing. The control unit 46A transmits the audio data indicating user input acquired by the microphone 238 to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the audio data.
[0617] Data generation model 58 is a type of so-called generative AI (Artificial Intelligence). Examples of data generation model 58 include generative AI such as ChatGPT (Internet search <URL: https://openai.com/blog/chatgpt>) and Gemini (Internet search <URL: https://gemini.google.com/?hl=ja>). The data generation model 58 is obtained by performing deep learning on a neural network. The data generation model 58 is input with prompts containing instructions, and with inference data such as audio data representing speech, text data representing text, and image data representing images. The data generation model 58 infers from the input inference data according to the instructions indicated by the prompts, and outputs the inference results in data formats such as audio data and text data. Here, inference refers to, for example, analysis, classification, prediction, and / or summarization.
[0618] In the above embodiment, an example was given in which specific processing is performed by the data processing device 12, but the technology of this disclosure is not limited thereto, and specific processing may also be performed by the headset terminal 314.
[0619] [Fourth Embodiment]
[0620] Figure 7 shows an example of the configuration of the data processing system 410 according to the fourth embodiment.
[0621] As shown in Figure 7, the data processing system 410 includes a data processing device 12 and a robot 414. An example of the data processing device 12 is a server.
[0622] The data processing device 12 comprises a computer 22, a database 24, and a communication interface 26. The computer 22 is an example of a "computer" related to the technology of this disclosure. The computer 22 comprises a processor 28, RAM 30, and storage 32. The processor 28, RAM 30, and storage 32 are connected to a bus 34. The database 24 and the communication interface 26 are also connected to the bus 34. The communication interface 26 is connected to a network 54. An example of the network 54 is a WAN (Wide Area Network) and / or a LAN (Local Area Network).
[0623] The robot 414 includes a computer 36, a microphone 238, a speaker 240, a camera 42, a communication interface 44, and a controlled object 443. The computer 36 includes a processor 46, RAM 48, and storage 50. The processor 46, RAM 48, and storage 50 are connected to a bus 52. The microphone 238, speaker 240, camera 42, and controlled object 443 are also connected to the bus 52.
[0624] The microphone 238 receives voice signals from the user 20 and receives instructions from the user 20. The microphone 238 captures the voice signals from the user 20, converts the captured voice into audio data, and outputs it to the processor 46. The speaker 240 outputs audio according to the instructions from the processor 46.
[0625] Camera 42 is a small digital camera equipped with an optical system including a lens, aperture, and shutter, and an image sensor such as a CMOS (Complementary Metal-Oxide-Semiconductor) image sensor or a CCD (Charge Coupled Device) image sensor, and captures images of the area around the user 20 (for example, an imaging range defined by a field of view equivalent to the width of a typical healthy person's field of vision).
[0626] Communication interface 44 is connected to network 54. Communication interfaces 44 and 26 are responsible for the exchange of various information between processor 46 and processor 28 via network 54. The exchange of various information between processor 46 and processor 28 using communication interfaces 44 and 26 is performed in a secure manner.
[0627] The controlled object 443 includes a display device, LEDs in the eyes, and motors that drive the arms, hands, and feet. The posture and gestures of the robot 414 are controlled by controlling the motors of the arms, hands, and feet. Some of the robot 414's emotions can be expressed by controlling these motors. Furthermore, the robot 414's facial expressions can also be expressed by controlling the illumination state of the LEDs in its eyes.
[0628] Figure 8 shows an example of the main functions of the data processing device 12 and the robot 414. As shown in Figure 8, the data processing device 12 performs specific processing using the processor 28. The storage 32 stores the specific processing program 56.
[0629] The specific processing program 56 is an example of a "program" relating to the technology of this disclosure. The processor 28 reads the specific processing program 56 from the storage 32 and executes the read specific processing program 56 on the RAM 30. The specific processing is realized by the processor 28 operating as a specific processing unit 290 in accordance with the specific processing program 56 executed on the RAM 30.
[0630] The storage 32 stores the data generation model 58 and the emotion identification model 59. The data generation model 58 and the emotion identification model 59 are used by the specific processing unit 290.
[0631] In robot 414, the processor 46 performs the reception output processing. The storage 50 stores the reception output program 60. The processor 46 reads the reception output program 60 from the storage 50 and executes the read reception output program 60 on the RAM 48. The reception output processing is realized by the processor 46 operating as a control unit 46A according to the reception output program 60 executed on the RAM 48.
[0632] Next, the specific processing performed by the specific processing unit 290 of the data processing device 12 will be described. In the following description, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".
[0633] To implement this invention, it is necessary to build a system that allows users to register their own images, select clothing from those images, and efficiently try them on at the most suitable store. A specific embodiment of this system is shown below.
[0634] First, the user uploads a photo of their face to the system using a smartphone or computer. The device retrieves this image and sends it to the server in the appropriate format.
[0635] The server uses the received user image to activate a generative AI model, which generates images simulating various outfits based on the user's face. These images are organized in a catalog format and sent from the server to the user's terminal.
[0636] The user browses this catalog on their device and selects an outfit they are interested in. The device then sends information about the selected outfit to the server.
[0637] Based on the received selection information, the server uses an AI agent to check inventory at multiple stores. It queries the inventory status of each store and analyzes the data to find the best option. This analysis takes into account the user's location information and the store's location information to identify the store that will be most beneficial to the user.
[0638] The server then presents the user with a list of suitable stores and available fitting dates. Once the user selects a store and date, the server uses that information to contact the store to confirm the reservation. After the reservation is complete, the server notifies the user of the fitting details.
[0639] As a concrete example, consider a user preparing for their coming-of-age ceremony using this system. The user registers their photo in the system and browses a catalog of generated kimono images. Once the user selects a design they like, the server checks the inventory of nearby rental shops and presents several shop options. The user then chooses one shop and date, and the server automatically completes the reservation needed for the fitting and sale. This system allows users to choose their attire efficiently.
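The store selection that considers the user's location and each store's location can be sketched with a great-circle distance. The sketch assumes that both locations are available as latitude and longitude and that the nearest store with stock is treated as the most suitable one; the text does not fix a particular criterion. The shop names and coordinates are illustrative values only.

```python
import math
from typing import NamedTuple, Optional, Sequence


class Store(NamedTuple):
    name: str
    lat: float
    lon: float
    in_stock: bool


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometres between two points on Earth."""
    radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))


def nearest_store_with_stock(user_lat: float, user_lon: float,
                             stores: Sequence[Store]) -> Optional[Store]:
    """Return the closest store that has the outfit in stock, or None."""
    candidates = [s for s in stores if s.in_stock]
    if not candidates:
        return None
    return min(candidates, key=lambda s: haversine_km(user_lat, user_lon, s.lat, s.lon))


# Hypothetical rental shops; the user is assumed to be in central Tokyo.
stores = [
    Store("Shop A", 35.6595, 139.7005, True),
    Store("Shop B", 35.6717, 139.7650, False),  # closest, but out of stock
    Store("Shop C", 35.4437, 139.6380, True),
]

chosen = nearest_store_with_stock(35.6812, 139.7671, stores)
print(chosen.name)  # Shop A
```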
[0640] The following describes the processing flow.
[0641] Step 1:
[0642] The user prepares a photo of their face and uploads it through the photo registration function of the app or web platform. The device receives this image, converts the format as needed, and transfers it to the server.
[0643] Step 2:
[0644] The server inputs the received image data into a generative AI model, which then generates images of the user wearing various outfits based on their face. These outfit images are then organized in a catalog format.
[0645] Step 3:
[0646] The server sends the generated catalog to the user's device. The user then browses the catalog on their device and selects the outfits they are interested in.
[0647] Step 4:
[0648] The device sends the ID and related information of the costume selected by the user to the server. The server receives this information and uses it for the next step.
[0649] Step 5:
[0650] The server checks the inventory of the costume selected by the user across multiple registered stores. This includes querying the inventory database of each store.
[0651] Step 6:
[0652] The server identifies the most suitable store for the user by considering each store's inventory status, the user's location, and the store's geographical information. It then creates a list of the identified stores and presents it to the user, including the available dates for each candidate store.
[0653] Step 7:
[0654] The user selects their preferred store and date from the options provided. The selection information is then sent back to the server.
[0655] Step 8:
[0656] The server processes the request to confirm the fitting appointment at the store based on the user's selection. Once the reservation is confirmed, the server notifies the user of the reservation details.
[0657] This will allow users to efficiently find their desired outfit and experience a smooth fitting reservation process.
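Steps 7 and 8 of the flow above can be illustrated with a small in-memory sketch of reservation confirmation. It assumes that each store exposes a set of open fitting slots and that a slot can be booked only once. `StoreCalendar`, `confirm_fitting`, and the sample identifiers are hypothetical, and a real store reservation system would be reached over a network rather than held in memory.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class StoreCalendar:
    name: str
    open_slots: Set[str] = field(default_factory=set)

    def reserve(self, slot: str) -> bool:
        """Take the slot if it is still open; report whether that succeeded."""
        if slot not in self.open_slots:
            return False
        self.open_slots.remove(slot)
        return True


def confirm_fitting(calendar: StoreCalendar, outfit_id: str, slot: str) -> Dict[str, str]:
    """Try to book the slot and build the notification sent back to the user."""
    status = "confirmed" if calendar.reserve(slot) else "unavailable"
    return {"status": status, "store": calendar.name,
            "outfit_id": outfit_id, "slot": slot}


shop = StoreCalendar("Shop A", {"2025-01-12T14:00"})
first = confirm_fitting(shop, "kimono-7", "2025-01-12T14:00")
second = confirm_fitting(shop, "kimono-7", "2025-01-12T14:00")
print(first["status"], second["status"])  # confirmed unavailable
```

Returning an explicit "unavailable" status lets the server go back to the user with other candidates instead of failing silently.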
[0658] (Example 1)
[0659] Next, we will describe Example 1. In the following description, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".
[0660] When users select an outfit and schedule a fitting, it is often difficult to efficiently find a store that has the desired design in stock. Furthermore, the distance to the store and scheduling fittings can be cumbersome. An effective system is needed to address these challenges.
[0661] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 1 is realized by the following means.
[0662] In this invention, the server includes a device for receiving and recording the user's image, a device for generating an image of the corresponding costume based on the recorded user image, and a device for checking the inventory status of the selected costume from multiple sales facilities. This allows the user to efficiently select their desired costume and make a fitting reservation.
[0663] "User images" refer to visual data that can identify an individual, such as a photograph of the user's face, which is entered into the system.
[0664] A "recording device" refers to electronic equipment that has the function of saving data received in digital format.
[0665] A "device that generates images of corresponding costumes" refers to a device that uses AI technology to create virtual images of costumes based on input data.
[0666] A "device for acquiring selected costume information" refers to a device that reads the data of the costume selected by the user and has the function of managing that information.
[0667] A "device for checking inventory status" refers to a system that acquires inventory data from multiple sales facilities and obtains the latest inventory information.
[0668] "Sales facilities" refers to stores or commercial facilities in general that handle clothing and related products.
[0669] A "device for executing fitting reservations" refers to a device that works in conjunction with the reservation system of a selected sales facility to secure a fitting schedule for the user.
[0670] "Geographic information" refers to data indicating the location of users and facilities, including location coordinates and address information.
[0671] To implement this invention, it is necessary to build a system in which a server, terminal, and user cooperate to efficiently proceed with costume selection through a series of processes. The overall operation will be described in detail here.
[0672] First, the user uses a device such as a smartphone or computer. The user activates the camera function on the device and takes a picture of their face. The captured image is converted to JPEG format on the device. Then, the device sends this to the server using a secure protocol.
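The transmission of the JPEG image over a secure protocol can be sketched with the Python standard library. The endpoint URL below is hypothetical, and sending the raw JPEG bytes as the request body is only one possible payload format. The sketch builds the request without sending it; passing the returned object to `urllib.request.urlopen` would perform the actual upload.

```python
import urllib.request

# Hypothetical endpoint; the disclosure does not name one.
UPLOAD_URL = "https://example.com/api/v1/user-images"

# Every JPEG file begins with the start-of-image marker FF D8.
JPEG_SOI = b"\xff\xd8"


def build_upload_request(jpeg_bytes: bytes, url: str = UPLOAD_URL) -> urllib.request.Request:
    """Build an HTTPS POST request carrying the JPEG image as its body."""
    if not url.startswith("https://"):
        raise ValueError("image uploads must use HTTPS")
    if not jpeg_bytes.startswith(JPEG_SOI):
        raise ValueError("payload does not look like a JPEG image")
    return urllib.request.Request(
        url,
        data=jpeg_bytes,
        method="POST",
        headers={"Content-Type": "image/jpeg"},
    )


# A stand-in payload: the JPEG marker followed by placeholder bytes.
fake_photo = JPEG_SOI + b"\x00" * 16
request = build_upload_request(fake_photo)
print(request.get_method(), request.full_url)
```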
[0673] The server processes the received images using existing generative AI models such as "StyleGAN" and "DALL-E". During this process, it prompts the AI models with the instruction, "Generate clothing designs based on the user's image." This process generates composite images of various clothing designs that fit the user's face.
[0674] The generated costume images are organized into a catalog format on the server side. The catalog includes variations of the costumes and is designed with user-friendly display in mind. The catalog data is compressed for efficient transmission and sent from the server to the user's terminal.
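Compressing the catalog data for transmission and restoring it on the terminal can be sketched as below. The sketch assumes that the catalog is a list of JSON-serializable entries and that gzip is the compression method; the text names neither. The field names and URLs in the entries are hypothetical.

```python
import gzip
import json
from typing import Any, Dict, List


def pack_catalog(entries: List[Dict[str, Any]]) -> bytes:
    """Serialize the catalog to JSON and compress it for transmission."""
    raw = json.dumps(entries, ensure_ascii=False).encode("utf-8")
    return gzip.compress(raw)


def unpack_catalog(blob: bytes) -> List[Dict[str, Any]]:
    """Reverse pack_catalog on the terminal side."""
    return json.loads(gzip.decompress(blob).decode("utf-8"))


# Hypothetical catalog with several variations of each costume.
entries = [
    {"outfit_id": f"kimono-{i}", "variation": color,
     "image_url": f"https://example.com/img/kimono-{i}-{color}.jpg"}
    for i in range(25) for color in ("red", "blue")
]

blob = pack_catalog(entries)
restored = unpack_catalog(blob)
print(len(restored) == len(entries))  # True
```

Catalog entries tend to repeat the same keys and URL prefixes, which is the kind of redundancy that general-purpose compression handles well.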
[0675] The user browses a catalog received on their device and selects their preferred outfit. This selection information is then sent back to the server. The server uses this information to check the inventory of multiple retail locations. This involves data communication via APIs with each retail location's inventory management system.
[0676] To enhance user convenience, the server obtains the user's location information using the Google Maps API and other methods, and cross-references it with the location information of the sales facility. This analysis selects the most suitable sales facility for the user. The server then processes the reservation using the fitting reservation system of the selected facility. Finally, detailed information about the fitting is notified to the user.
[0677] As a concrete example, let's consider a user who is preparing for their coming-of-age ceremony. The user uploads a photo of their face to the system and browses the generated catalog images of kimonos. They select a furisode (long-sleeved kimono) they particularly like and use the system to find the best rental facility with available stock. The user specifies their desired date and time, and the server confirms the reservation based on that information. In this way, the entire process of choosing an outfit proceeds smoothly.
[0678] The flow of the specific processing in Example 1 will be explained using Figure 11.
[0679] Step 1:
[0680] The user activates the camera on their smartphone or computer and takes a picture of their face. The captured photo is converted to JPEG format on the device. This converted image data becomes the input information, and the device sends this image to the server using the secure HTTPS protocol.
[0681] Step 2:
[0682] The server receives image data sent from the terminal. To process the received image data, it inputs the prompt message "Generate clothing designs based on the user's image" to generative AI models such as "StyleGAN" and "DALL-E". Through this process, the AI models generate composite images by combining various clothing items based on the user's face image. This generated set of clothing images becomes the output.
[0683] Step 3:
[0684] The server organizes the generated costume images into a catalog format. The organized catalog data is then compressed to reduce file size and prepared for efficient distribution. This is the output of the catalog data. The server then sends this compressed catalog data to the user's terminal.
[0685] Step 4:
[0686] The device decompresses the catalog data received from the server and displays it on the user interface. At this point, the user can browse multiple costume options through swiping and tapping. If the user selects a costume they like from the catalog, this selection becomes input information, which is then resent to the server.
[0687] Step 5:
[0688] The server receives costume selection information from the user and retrieves inventory data from multiple sales facilities via an API connection to the inventory management system. Here, data calculations are performed to check the inventory status of the selected costume, and inventory information for each sales facility is output. Based on this information, the server performs geographical analysis to identify the most suitable sales facility.
[0689] Step 6:
[0690] The server takes into account the user's location and the acquired geographical information of the stores to create a list of the most suitable sales locations and the dates and times when they are available for fitting. The created list is output and sent to the user's device for display.
[0691] Step 7:
[0692] The user selects their preferred facility and fitting date and time from the presented list of stores and available fitting dates and times, and sends this selection information from their terminal to the server. This selection information becomes the input information.
[0693] Step 8:
[0694] The server verifies and confirms the reservation information for the store and date / time selected by the user by linking with the reservation system of the relevant sales facility. This confirmed reservation information becomes the final output, and the server notifies the user. Upon receiving this notification, the user can check the details of the fitting.
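The inventory retrieval in Step 5 can be sketched as a set of per-facility clients queried in parallel. The sketch assumes that each facility's inventory management system is wrapped in an object with a `stock_of` method; that interface, the class names, and the sample data are hypothetical. A facility that cannot be reached is recorded as `None` rather than aborting the whole query.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, Optional, Protocol, Tuple


class InventoryClient(Protocol):
    def stock_of(self, outfit_id: str) -> int: ...


class InMemoryInventory:
    """Stand-in for a facility's inventory API."""

    def __init__(self, stock: Dict[str, int]) -> None:
        self._stock = dict(stock)

    def stock_of(self, outfit_id: str) -> int:
        return self._stock.get(outfit_id, 0)


class UnreachableInventory:
    """Stand-in for a facility whose API does not respond."""

    def stock_of(self, outfit_id: str) -> int:
        raise ConnectionError("inventory system unreachable")


def collect_inventory(outfit_id: str,
                      clients: Dict[str, InventoryClient]) -> Dict[str, Optional[int]]:
    """Query all facilities concurrently; unreachable ones map to None."""

    def ask(item: Tuple[str, InventoryClient]) -> Tuple[str, Optional[int]]:
        facility, client = item
        try:
            return facility, client.stock_of(outfit_id)
        except Exception:
            return facility, None

    with ThreadPoolExecutor(max_workers=4) as pool:
        return dict(pool.map(ask, clients.items()))


clients = {
    "Shop A": InMemoryInventory({"furisode-3": 2}),
    "Shop B": InMemoryInventory({}),
    "Shop C": UnreachableInventory(),
}

inventory = collect_inventory("furisode-3", clients)
print(inventory)  # {'Shop A': 2, 'Shop B': 0, 'Shop C': None}
```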
[0695] (Application Example 1)
[0696] Next, we will explain Application Example 1. In the following explanation, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".
[0697] Modern users want an environment where they can easily select an appearance that suits them and experience it in a suitable facility. However, there is a lack of efficient systems for finding a suitable appearance and booking a facility to try it out. Therefore, the challenge is to provide a service that allows users to make their ideal choice without wasting time and effort, and to have a comfortable experience.
[0698] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 1 is realized by the following means.
[0699] In this invention, the server includes means for registering user characteristics, means for generating a suitable appearance representation based on the registered user characteristics, and means for checking the availability of the selected appearance from multiple facilities. This makes it possible for users to easily select an appearance that suits them and quickly reserve the most suitable facility to try out that appearance.
[0700] "User characteristics" refer to physical or digital attributes unique to each individual user, and are the information necessary for generating visual representations.
[0701] "Means of registration" refers to methods or technologies for incorporating user characteristics into a system as digital data.
[0702] "Appearance representation" refers to visual or other forms of representation generated based on the user's characteristics, and represents the choices offered to the user.
[0703] "Generative means" refers to techniques that use AI or other algorithms to create a visual representation from the user's characteristics.
[0704] "Availability" refers to information about the inventory or availability status of facilities that can provide the experience or goods related to the selected appearance representation.
[0705] "Means of verification" refers to a method or process of obtaining information related to the appearance of multiple facilities and verifying that information.
[0706] "Means of making an experience reservation" refers to a technology or method for securing a date and time for an experience or use at a related facility based on the visual representation selected by the user.
[0707] This system begins with users registering their characteristics on a server using devices such as smartphones or personal computers. The user's device then uses captured or saved image data to send the registered characteristic information to the server in an appropriate format.
[0708] The server uses a generative AI model to create a suitable appearance based on the received user feature information. This generation process utilizes StyleGAN and other AI algorithms to provide the user with visual clothing options. The generated appearances are organized and sent to the user's device in a list format.
[0709] Users can view the available appearance representations on their device and select the one they like. After making a selection, the device sends that information back to the server. The server then initiates communication with multiple facilities to confirm the availability of the selected appearance representation. By analyzing the metadata, the server utilizes the user's current location information and the geographical information of the facilities to identify the most suitable facility.
[0710] Once the most suitable facility is identified, the server makes a reservation for the experience at that facility and notifies the user of the details. This process simplifies both choosing an appearance and costume that suit the user and the steps required to actually experience them at the facility.
[0711] A concrete example would be a user planning to attend a friend's wedding using this system to find a stylish suit. The user registers their face using their smartphone and selects several suits from the catalog provided by the application. The server then automatically suggests the most suitable store and completes the reservation at the user's most convenient date and time.
[0712] An example of a prompt message would be, "Upload a photo of yourself and choose a suit that suits you for your next wedding. The AI will suggest stores where you can try on the suit." By using such a system, users can have an efficient and effective experience in selecting their attire.
[0713] The flow of a specific process in Application Example 1 will be explained using Figure 12.
[0714] Step 1:
[0715] Users use devices such as smartphones or computers to acquire images of their own faces and upload those images to the system. The input is either an image captured by the camera of the device or pre-stored image data. The output is the data sent to the server after the image data has been converted to an appropriate format. This conversion process involves image compression and format conversion.
[0716] Step 2:
[0717] The server receives a facial image sent by the user and uses a generative AI model to generate an appearance representation based on that data. The input is the facial image data, and the output is a set of multiple appearance images provided to the user as candidates. The AI model (e.g., StyleGAN) generates various styles of appearance from the received facial image and organizes them as visual choices. This process involves image generation by the model and subsequent shaping.
[0718] Step 3:
[0719] The server sends the generated appearance images to the user's terminal in catalog format. The input is a set of images generated by AI, and the output is the images arranged in a layout viewable by the user. The server compresses the data and arranges it in the optimal display format.
[0720] Step 4:
[0721] The user browses a catalog on their device and selects an appearance they are interested in. The input is a catalog image, and the output is the selected appearance information. The selected item is confirmed based on the user's interaction.
[0722] Step 5:
[0723] The terminal sends the selected appearance information to the server. The input is the user's selection information, and the output is the transmission of data to the server. The terminal confirms the selected information and sends it in a format that the server can process.
[0724] Step 6:
[0725] The server queries multiple facilities to confirm whether each of them holds the selected appearance. The input is the selected appearance information, and the output is each facility's holding (inventory) data. The server queries the facilities via an API and analyzes the retrieved data.
[0726] Step 7:
[0727] The server analyzes the acquired holding data in combination with the user's location information to identify the most suitable facility. The input is the holding data and the location information, and the output is the optimal facility information suggested to the user. A data analysis algorithm determines the best option based on travel distance and facility availability.
[0728] Step 8:
[0729] The server makes a reservation for the selected facility and notifies the user of the reservation details. The input is the selected facility information, and the output is reservation confirmation information. The server works in conjunction with the facility's reservation system to confirm the reservation and notifies the user's device.
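As a non-limiting illustration, the facility identification of Step 7 could be sketched as follows. The sketch assumes that the availability query of Step 6 has already produced a yes/no flag per facility and that "most suitable" means the nearest facility that holds the item. The `Facility` class, the `pick_facility` function, and the use of the haversine great-circle distance are assumptions introduced here for illustration and are not specified in the disclosure.

```python
import math
from dataclasses import dataclass


@dataclass
class Facility:
    name: str
    lat: float      # latitude in degrees
    lon: float      # longitude in degrees
    has_item: bool  # hypothetical result of the Step 6 availability query


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two latitude/longitude points."""
    radius_km = 6371.0  # mean Earth radius
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))


def pick_facility(user_lat, user_lon, facilities):
    """Return the nearest facility that holds the item, or None if none does."""
    candidates = [f for f in facilities if f.has_item]
    if not candidates:
        return None
    return min(candidates, key=lambda f: haversine_km(user_lat, user_lon, f.lat, f.lon))


# Example with illustrative coordinates: the user is near Tokyo Station.
facilities = [
    Facility("Shibuya", 35.6580, 139.7016, has_item=False),
    Facility("Shinjuku", 35.6896, 139.7006, has_item=True),
    Facility("Yokohama", 35.4437, 139.6380, has_item=True),
]
best = pick_facility(35.6812, 139.7671, facilities)
print(best.name if best else "no facility holds the item")
```

A practical system would likely also weigh opening hours or travel time, which this sketch leaves out.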
[0730] Furthermore, an emotion engine that estimates the user's emotions may be incorporated. That is, the specific processing unit 290 may use the emotion identification model 59 to estimate the user's emotions and perform the specific processing using the user's emotions.
[0731] To implement this invention, a system is needed that selects clothing based on the user's image, recognizes the user's emotions using an emotion engine, and selects the most suitable fitting room. Specific embodiments are described below.
[0732] First, the user uploads a photo of their face to the system using their smartphone or computer. The device then sends this image data to the server. The server uses a generative AI model to generate images of the user wearing various outfits based on their face photo, and organizes them in a catalog format.
[0733] Next, the server activates the emotion engine to analyze the user's emotions from their reactions and facial photos. The emotion engine can identify emotions based on the user's facial expression data while they are viewing images, as well as real-time facial capture using the camera. This allows the catalog to reflect recommended outfits based on the user's emotional state while they are browsing.
[0734] The user selects their preferred outfit from a catalog that has been adjusted to take this emotion recognition into account. The selection information is sent from the terminal to the server. The server checks the inventory of the selected outfit at multiple stores. In this process, along with the inventory check results, the server considers the user's emotion data and suggests the most suitable store and reservation date and time.
[0735] As a concrete example, consider a user choosing a wedding dress using this system. Suppose the user uploads a photo of themselves and, while the user views the generated dress catalog, the emotion engine captures emotions such as joy and surprise. Based on these emotions, the system prioritizes presenting dresses whose style is similar to those that triggered the reactions. If the user selects a specific dress, the server retrieves inventory information from stores and, taking the emotion data into consideration, makes the optimal fitting reservation.
[0736] By incorporating this emotion engine into the system, we can provide a more user-friendly and personalized outfit selection experience.
[0737] The following describes the processing flow.
[0738] Step 1:
[0739] The user prepares a photo of their face and uploads it to the system using their device. The image data is processed on the device and sent to the server.
[0740] Step 2:
[0741] The server uses an AI model based on the received image data to generate virtual images of the user wearing various outfits, using the user's face as a basis. These images are organized in a catalog format and sent to the user's device.
[0742] Step 3:
[0743] The user views the received catalog via the terminal. During this time, the terminal uses the user's webcam to capture facial expressions in real time and sends that data to the server.
[0744] Step 4:
[0745] The server analyzes this facial expression data using an emotion engine. The analysis identifies which image the user is reacting to and what emotion (joy, surprise, etc.) they are experiencing. Based on this emotional state, recommended outfits are then prioritized and reflected in the user's catalog.
[0746] Step 5:
[0747] The user selects their favorite outfit from a curated catalog. This selection information is sent from the device to the server and used in the next step.
[0748] Step 6:
[0749] The server queries multiple stores for the availability of the selected outfit. During this process, it also considers the user's sentiment analysis results, prioritizing stores with high recommendation ratings.
[0750] Step 7:
[0751] Based on inventory information and sentiment data, the server identifies the most suitable fitting store and date for the user and sends a list of candidates to the device.
[0752] Step 8:
[0753] The user selects their preferred store and date from the options presented. This selection information is then sent to the server via the device.
[0754] Step 9:
[0755] The server confirms the reservation based on the selected fitting store and date / time. It sends a reservation confirmation notification to the device, informing the user of the fitting reservation details.
[0756] In this way, by incorporating emotion recognition, it becomes possible to select clothing based on the user's emotions and preferences, and to smoothly book fitting appointments.
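For illustration only, the catalog adjustment of Step 4 could be sketched as below. The sketch assumes that the emotion engine reports, for each catalog item the user looked at, an emotion label and a confidence value. The `EMOTION_WEIGHTS` table and the `rerank_catalog` function are hypothetical and serve only to show one way of turning such reactions into a display order.

```python
from collections import defaultdict

# Hypothetical weights: how strongly each detected emotion counts as interest.
EMOTION_WEIGHTS = {"joy": 1.0, "surprise": 0.5, "neutral": 0.0, "displeasure": -1.0}


def rerank_catalog(catalog, reactions):
    """Order catalog item IDs by accumulated emotion score, highest first.

    catalog   -- list of item IDs in their current display order
    reactions -- iterable of (item_id, emotion, confidence) tuples
    """
    scores = defaultdict(float)
    for item_id, emotion, confidence in reactions:
        # Unknown emotion labels contribute nothing to the score.
        scores[item_id] += EMOTION_WEIGHTS.get(emotion, 0.0) * confidence
    # sorted() is stable, so items with equal scores keep their original order.
    return sorted(catalog, key=lambda item_id: -scores[item_id])


catalog = ["dress-a", "dress-b", "dress-c"]
reactions = [
    ("dress-c", "joy", 0.9),
    ("dress-a", "displeasure", 0.8),
    ("dress-c", "surprise", 0.4),
]
print(rerank_catalog(catalog, reactions))
```

Items that drew no reaction keep a score of zero and therefore sit between the positively and negatively received items.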
[0757] (Example 2)
[0758] Next, we will describe Example 2. In the following description, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".
[0759] Traditional costume selection systems struggled to provide personalized suggestions that reflected user emotions, resulting in poorly suited choices. Furthermore, checking the availability of selected costumes and suggesting reservations were challenging, as these systems often failed to consider user emotions and geographical convenience.
[0760] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 2 is realized by the following means.
[0761] In this invention, the server includes means for acquiring a user's image, means for creating images of clothing using a generative AI model, and means for analyzing the user's emotions and making recommendations using an emotion recognition device. This enables clothing selection that reflects the user's emotions, providing a personalized experience. It also enables the suggestion of appropriate fitting reservations based on inventory information and emotion analysis.
[0762] "Means of acquiring user images" refers to a function that collects user facial photographs and image data through the electronic devices used by the user.
[0763] "A means of creating images of costumes using a generative AI model" refers to a function that takes user image data as input and uses a generative artificial intelligence algorithm to generate images of the wearer wearing various costumes.
[0764] "A means of analyzing and recommending user emotions using an emotion recognition device" refers to a function that evaluates the user's facial expressions and reactions using emotion analysis technology and suggests the most suitable outfit based on the user's emotional state.
[0765] "Means for obtaining user-selected costume data" refers to a function that collects specific costume information selected based on the user's preferences.
[0766] "A means of checking inventory information from multiple sales locations" refers to a function that queries the inventory status of selected costumes from multiple partner stores and sales locations via a database.
[0767] "Means for suggesting fitting appointments" refers to a function that recommends the optimal date, time, and location for trying on clothing, based on the results of user emotion analysis and inventory status.
[0768] To implement this invention, the user, terminal, and server must each fulfill their respective roles and work together in an integrated manner. First, the user uploads a photograph of their face to the system using a terminal such as a smartphone or personal computer. This terminal requires an internet connection and an application or web browser for transmitting image data.
[0769] The terminal sends the uploaded image data to the server. Upon receiving this data, the server runs a generative AI model. The generative AI model generates clothing images using the user's image as input. This process utilizes GPUs and cloud computing services for high-performance computation. The generative AI model is instructed through prompts and suggests clothing that matches the user's image.
[0770] Next, the server uses an emotion recognition device to analyze the user's emotions. This device analyzes the user's facial expressions and reactions in real time and generates emotion data. This emotion data is used to optimize the costume catalog the user is browsing, prioritizing the display of costumes that match their emotions, thereby providing the user with a personalized experience.
[0771] As a concrete example, consider a scenario where a user wants to choose a wedding dress. The user uploads a photo of their face to the system, and the server uses a generative AI model to generate various dress images. The prompt message would be, "Analyze the user's image and generate images of styles suitable for a wedding dress." The server then uses an emotion recognition device to analyze the user's reactions and highlights dresses in the catalog that the user expressed joy or interest in.
[0772] In this way, the user sends their selected clothing information to the server via their device, and the server checks the relevant inventory information from multiple stores. Finally, the server suggests a fitting appointment at the most suitable store to the user based on emotional data and inventory information. This system allows users to choose clothing in a rational and emotionally considerate manner.
[0773] The flow of the specific processing in Example 2 will be explained using Figure 13.
[0774] Step 1:
[0775] Users upload their facial photos to the system using their smartphones or computers. The input data is an image file, which the device collects and prepares to send to the server. Specifically, this involves the user clicking an "upload image" button on a dedicated application or web interface and selecting a photo from their file system.
[0776] Step 2:
[0777] The device sends the uploaded image data to the server. This process uses the user's image file as input and transfers the data to the server via the HTTPS protocol as output. Specifically, this means sending the selected image file to the specified API endpoint on the server.
[0778] Step 3:
[0779] The server executes a generative AI model based on the received image data. The input is a photo of the user's face, and the output is images of the user wearing various outfits. This process includes a prompt in the generative AI model that says, "Analyze the user's image and generate an appropriate outfit style." Specifically, the image generation process is performed using the GPU.
[0780] Step 4:
[0781] The server organizes the generated costume images and provides them to the user in catalog format. The input is the costume images output by the generation AI model, and the output is the organized catalog data. The specific operation includes displaying these images as thumbnails in the user interface so that the user can easily select them.
[0782] Step 5:
[0783] The server analyzes the user's emotions using an emotion recognition device. The input data is the user's facial expression data, and the output is data on their emotional state. This process involves capturing the user's facial expressions in real time with a camera and analyzing them using an emotion recognition algorithm.
[0784] Step 6:
[0785] The server reflects recommended outfits in the catalog based on the user's emotional state. The input is the result of the emotion analysis, and the output is an adjusted display order for the catalog. Specifically, the server uses the emotion data to move outfits in which the user has shown interest to the front of the catalog.
[0786] Step 7:
[0787] The user selects an outfit from a catalog that has been adjusted based on emotion recognition. The selected outfit information is collected by the terminal and sent to the server. The input data is the outfit ID selected by the user, and the output is the selection information transferred to the server. The specific action is for the user to confirm the information by clicking on the outfit.
[0788] Step 8:
[0789] The server checks inventory information for the selected costume from multiple sales locations and suggests the best fitting appointment. Input is the selected costume data and sentiment information, and output is the recommended fitting date and time and store information. Specific operations include querying inventory from each store via API and presenting the user with appropriate booking options based on matching results.
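The inventory query and booking suggestion of Step 8 might look roughly like the following sketch. It assumes that each store exposes stock quantity and open fitting slots for an outfit ID. The per-store API is replaced here by an in-memory dictionary, and the names `StoreStock`, `query_store`, and `booking_options` are hypothetical. The weighting by emotion information mentioned in Step 8 is omitted for brevity.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class StoreStock:
    store: str
    in_stock: bool
    open_slots: list  # datetimes at which a fitting could be booked


def query_store(store, outfit_id, backend):
    """Stand-in for a per-store inventory API call (backend is an in-memory dict)."""
    record = backend.get(store, {}).get(outfit_id)
    if record is None:
        return StoreStock(store, False, [])
    return StoreStock(store, record["quantity"] > 0, sorted(record["slots"]))


def booking_options(outfit_id, stores, backend):
    """Return (store, earliest slot) pairs for stores holding the outfit, soonest first."""
    options = []
    for store in stores:
        stock = query_store(store, outfit_id, backend)
        if stock.in_stock and stock.open_slots:
            options.append((store, stock.open_slots[0]))
    return sorted(options, key=lambda option: option[1])


# Illustrative data only; store names, IDs, and dates are made up.
backend = {
    "Ginza": {"dress-12": {"quantity": 1,
                           "slots": [datetime(2025, 1, 12, 14), datetime(2025, 1, 11, 10)]}},
    "Umeda": {"dress-12": {"quantity": 0, "slots": [datetime(2025, 1, 10, 10)]}},
    "Sapporo": {"dress-12": {"quantity": 2, "slots": [datetime(2025, 1, 10, 15)]}},
}
for store, slot in booking_options("dress-12", ["Ginza", "Umeda", "Sapporo", "Nagoya"], backend):
    print(store, slot.isoformat())
```

Stores that are out of stock, unknown to the backend, or without open slots are simply left out of the options presented to the user.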
[0790] (Application Example 2)
[0791] Next, we will explain application example 2. In the following explanation, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".
[0792] Traditional clothing selection systems present the challenge of requiring users to spend time and effort finding the optimal garment from a large number of options. Furthermore, in-store try-on experiences are limited to specific environments, making effective clothing selection difficult. Additionally, the lack of personalized suggestions that consider user emotions contributes to low user satisfaction.
[0793] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 2 is realized by the following means.
[0794] In this invention, the server includes means for registering the user's video, means for generating video of suitable clothing based on the registered user's video, and emotion analysis means for analyzing the user's emotions and suggesting the most suitable clothing. This makes it possible to provide personalized clothing suggestions according to the user's emotional state and an optimal trying-on experience.
[0795] "Means for registering user videos" refers to a function that allows users to input their own video information into the system and save it in a database.
[0796] "Means for generating images of suitable clothing" refers to a function that generates and presents visual information of clothing suitable for the user's preferences and characteristics, based on the registered user's video data.
[0797] "An emotional analysis method for analyzing user emotions and suggesting the most suitable clothing" refers to a function that analyzes user emotions from their reactions and facial expressions, and then selects and suggests clothing that is appropriate for the user based on the results.
[0798] "The means of booking a trial session at the optimal location" refers to a function that allows users to book the most suitable trial location and time based on their selection and inventory information.
[0799] "Means for optimizing generated clothing suggestions" refers to a function that analyzes user selections and past behavioral data to select the most suitable clothing from the suggested options.
[0800] This invention is a system that uses a cloud-based server, a user's smartphone or PC, and smart devices installed in physical stores to optimize user clothing selection and trial reservation. The server first registers video data uploaded by the user from their terminal. This data is analyzed by a generating AI model, and videos of clothing suitable for the user are generated. This allows the user to virtually try on various styles of clothing.
[0801] The terminal sends the clothing information selected by the user to the server, which uses a related sentiment analysis engine to evaluate the user's response. Sentiment analysis identifies emotions from the user's facial expressions captured by images and real-time camera footage, and is used to personalize the generated clothing suggestions. Furthermore, the server collects inventory information for the selected clothing from multiple stores and makes the optimal trial reservation.
[0802] As a concrete example, when a user selects a specific piece of clothing, the server checks the inventory status of that clothing at each store and, based on the sentiment analysis results, suggests the most suitable store and reservation time. For instance, by inputting a prompt such as, "This user prefers simple designs, but please suggest a new T-shirt with distinctive sleeves," into the generating AI model, the system can provide suggestions tailored to the user. In this way, users can obtain clothing selection and trial experiences based on their emotions.
[0803] The flow of a specific process in Application Example 2 will be explained using Figure 14.
[0804] Step 1:
[0805] Users take a photo of their face using a device such as a smartphone or computer and upload the video data to the system. This input data, the facial photograph, is sent to the server as image data that captures the user's features.
[0806] Step 2:
[0807] The server inputs the received user's facial image into a generation AI model, which then generates images of various clothing items suitable for the user. These generated images are then processed from the user's image data and output as images of the user virtually wearing diverse styles of clothing.
[0808] Step 3:
[0809] The server sends a catalog of generated clothing images to the user's terminal, which the user then browses. The user selects their favorite outfit from the provided catalog. This selection information is then sent back to the server as the user's selection data.
[0810] Step 4:
[0811] The server uses user selection data to run an emotion analysis engine and analyze the user's emotions regarding their choices. This analysis objectively evaluates the user's facial expressions and reactions and performs data calculations to optimize clothing suggestions based on those emotions.
[0812] Step 5:
[0813] The server checks the inventory of the clothing selected by the user across multiple physical stores. The input here is information about the selected clothing, and the server queries the inventory status to collect information on the most suitable store and outputs the results.
[0814] Step 6:
[0815] Based on sentiment analysis results and inventory information, the server suggests the most suitable trial location and reservation date / time for the user. To achieve this, it generates prompt messages that reflect the user's interests (e.g., "This user prefers simple designs, but please suggest new T-shirts with distinctive sleeves") to optimize the trial experience.
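As one possible sketch of the prompt generation in Step 6, the server could derive the user's preferred style from the emotion analysis results and insert it into a prompt template. The functions `dominant_style` and `build_suggestion_prompt`, the style tags, and the score scale are assumptions made for illustration; only the wording of the example prompt is taken from the description above.

```python
from collections import defaultdict
from statistics import mean


def dominant_style(reactions):
    """Return the style tag with the highest mean emotion score.

    reactions -- non-empty iterable of (style_tag, score) pairs, score in [0, 1]
    """
    by_style = defaultdict(list)
    for style, score in reactions:
        by_style[style].append(score)
    return max(by_style, key=lambda style: mean(by_style[style]))


def build_suggestion_prompt(preferred_style, garment, twist=None):
    """Compose an instruction for the generative AI model from user preferences."""
    if twist:
        return (f"This user prefers {preferred_style} designs, "
                f"but please suggest a new {garment} with {twist}.")
    return f"This user prefers {preferred_style} designs. Please suggest a new {garment}."


# Hypothetical emotion scores per style, as the emotion analysis engine might report them.
reactions = [("simple", 0.8), ("ornate", 0.2), ("simple", 0.6), ("ornate", 0.9)]
style = dominant_style(reactions)
print(build_suggestion_prompt(style, "T-shirt", "distinctive sleeves"))
```

Keeping the template separate from the style estimation lets either part be replaced without touching the other.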
[0816] The specific processing unit 290 transmits the result of the specific processing to the robot 414. In the robot 414, the control unit 46A causes the speaker 240 and the controlled object 443 to output the result of the specific processing. The microphone 238 acquires audio indicating user input for the result of the specific processing. The control unit 46A transmits the audio data indicating user input acquired by the microphone 238 to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the audio data.
[0817] Data generation model 58 is a type of so-called generative AI (Artificial Intelligence). Examples of the data generation model 58 include generative AI such as ChatGPT (Internet search <URL: https://openai.com/blog/chatgpt>) and Gemini (Internet search <URL: https://gemini.google.com/?hl=ja>). The data generation model 58 is obtained by performing deep learning on a neural network. Prompts containing instructions, together with inference data such as audio data representing speech, text data representing text, and image data representing images, are input to the data generation model 58. The data generation model 58 performs inference on the input inference data according to the instructions indicated by the prompts, and outputs the inference results in data formats such as audio data and text data. Here, inference refers to, for example, analysis, classification, prediction, and/or summarization.
[0818] In the above embodiment, an example was given in which the specific processing is performed by the data processing device 12, but the technology of this disclosure is not limited thereto, and the specific processing may also be performed by the robot 414.
[0819] Furthermore, the emotion identification model 59, acting as an emotion engine, may determine the user's emotion according to a specific mapping, namely an emotion map (see Figure 9). Similarly, the emotion identification model 59 may also determine the robot's emotion, and the specific processing unit 290 may perform the specific processing using the robot's emotion.
[0820] Figure 9 shows an emotion map 400 in which multiple emotions are mapped. In the emotion map 400, emotions are arranged in concentric circles radiating from the center. The closer to the center of the concentric circles, the more primitive the emotions are located. Further out of the concentric circles, emotions representing states and actions arising from mental states are located. Emotion is a concept that includes feelings and mental states. On the left side of the concentric circles, emotions that are generally generated from reactions occurring in the brain are located. On the right side of the concentric circles, emotions that are generally induced by situational judgment are located. Above and below the concentric circles, emotions that are generally generated from reactions occurring in the brain and induced by situational judgment are located. In addition, the emotion of "pleasure" is located on the upper side of the concentric circles, and the emotion of "displeasure" is located on the lower side. Thus, in the emotion map 400, multiple emotions are mapped based on the structure in which emotions arise, and emotions that are likely to occur simultaneously are mapped close together.
[0821] These emotions are distributed at the 3 o'clock position on the Emotion Map 400, and usually fluctuate between feelings of security and anxiety. In the right half of the Emotion Map 400, situational awareness takes precedence over internal feelings, resulting in a calm impression.
[0822] The inside of the Emotion Map 400 represents inner thoughts, while the outside represents actions. Therefore, the further toward the outside of the Emotion Map 400 an emotion is located, the more visible (expressed in actions) it becomes.
[0823] Here, human emotions are based on various balances, such as posture and blood sugar levels. When these balances deviate from the ideal, it results in discomfort, and when they approach the ideal, it results in pleasure. Similarly, in robots, cars, motorcycles, etc., emotions can be created based on various balances, such as posture and battery level. When these balances deviate from the ideal, it results in discomfort, and when they approach the ideal, it results in pleasure. The emotion map can be generated, for example, based on Dr. Mitsuyoshi's emotion map (Research on a system for analyzing brain physiological signals of speech emotion recognition and emotion, Tokushima University, doctoral dissertation: https://ci.nii.ac.jp/naid/500000375379). The left half of the emotion map contains emotions belonging to a region called "response," where sensation is dominant. The right half of the emotion map contains emotions belonging to a region called "situation," where situational awareness is dominant.
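The layout of the emotion map described above can be illustrated with a small data structure. The sketch assumes that each emotion is stored with a ring index (0 being the innermost, most primitive ring) and an angle measured counter-clockwise from the 3 o'clock position. The `MappedEmotion` class, the `region` function, and the coordinates used in the example are hypothetical and do not reproduce the actual placements in Figure 9.

```python
import math
from dataclasses import dataclass

EPS = 1e-9  # tolerance so that points on an axis are not misclassified by rounding


@dataclass
class MappedEmotion:
    name: str
    ring: int         # 0 = innermost (most primitive); larger = further out
    angle_deg: float  # 0 = 3 o'clock, 90 = 12 o'clock, counter-clockwise


def region(emotion):
    """Classify an emotion by its side (left/right) and valence (upper/lower)."""
    x = math.cos(math.radians(emotion.angle_deg))
    y = math.sin(math.radians(emotion.angle_deg))
    if x > EPS:
        side = "situation"  # right half: induced by situational judgment
    elif x < -EPS:
        side = "reaction"   # left half: generated from reactions in the brain
    else:
        side = "both"       # directly above or below the center
    if y > EPS:
        valence = "pleasure"     # upper side
    elif y < -EPS:
        valence = "displeasure"  # lower side
    else:
        valence = "neutral"      # on the horizontal axis
    return side, valence


# Illustrative placement only.
print(region(MappedEmotion("example-emotion", ring=2, angle_deg=135.0)))
```

Under these assumptions, a larger ring index corresponds to an emotion that is expressed more visibly in actions.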
[0824] The emotion map defines two emotions that promote learning. One is the emotion around the middle of the negative "repentance" and "reflection" on the situation side. In other words, it is when the robot experiences negative emotions such as "I never want to feel this way again" or "I don't want to be scolded again." The other is the emotion around the positive "desire" on the reaction side. In other words, it is when the robot has positive feelings such as "I want more" or "I want to know more."
[0825] The emotion identification model 59 inputs user input into a pre-trained neural network, obtains emotion values representing each emotion shown in the emotion map 400, and determines the user's emotion. This neural network is pre-trained based on multiple training data sets, which are combinations of user input and emotion values representing each emotion shown in the emotion map 400. Furthermore, this neural network is trained so that emotions located close together have similar values, as shown in the emotion map 900 in Figure 10. Figure 10 shows an example where multiple emotions such as "reassured," "calm," and "confident" have similar emotion values.
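As a minimal sketch of the final determination step, suppose the neural network has already produced one emotion value per emotion on the map. The disclosure does not state how the user's emotion is chosen from these values, so selecting the largest value is an assumption, as are the function names and the numeric values below.

```python
def determine_emotion(emotion_values):
    """Return the emotion with the largest value (assumed decision rule)."""
    return max(emotion_values, key=emotion_values.get)


def close_emotions(emotion_values, target, tolerance=0.1):
    """List other emotions whose value lies within `tolerance` of the target's value."""
    reference = emotion_values[target]
    return [name for name, value in emotion_values.items()
            if name != target and abs(value - reference) <= tolerance]


# Made-up emotion values; neighbouring emotions on the map receive similar values.
values = {"reassured": 0.82, "calm": 0.80, "confident": 0.78, "anxious": 0.10}
print(determine_emotion(values))
print(close_emotions(values, "reassured"))
```

The second function reflects the training property described above, namely that emotions located close together on the map tend to receive similar values.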
[0826] The above description primarily focuses on the functions of the data processing device 12 in relation to this disclosure. However, the system related to this disclosure is not necessarily implemented on a server. The system related to this disclosure may be implemented as a general information processing system. This disclosure may be implemented, for example, as a software program that runs on a personal computer or as an application that runs on a smartphone. The method related to this disclosure may be provided to users in SaaS (Software as a Service) format.
[0827] In the above embodiment, an example was given in which a specific process is performed by a single computer 22. However, the technology of this disclosure is not limited thereto, and a distributed processing of the specific process may be performed by multiple computers, including computer 22. For example, a data generation model 58 may be provided in an external device of the data processing device 12, and the external device may generate data according to the input data.
[0828] In the above embodiment, an example was given in which the specific processing program 56 is stored in the storage 32, but the technology of this disclosure is not limited thereto. For example, the specific processing program 56 may be stored in a portable, computer-readable, non-transitory storage medium such as a USB (Universal Serial Bus) memory. The specific processing program 56 stored in the non-transitory storage medium is installed in the computer 22 of the data processing device 12. The processor 28 executes the specific processing according to the specific processing program 56.
[0829] Alternatively, the specific processing program 56 may be stored in a storage device such as a server connected to the data processing device 12 via the network 54, and the specific processing program 56 may be downloaded and installed on the computer 22 in response to a request from the data processing device 12.
[0830] Furthermore, it is not necessary to store the entirety of the specific processing program 56 in a storage device such as a server connected to the data processing device 12 via the network 54, or to store the entirety of the specific processing program 56 in the storage 32; it is acceptable to store only a portion of the specific processing program 56.
[0831] The following types of processors can be used as hardware resources to perform specific processing. Examples of processors include a CPU, a general-purpose processor that functions as a hardware resource to perform specific processing by executing software, i.e., a program. Other examples of processors include dedicated electrical circuits, such as FPGAs (Field-Programmable Gate Arrays), PLDs (Programmable Logic Devices), or ASICs (Application Specific Integrated Circuits), which have circuit configurations specifically designed to perform specific processing. All of these processors have built-in or connected memory, and all of them perform specific processing by using memory.
[0832] The hardware resource that performs a specific process may consist of one of these various processors, or it may consist of a combination of two or more processors of the same or different types (for example, a combination of multiple FPGAs, or a combination of a CPU and an FPGA). Alternatively, the hardware resource that performs a specific process may consist of a single processor.
[0833] Examples of configurations using a single processor include, firstly, a configuration in which one or more CPUs and software are combined to form a single processor, and this processor functions as a hardware resource that performs a specific process. Secondly, there is a configuration using a processor that realizes the functions of the entire system, including multiple hardware resources that perform a specific process, on a single IC chip, as exemplified by SoCs (System-on-a-chip). In this way, a specific process is realized using one or more of the above types of processors as hardware resources.
[0834] Furthermore, the hardware structure of these various processors can more specifically utilize electrical circuits that combine circuit elements such as semiconductor devices. Also, the specific processing described above is merely an example. Therefore, it goes without saying that unnecessary steps can be deleted, new steps added, or the processing order rearranged, as long as it does not deviate from the main purpose.
[0835] The descriptions and illustrations presented above are detailed explanations of the technical aspects of this disclosure and are merely examples of those aspects. For example, the above descriptions of the structure, function, operation, and effect are examples of the structure, function, operation, and effect of the technical aspects of this disclosure. Therefore, it goes without saying that unnecessary parts may be deleted, new elements added, or elements replaced in the descriptions and illustrations presented above, as long as this does not deviate from the essence of the technical aspects of this disclosure. Furthermore, in order to avoid confusion and facilitate understanding of the technical aspects of this disclosure, explanations of common technical knowledge and the like that do not require special explanation to enable the implementation of the technical aspects of this disclosure have been omitted from the descriptions and illustrations presented above.
[0836] All documents, patent applications, and technical standards described herein are incorporated by reference to the same extent as if each individual document, patent application, and technical standard were specifically and individually noted as being incorporated by reference.
[0837] The following is further disclosed regarding the embodiments described above.
[0838] (Claim 1)
[0839] A system comprising: means for registering a user's image;
[0840] means for generating images of suitable clothing based on the registered user's image;
[0841] means for obtaining information on the clothing selected by the user from the generated clothing images;
[0842] means for checking the inventory status of the selected clothing at multiple stores; and
[0843] means for making a fitting reservation at the most suitable store based on the inventory information.
[0844] (Claim 2)
[0845] The system according to claim 1, further comprising means for providing the generated clothing images to the user in catalog format.
[0846] (Claim 3)
[0847] The system according to claim 1, further comprising means for identifying the optimal store, taking into account the user's location information and the store's location information.
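By way of illustration only, the store identification described in claims 1 and 3 could be sketched as follows. The disclosure does not specify a data model or a selection rule, so the `Store` record, the `haversine_km` and `select_store` functions, and the rule "nearest store that has the item in stock" are all assumptions introduced for this sketch.

```python
from dataclasses import dataclass
from math import asin, cos, radians, sin, sqrt
from typing import Dict, List, Optional


@dataclass
class Store:
    # Hypothetical record; the disclosure does not define a store schema.
    name: str
    lat: float
    lon: float
    stock: Dict[str, int]  # clothing item ID -> units in stock


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometres between two latitude/longitude points."""
    earth_radius_km = 6371.0
    dlat = radians(lat2 - lat1)
    dlon = radians(lon2 - lon1)
    a = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2
    return 2 * earth_radius_km * asin(sqrt(a))


def select_store(item_id: str, user_lat: float, user_lon: float,
                 stores: List[Store]) -> Optional[Store]:
    """Return the nearest store holding the item in stock, or None if no store has it.

    Assumed rule: "most suitable" means nearest among stores with stock.
    """
    in_stock = [s for s in stores if s.stock.get(item_id, 0) > 0]
    if not in_stock:
        return None
    return min(in_stock, key=lambda s: haversine_km(user_lat, user_lon, s.lat, s.lon))
```

A fitting reservation would then be made at the returned store. Other definitions of "most suitable", such as weighting travel time or available reservation slots, would fit the same structure by changing the key function.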
[0848] "Example 1"
[0849] (Claim 1)
[0850] A system comprising: a device that receives and records images of the user;
[0851] a device that generates images of corresponding costumes based on the recorded images of the user;
[0852] a device that retrieves information on the costume selected by the user from the generated costume images;
[0853] a device that checks the inventory status of the selected costume at multiple sales facilities; and
[0854] a device that schedules a fitting appointment at the most suitable sales facility based on inventory and geographical information.
[0855] (Claim 2)
[0856] The system according to claim 1, further comprising a device for providing the generated costume illustrations to the user as an information collection.
[0857] (Claim 3)
[0858] The system according to claim 1, further comprising a device for identifying the optimal sales facility, taking into account the user's location information and the geographical information of the sales facility.
[0859] "Application Example 1"
[0860] (Claim 1)
[0861] A system comprising: means for registering a user's characteristics;
[0862] means for generating suitable appearance representations based on the registered user's characteristics;
[0863] means for obtaining information on the appearance selected by the user from the generated appearance representations;
[0864] means for confirming the status of the selected appearance at multiple facilities; and
[0865] means for making an experience reservation at the most suitable facility based on the information held.
[0866] (Claim 2)
[0867] The system according to claim 1, further comprising means for providing the generated appearance representations to the user in a list format.
[0868] (Claim 3)
[0869] The system according to claim 1, further comprising means for identifying the optimal facility, taking into account the user's location information and the facility's location information.
[0870] "Example 2 combined with an emotion engine"
[0871] (Claim 1)
[0872] A system comprising: means for obtaining user images;
[0873] means for creating images of clothing using a generative AI model based on the acquired user images;
[0874] means for analyzing the user's emotions toward the generated clothing images using an emotion recognition device and recommending clothing based on those emotions;
[0875] means for obtaining data on the clothing selected by the user;
[0876] means for checking the inventory information of the selected clothing at multiple sales locations; and
[0877] means for suggesting a fitting appointment at the optimal sales location based on the inventory data and the user sentiment analysis.
[0878] (Claim 2)
[0879] The system according to claim 1, further comprising means for providing the generated clothing images to the user in catalog format and adjusting the display based on the sentiment analysis results.
[0880] (Claim 3)
[0881] The system according to claim 1, further comprising means for identifying the optimal sales location based on sentiment analysis results, taking into account the user's location information and the location information of sales locations.
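As a non-limiting sketch, the recommendation based on sentiment analysis and inventory described in the claims above could take the following form. The disclosure does not define the output of the emotion recognition device, so the per-item sentiment score in the range -1 to 1, the `recommend` function, and its `top_n` and `min_score` parameters are assumptions made for this example.

```python
from typing import Dict, List


def recommend(sentiment_by_item: Dict[str, float],
              stock_by_store: Dict[str, Dict[str, int]],
              top_n: int = 3,
              min_score: float = 0.0) -> List[str]:
    """Rank clothing items by sentiment score, keeping only items some store stocks.

    sentiment_by_item: hypothetical scores in [-1, 1] from an emotion recognizer.
    stock_by_store: store name -> {clothing item ID -> units in stock}.
    """
    # Items that at least one store currently has in stock.
    available = {
        item
        for stock in stock_by_store.values()
        for item, units in stock.items()
        if units > 0
    }
    candidates = [
        (item, score)
        for item, score in sentiment_by_item.items()
        if item in available and score >= min_score
    ]
    # Highest sentiment first; ties broken by item ID for a stable order.
    candidates.sort(key=lambda pair: (-pair[1], pair[0]))
    return [item for item, _ in candidates[:top_n]]
```

Items the user reacted to negatively, or that no sales location stocks, are dropped before ranking, so every recommended item can lead to a fitting appointment.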
[0882] "Application Example 2 combined with an emotion engine"
[0883] (Claim 1)
[0884] Means for registering a user's video;
[0885] means for generating images of suitable clothing based on the registered user's images;
[0886] means for obtaining information on the clothing selected by the user from the generated clothing images;
[0887] means for checking the inventory status of the selected clothing at multiple locations;
[0888] means for making a trial reservation at the optimal location based on the inventory information;
[0889] means for analyzing the user's emotions to suggest the most suitable clothing; and
[0890] means for analyzing the user's preferences and optimizing the generated clothing suggestions:
[0891] a system comprising all of the above means.
[0892] (Claim 2)
[0893] The system according to claim 1, further comprising means for providing the user with generated clothing images in catalog format and for making clothing suggestions based on the user's feelings.
[0894] (Claim 3)
[0895] The system according to claim 1, further comprising means for identifying an optimal location considering the user's location information and the location information of the location, and means for providing a virtual trial experience using a smart mirror.
Explanation of Symbols
[0896]
10, 210, 310, 410 Data processing system
12 Data processing device
14 Smart device
214 Smart glasses
314 Headset-type terminal
414 Robot
Claims
1. A system comprising: means for registering a user's image; means for generating images of suitable clothing based on the registered user's image; means for obtaining information on the clothing selected by the user from the generated clothing images; means for checking the inventory status of the selected clothing at multiple stores; and means for making a fitting reservation at the most suitable store based on the inventory information.
2. The system according to claim 1, further comprising means for providing the generated clothing images to the user in catalog format.
3. The system according to claim 1, further comprising means for identifying the optimal store, taking into account the user's location information and the store's location information.