system
A system using image analysis and generative AI with a feedback loop addresses the challenge of selecting suitable hairstyles by analyzing facial and body contours, offering personalized and continuously improving suggestions.
Patent Information
- Authority / Receiving Office
- JP · JP
- Patent Type
- Applications
- Current Assignee / Owner
- SOFTBANK GROUP CORP
- Filing Date
- 2024-12-09
- Publication Date
- 2026-06-19
AI Technical Summary
Selecting a suitable hairstyle is difficult because no existing method accurately accounts for the facial and body characteristics that differ from person to person. A system is therefore needed that can easily provide a style matching each user's needs and ideals.
An image analysis system uses a face recognition algorithm to analyze facial and body contours, a similarity search to identify suitable hairstyles from a database of past cases, and a generative AI to create personalized suggestions, with a feedback mechanism to improve accuracy.
Enables users to efficiently find hairstyles that suit them best while continuously improving suggestions based on user feedback, providing personalized and accurate hairstyle recommendations.
Figure 2026100708000001_ABST
Abstract
Description
Technical Field
[0001] The technology of the present disclosure relates to a system.
Background Art
[0002] Patent Document 1 discloses a persona chatbot control method performed by at least one processor, including steps of receiving a user utterance, adding the user utterance to a prompt including an instruction sentence related to an explanation of a chatbot character, encoding the prompt, and inputting the encoded prompt into a language model to generate a chatbot utterance in response to the user utterance.
Prior Art Documents
Patent Documents
[0003]
Patent Document 1
Summary of the Invention
Problems to be Solved by the Invention
[0004] Selecting a hairstyle suitable for an individual is difficult for many people. No existing method proposes an ideal hairstyle with high accuracy while taking into account the facial and body characteristics that differ from person to person. A system that can easily provide a style matching the needs and ideals of each user is therefore required.
Means for Solving the Problems
[0005] The present invention proposes a system comprising: an image analysis means that uses a face recognition algorithm to analyze the contours of the face and body in an image provided by a user; a similarity search means that, based on the analyzed features, refers to similar cases in a database of past cases to identify a hairstyle suitable for the user; a generation means in which a generative AI generates an appropriate hairstyle based on the similar cases; and a display means that presents the result to the user. The system further includes a feedback processing means that receives feedback from the user and reflects it in subsequent suggestions, making it possible to propose hairstyles that are even more satisfying to the user.
[0006] "Image analysis means for analyzing facial and body contours" refers to a function that executes processes and algorithms to identify facial and body contours and feature points from image data provided by the user.
[0007] "Similarity search means" refers to a function that uses the feature data obtained through image analysis to look up similar cases in a database of past cases and analyze the results.
[0008] "Generation means" refers to a function that uses AI technology to generate images of hairstyles suitable for the user based on the data identified by the similarity search means.
[0009] "Display means" refers to a function for visually presenting the image of the hairstyle created by the generation means to the user.
[0010] "Feedback processing means" refers to a function that receives feedback from users and incorporates that feedback into the process of improving the accuracy and quality of the system's suggestions.
[0011] A "face recognition algorithm" is a general term for computational methods and programs used in image analysis to analyze and identify feature points of a face.
Brief Description of the Drawings
[0012]
[Figure 1] This is a conceptual diagram showing an example of the configuration of a data processing system according to the first embodiment.
[Figure 2] This is a conceptual diagram showing an example of the essential functions of a data processing device and a smart device according to the first embodiment.
[Figure 3] This is a conceptual diagram showing an example of the configuration of a data processing system according to the second embodiment.
[Figure 4] This is a conceptual diagram showing an example of the main functions of a data processing device and smart glasses according to the second embodiment.
[Figure 5] This is a conceptual diagram showing an example of the configuration of a data processing system according to the third embodiment.
[Figure 6] This is a conceptual diagram showing an example of the main functions of a data processing device and a headset-type terminal according to the third embodiment.
[Figure 7] This is a conceptual diagram showing an example of the configuration of a data processing system according to the fourth embodiment.
[Figure 8] This is a conceptual diagram showing an example of the main functions of a data processing device and a robot according to the fourth embodiment.
[Figure 9] This shows an emotion map where multiple emotions are mapped.
[Figure 10] This shows an emotion map where multiple emotions are mapped.
[Figure 11] This is a sequence diagram showing the processing flow of the data processing system in Example 1.
[Figure 12] This is a sequence diagram showing the processing flow of the data processing system in Application Example 1.
[Figure 13] This is a sequence diagram showing the processing flow of the data processing system in Example 2, which incorporates an emotion engine.
[Figure 14] This is a sequence diagram showing the processing flow of the data processing system in Application Example 2, which combines an emotion engine.
Modes for Carrying Out the Invention
[0013] An example of an embodiment of the system according to the technology of the present disclosure will be described below with reference to the accompanying drawings.
[0014] First, the terms used in the following description will be explained.
[0015] In the following embodiments, a processor denoted by a reference numeral (hereinafter simply referred to as "processor") may be a single arithmetic unit or a combination of multiple arithmetic units. The processor may also be a single type of arithmetic unit or a combination of multiple types of arithmetic units. Examples of arithmetic units include a CPU (Central Processing Unit), a GPU (Graphics Processing Unit), a GPGPU (General-Purpose computing on Graphics Processing Units), and an APU (Accelerated Processing Unit).
[0016] In the following embodiments, a RAM (Random Access Memory) denoted by a reference numeral is a memory in which information is temporarily stored and which is used as a work memory by the processor.
[0017] In the following embodiments, storage denoted by a reference numeral is one or more non-volatile storage devices that store various programs, various parameters, and the like. Examples of non-volatile storage devices include flash memory (e.g., an SSD (Solid State Drive)), magnetic disks (e.g., hard disks), and magnetic tape.
[0018] In the following embodiments, a communication interface (I/F) denoted by a reference numeral is an interface that includes a communication processor, an antenna, and the like. The communication interface manages communication between multiple computers. Examples of communication standards applicable to the communication interface include wireless communication standards such as 5G (5th Generation Mobile Communication System), Wi-Fi (registered trademark), and Bluetooth (registered trademark).
[0019] In the following embodiments, "A and/or B" is synonymous with "at least one of A and B." That is, "A and/or B" means that it may be A alone, or B alone, or a combination of A and B. Furthermore, in this specification, the same concept as "A and/or B" applies when expressing three or more things linked by "and/or."
[0020] [First Embodiment]
[0021] Figure 1 shows an example of the configuration of the data processing system 10 according to the first embodiment.
[0022] As shown in Figure 1, the data processing system 10 includes a data processing device 12 and a smart device 14. An example of the data processing device 12 is a server.
[0023] The data processing device 12 comprises a computer 22, a database 24, and a communication interface 26. The computer 22 is an example of a "computer" related to the technology of this disclosure. The computer 22 comprises a processor 28, RAM 30, and storage 32. The processor 28, RAM 30, and storage 32 are connected to a bus 34. The database 24 and the communication interface 26 are also connected to the bus 34. The communication interface 26 is connected to a network 54. An example of the network 54 is a WAN (Wide Area Network) and/or a LAN (Local Area Network).
[0024] The smart device 14 comprises a computer 36, a reception device 38, an output device 40, a camera 42, and a communication interface 44. The computer 36 comprises a processor 46, RAM 48, and storage 50. The processor 46, RAM 48, and storage 50 are connected to a bus 52. The reception device 38, output device 40, and camera 42 are also connected to the bus 52.
[0025] The reception device 38 is equipped with a touch panel 38A and a microphone 38B, etc., and receives user input. The touch panel 38A receives user input by detecting contact with an object (e.g., a pen or finger). The microphone 38B receives user input by detecting the user's voice. The control unit 46A transmits data indicating the user input received by the touch panel 38A and microphone 38B to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the data indicating the user input.
[0026] The output device 40 includes a display 40A and a speaker 40B, and presents data to the user 20 by outputting the data in a form perceptible to the user 20 (e.g., audio and/or text). The display 40A displays visible information such as text and images according to instructions from the processor 46. The speaker 40B outputs audio according to instructions from the processor 46. The camera 42 is a small digital camera equipped with an optical system such as a lens, aperture, and shutter, and an image sensor such as a CMOS (Complementary Metal-Oxide-Semiconductor) image sensor or a CCD (Charge Coupled Device) image sensor.
[0027] Communication interface 44 is connected to network 54. Communication interfaces 44 and 26 are responsible for the exchange of various types of information between processor 46 and processor 28 via network 54.
[0028] Figure 2 shows an example of the main functions of the data processing device 12 and the smart device 14.
[0029] As shown in Figure 2, in the data processing device 12, a specific processing is performed by the processor 28. A specific processing program 56 is stored in the storage 32. The specific processing program 56 is an example of a "program" related to the technology of this disclosure. The processor 28 reads the specific processing program 56 from the storage 32 and executes the read specific processing program 56 on the RAM 30. The specific processing is realized by the processor 28 operating as a specific processing unit 290 according to the specific processing program 56 executed on the RAM 30.
[0030] The storage 32 stores the data generation model 58 and the emotion identification model 59. The data generation model 58 and the emotion identification model 59 are used by the specific processing unit 290.
[0031] In the smart device 14, the processor 46 performs the reception output processing. The storage 50 stores the reception output program 60. The reception output program 60 is used by the data processing system 10 in conjunction with the specific processing program 56. The processor 46 reads the reception output program 60 from the storage 50 and executes the read reception output program 60 on the RAM 48. The reception output processing is realized by the processor 46 operating as a control unit 46A according to the reception output program 60 executed on the RAM 48.
[0032] Next, the specific processing performed by the specific processing unit 290 of the data processing device 12 will be described. In the following description, the data processing device 12 will be referred to as the "server" and the smart device 14 as the "terminal".
[0033] The system of this invention allows users to upload their own photos, and the server then uses advanced image analysis and AI technology to suggest hairstyles that are suitable for the user.
[0034] When a user uploads a photo to the system via the terminal, the server receives the photo. The image analysis means then extracts facial and body features, using a face recognition algorithm in particular to identify facial contours and feature points. The similarity search means compares the resulting feature data with a database of past cases to identify cases that closely resemble the user.
[0035] Next, the generation means uses AI to generate the optimal hairstyle for the user based on the similar cases. The generated hairstyle is sent from the server to the terminal and visually presented to the user by the display means.
[0036] Furthermore, the user can send feedback about the suggested hairstyle to the server via the terminal. The feedback processing means processes this feedback and uses it to improve the accuracy of future hairstyle suggestions.
[0037] For example, if a woman in her 20s with a round face and slightly plump cheeks uploads a photo, the server captures these features through image analysis and searches for past data with similar characteristics. Based on highly-rated styles, the AI suggests a bob haircut style, and that image is presented to the user. If the user likes this style and provides feedback, that opinion will be used to improve future suggestions.
[0038] This system allows users to efficiently find the hairstyle that best suits them, while simultaneously creating a feedback loop for continuous improvement.
[0039] The following describes the processing flow.
[0040] Step 1:
[0041] The user launches the application on the terminal, selects a photo, and presses the upload button. It is recommended that the user choose a photo that clearly shows their face and hair.
[0042] Step 2:
[0043] The terminal sends the photo selected by the user to the server. Before transmission, the data is encrypted and other measures are taken to ensure data security.
[0044] Step 3:
[0045] The server receives photo data sent from the terminal. The received data is passed to an image analysis module, which uses a face recognition algorithm to detect the contours of faces and bodies.
[0046] Step 4:
[0047] The server generates feature vectors for the user's face and body from the facial contours and feature points, and then uses these vectors to perform a similarity search on the past database.
[0048] Step 5:
[0049] The server generates the optimal hairstyle using a generation method based on similar cases obtained through similarity searches. AI technology is used to select a style that suits the user by referencing past highly-rated styles.
[0050] Step 6:
[0051] The server sends the generated hairstyle image to the terminal.
[0052] Step 7:
[0053] The terminal receives hairstyle images sent from the server and displays them to the user. The user can visually confirm the suggested style by viewing the images.
[0054] Step 8:
[0055] Users can provide feedback on the presented style. For example, they can select whether they like the suggested style.
[0056] Step 9:
[0057] The terminal sends user feedback to the server. This feedback is stored on the server to improve the quality of future suggestions.
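As a non-limiting illustration, the exchange between the terminal and the server in Steps 2, 6, and 9 could be sketched as follows. The JSON field names (`type`, `request_id`, `photo_b64`, `liked`), the `Server` class, and the helper functions are hypothetical and do not appear in the disclosure; Steps 3 to 5 are replaced by a stub, and the encryption mentioned in Step 2 is assumed to be handled by the transport layer and is not shown.

```python
import base64
import json
import uuid


def build_upload_request(photo_bytes: bytes) -> str:
    """Step 2: wrap the selected photo in a JSON message (hypothetical fields)."""
    return json.dumps({
        "type": "upload",
        "request_id": str(uuid.uuid4()),
        "photo_b64": base64.b64encode(photo_bytes).decode("ascii"),
    })


def build_feedback_request(request_id: str, liked: bool) -> str:
    """Step 9: report whether the user liked the suggestion for a request."""
    return json.dumps({"type": "feedback", "request_id": request_id, "liked": liked})


class Server:
    """In-memory stand-in for the server; analysis and generation are stubbed."""

    def __init__(self) -> None:
        self.feedback_log: list[dict] = []

    def handle(self, message: str) -> dict:
        payload = json.loads(message)
        if payload["type"] == "upload":
            photo = base64.b64decode(payload["photo_b64"])
            # Steps 3 to 5 would run here; this stub only reports the photo size.
            return {
                "request_id": payload["request_id"],
                "suggestion": f"placeholder image for {len(photo)}-byte photo",
            }
        if payload["type"] == "feedback":
            # Step 9: store the feedback for use in later suggestions.
            self.feedback_log.append(payload)
            return {"status": "stored"}
        raise ValueError(f"unknown message type: {payload['type']}")
```

Tying the feedback to a `request_id` is one possible way to let the server associate an opinion with the suggestion it refers to.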
[0058] (Example 1)
[0059] Next, we will describe Example 1. In the following description, the data processing device 12 will be referred to as the "server," and the smart device 14 will be referred to as the "terminal."
[0060] In modern society, there is a growing demand for technologies that reduce the effort required for individual users to select the style and appearance best suited to them, and that more efficiently meet specific needs. In particular, the challenge lies in automatically and accurately suggesting hairstyles that fit an individual's face shape and contours.
[0061] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 1 is realized by the following means.
[0062] In this invention, the server includes data reception processing means, image processing means, and similar information retrieval means. This enables personalized style suggestions based on images uploaded by the user.
[0063] "Data reception processing means" refers to means having the function of receiving image data provided by a user and temporarily storing it within the system.
[0064] "Image processing means" refers to means for analyzing features related to the face and its shape from received image data and extracting specific information.
[0065] "Similar information retrieval means" refers to means for identifying similar cases from past cases or related information sets within a database, based on the extracted feature information.
[0066] "Generation processing means" refers to means having an algorithm for generating suitable images and styles corresponding to individual characteristics based on similar cases.
[0067] An "information presentation means" is a means that has the function of transmitting data to an external display device or the like in order to visually present the generated image information to the user.
[0068] "Feedback processing means" refers to means for obtaining feedback and opinions from users and using that information to improve the accuracy and functionality of the system's future suggestions.
[0069] A "facial feature recognition algorithm" is an algorithm used to identify the characteristic points of a face and to distinguish the details of its shape, including its contours and facial expressions.
[0070] This invention's system suggests the most suitable hairstyle for a user based on their uploaded photo. The user begins by uploading a photo of their face to the system using a terminal. The terminal then transfers this image data to a server.
[0071] The server first receives the uploaded image using the data reception processing means and stores it in the system's temporary file storage. Then, as the image processing means, it uses image analysis software (e.g., OpenCV, dlib) to extract facial and shape features from the photograph. The image processing detects the contours and feature points of the user's face with high accuracy and produces the feature information that serves as the basis for the database search.
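As a non-limiting illustration, feature information could be derived from detected feature points as scale-independent ratios. The choice of six landmark points and three ratios below is an assumption made for this sketch; the disclosure does not specify which features are used, and the landmark detection itself (for example with OpenCV or dlib) is not shown.

```python
from math import dist

Point = tuple[float, float]


def contour_features(
    forehead_top: Point,
    chin: Point,
    cheek_left: Point,
    cheek_right: Point,
    jaw_left: Point,
    jaw_right: Point,
) -> list[float]:
    """Turn six hypothetical landmark points into three size-independent ratios."""
    face_height = dist(forehead_top, chin)
    cheek_width = dist(cheek_left, cheek_right)
    jaw_width = dist(jaw_left, jaw_right)
    if face_height == 0 or cheek_width == 0:
        raise ValueError("degenerate landmarks: zero face height or cheek width")
    return [
        cheek_width / face_height,  # larger for rounder, wider faces
        jaw_width / face_height,
        jaw_width / cheek_width,    # smaller for faces that taper toward the chin
    ]
```

Using ratios rather than raw pixel distances means that the same face photographed at a different resolution yields the same feature vector.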
[0072] Next, the similar information retrieval means searches the stored database for similar cases based on the extracted feature information. Here, a machine learning library (e.g., FAISS) is used to perform the similarity-based search. The cases with high similarity become the basic data used by the generation processing means.
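As a non-limiting illustration, a similarity-based search over feature vectors could be sketched as a linear scan using cosine similarity. The function names and the dictionary layout of the stored cases are hypothetical, and cosine similarity is an assumed metric; a library such as FAISS would typically replace the linear scan with an index when the database is large.

```python
from math import sqrt
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_similar(
    query: Sequence[float],
    cases: dict[str, Sequence[float]],
    k: int = 3,
) -> list[tuple[str, float]]:
    """Return the k stored cases most similar to the query, best first."""
    scored = [(case_id, cosine_similarity(query, vec)) for case_id, vec in cases.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]
```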
[0073] The generation processing means uses a generative AI model (e.g., a GAN or a Transformer-based model). A prompt is constructed from the similar cases and input to the model, which then generates the optimal hairstyle. The prompt contains a specific request such as, "Please suggest a stylish hairstyle that suits a round face of a woman in her 20s."
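As a non-limiting illustration, constructing a prompt from similar cases could be sketched as follows. The `SimilarCase` structure, its `rating` field, and the wording of the second sentence of the prompt are assumptions made for this sketch; only the first sentence follows the example request given in the disclosure. The call to the generative AI model itself is not shown.

```python
from dataclasses import dataclass


@dataclass
class SimilarCase:
    hairstyle: str
    rating: float  # hypothetical average user rating for this past case


def build_prompt(profile: str, cases: list[SimilarCase], max_refs: int = 3) -> str:
    """Combine a user profile with the best-rated similar cases into one prompt."""
    prompt = f"Please suggest a stylish hairstyle that suits {profile}."
    best = sorted(cases, key=lambda c: c.rating, reverse=True)[:max_refs]
    if best:
        styles = ", ".join(c.hairstyle for c in best)
        prompt += f" Highly rated styles for similar users: {styles}."
    return prompt
```

For example, `build_prompt("a round face of a woman in her 20s", cases)` reproduces the request quoted above and appends the reference styles taken from the similar cases.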
[0074] The generated hairstyle image is transmitted to the user's terminal by the information presentation means. The terminal visually presents the image to the user, allowing the user to confirm it.
[0075] Finally, the user can provide feedback on the suggested hairstyle. The server receives this feedback via the feedback processing means, reflects it in the database to improve the accuracy of future suggestions, and also uses it to train the AI model.
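As a non-limiting illustration, reflecting feedback in the database could be sketched as maintaining a running approval rate per past case. The `CaseRecord` structure and the like/dislike encoding are hypothetical; the disclosure does not specify how feedback is stored, and retraining of the AI model is not shown.

```python
from dataclasses import dataclass


@dataclass
class CaseRecord:
    hairstyle: str
    rating_sum: float = 0.0
    rating_count: int = 0

    @property
    def average_rating(self) -> float:
        """Fraction of positive feedback so far; 0.0 before any feedback arrives."""
        return self.rating_sum / self.rating_count if self.rating_count else 0.0


def apply_feedback(database: dict[str, CaseRecord], case_id: str, liked: bool) -> float:
    """Record one like/dislike for a case and return its updated average rating."""
    record = database[case_id]
    record.rating_sum += 1.0 if liked else 0.0
    record.rating_count += 1
    return record.average_rating
```

A later similarity search could then prefer cases with a higher `average_rating`, which is one simple way for stored feedback to influence the next suggestion.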
[0076] Through the above process, users can find hairstyles that suit their own features, and the system can continue to evolve based on user feedback.
[0077] The flow of the specific processing in Example 1 will be explained using Figure 11.
[0078] Step 1:
[0079] The user uploads a photo of their face to the system using the terminal. The input data is a facial image, which the terminal sends to the server. The server receives this data and stores it in temporary file storage. In this step, the user provides the initial data.
[0080] Step 2:
[0081] The server uses the data reception processing means to organize the uploaded image and prepare it for image analysis. The input is the facial photograph sent by the user, and the output is image data ready for analysis. In this step, the preprocessing needed for accurate image analysis is performed.
[0082] Step 3:
[0083] The server uses image processing tools to extract facial and shape features from the received images. Image analysis software such as OpenCV or dlib is used. The input is stored image data, and the output is feature information including facial contours and feature points. In this step, a face recognition algorithm is executed to obtain facial shape data.
[0084] Step 4:
[0085] The server uses a similarity information retrieval method to search the database for similar cases based on the extracted feature information. It calculates similarity using machine learning libraries such as FAISS. The input is feature information, and the output is a list of similar cases. In this step, highly relevant information is extracted based on the existing database.
[0086] Step 5:
[0087] The server uses a generative AI model as a generation process, constructing prompt sentences based on similar cases and using them as input to generate the optimal hairstyle. The prompt sentences are specific, such as "Please suggest a stylish hairstyle that suits a round face of a woman in her 20s." The input is a list of similar cases, and the output is an image of the generated hairstyle. In this step, the capabilities of AI are utilized to create a personalized style.
[0088] Step 6:
[0089] The server sends the generated hairstyle image to the terminal via the information presentation means, and the terminal displays it to the user. The input is the image information of the generated hairstyle, and the output is the visual information displayed on the terminal. In this step, the user is able to actually confirm the suggested result.
[0090] Step 7:
[0091] The user uses their device to send feedback on the suggested hairstyle to the server. The input is the user's feedback information, and the output is opinion data that is reflected on the server. In this step, the user's opinions and feedback are collected for future improvements.
[0092] Step 8:
[0093] The server uses the feedback processing means to store the received feedback in the database and uses it to train the AI model. The input is feedback data, and the output is an improved suggestion algorithm resulting from the training. In this step, the system processes information from the user to improve itself.
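As a non-limiting illustration, the server-side chain of Steps 2 to 5, in which the output of each step is the input of the next, could be sketched as a composition of interchangeable stages. The `Stages` container and the type chosen for each stage are assumptions made for this sketch; in practice each stage would wrap the corresponding means described above.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class Stages:
    preprocess: Callable[[bytes], bytes]                     # Step 2: prepare the image
    extract_features: Callable[[bytes], Sequence[float]]     # Step 3: feature information
    search_similar: Callable[[Sequence[float]], list[str]]   # Step 4: list of similar cases
    generate: Callable[[list[str]], bytes]                   # Step 5: hairstyle image


def suggest_hairstyle(photo: bytes, stages: Stages) -> bytes:
    """Run Steps 2 to 5 in order, passing each output to the next stage."""
    prepared = stages.preprocess(photo)
    features = stages.extract_features(prepared)
    similar_cases = stages.search_similar(features)
    return stages.generate(similar_cases)
```

Passing the stages in as callables keeps the flow testable with stubs and allows, for example, the search stage to be swapped without touching the others.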
[0094] (Application Example 1)
[0095] Next, we will explain Application Example 1. In the following explanation, the data processing device 12 will be referred to as the "server," and the smart device 14 will be referred to as the "terminal."
[0096] In recent years, there has been a growing demand for personal assistant systems that provide optimal suggestions to individual users. However, conventional suggestion systems often lack convenience because users have to manually input information. Furthermore, real-time information processing and suggestions tailored to the user's environment have been difficult. This invention aims to solve these problems and enable a personal assistant robot to instantly provide users with optimal information within their homes.
[0097] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 1 is realized by the following means.
[0098] In this invention, the server includes image processing means for analyzing the shape of the face and body, similarity search means for referencing similar cases from a past information set based on the analyzed features, and generation means for creating an appropriate hairstyle based on the similar cases. As a result, users can obtain information tailored to them in real time simply by giving instructions to the robot.
[0099] "Facial and body shape" refers to the basic outlines and characteristics of a person's appearance, and these can be used to analyze each person's traits.
[0100] "Image processing means" refers to a device or technology for extracting specific information from input image data and performing analysis or recognition.
[0101] "Similar cases" refers to a collection of cases extracted from past information sets that share similar characteristics or features.
[0102] "Generation means" refers to methods or devices for creating new data or information based on analysis results.
[0103] "Visual information" refers to visual data used to convey analysis results and products to the user, i.e., information presented in the form of images and videos.
[0104] "Photography means" refers to devices or methods for acquiring images of a user or object using optical devices or the like.
[0105] "Communication means" refers to the processes and technologies used to transmit data or information from one point to another.
[0106] A "feedback processing means" is a device or process for analyzing opinions received from users and utilizing them to improve the system.
[0107] An "image recognition algorithm" is a set of computational procedures and logic used to extract and identify specific patterns or features from digital images.
[0108] The system required to realize this invention uses a robot that acts as a beauty advisor for the user in the home. The robot is equipped with a high-quality camera for taking pictures and network capabilities for data communication. When the user instructs the robot to take a picture, the robot uses its built-in camera to photograph the user's face and sends the image to a server in the cloud.
[0109] The server uses image processing software (e.g., OpenCV or TensorFlow (registered trademark)) to analyze the image and identify the shape of the user's face and body. The similarity search means compares the feature data extracted through this analysis with a database of past cases and selects similar cases. Based on the selected cases, the server uses a generative AI model to generate the optimal hairstyle, driven by a prompt such as, "Analyze the user's face photo and suggest the best hairstyle for a party."
[0110] The generated visual information, i.e., the suggested hairstyle, is presented to the user through the robot. When the user provides feedback on the style to the robot, that feedback is sent back to the server and used by the feedback processing system to improve the accuracy of the next suggestion.
[0111] This system allows users to receive appropriate beauty advice in real time via a robot without having to perform complicated operations. One example use is receiving a suggestion for the best hairstyle for a weekend party.
[0112] The flow of a specific process in Application Example 1 will be explained using Figure 12.
[0113] Step 1:
[0114] The user gives instructions to the robot and requests that it take a picture of their face. The input is the user's instructions, and the output is an image of the user's face taken by the camera. The robot uses its built-in camera to take a high-resolution picture of the user's face.
[0115] Step 2:
[0116] The robot transmits the captured facial image to the server via the internet. The input is the captured facial image, and the output is the completed transfer of the image data to the server. Network communication capabilities are used to ensure stable data transmission.
[0117] Step 3:
[0118] The server analyzes the received image data using image processing software. The input is a face image sent to the server. The output is shape data of the user's face and body. Data calculations are performed using OpenCV or TensorFlow to extract facial contours and feature points.
[0119] Step 4:
[0120] The server compares the analysis results with past data sets using a similarity search mechanism. The input is user shape data, and the output is the most similar past examples. This includes data processing operations that perform appropriate comparisons and selections from the database.
[0121] Step 5:
[0122] The server uses a generative AI model to generate the optimal hairstyle for the user. The input consists of information on similar cases and a prompt, and the output is visual information of the suggested hairstyle. Based on the prompt, "Analyze the user's facial photo and suggest the best hairstyle for a party," the AI generates the optimal style.
[0123] Step 6:
[0124] The terminal displays visual information of hairstyles received from the server to the user. The input is the visual information provided by the server, and the output is the user confirming the displayed hairstyle image. This includes the action of visually displaying the information on the robot's display.
[0125] Step 7:
[0126] The user provides feedback on the presented hairstyle. The input is the user's evaluation of the hairstyle. The output is the feedback sent to the server. The user provides feedback to the robot via voice or touch input.
[0127] Step 8:
[0128] The server processes user feedback and uses it to improve the accuracy of future suggestions. Input is user-provided feedback information, and output is model updates or database optimization. The feedback processing mechanism performs data calculations to improve the AI model.
[0129] Furthermore, an emotion engine that estimates the user's emotions may be incorporated. That is, the specific processing unit 290 may use the emotion identification model 59 to estimate the user's emotions and perform the specific processing using the estimated emotions.
[0130] The system incorporating the emotion engine analyzes facial and voice characteristics in the image and audio data provided by the user to recognize the user's emotions. Based on the acquired emotion data, it then suggests the hairstyle most suitable for the user's current state.
[0131] The user uploads a photo using the terminal and, as needed, inputs audio via the microphone. The terminal sends this data to the server, which performs image and audio analysis in parallel. A face recognition algorithm analyzes facial features and expressions, and the emotion engine infers the user's emotions from their expression and tone of voice.
[0132] The server uses the recognized emotion data to search the database of past cases for cases with similar emotions and characteristics, and generates a hairstyle that matches the user's mood based on those cases. The generated hairstyle is then sent to the terminal and presented to the user.
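As a non-limiting illustration, a search that takes both the recognized emotion and the facial features into account could be sketched as filtering past cases by emotion label and then ranking them by feature distance. The `EmotionCase` structure, the use of discrete emotion labels, the Euclidean distance, and the fallback to all cases when no label matches are assumptions made for this sketch.

```python
from dataclasses import dataclass
from math import dist
from typing import Sequence


@dataclass
class EmotionCase:
    hairstyle: str
    emotion: str                  # hypothetical label stored with the past case
    features: tuple[float, ...]   # facial feature vector of the past user


def search_by_emotion(
    emotion: str,
    features: Sequence[float],
    cases: Sequence[EmotionCase],
    k: int = 2,
) -> list[EmotionCase]:
    """Prefer cases with the same emotion, then rank by closeness of features."""
    matching = [c for c in cases if c.emotion == emotion]
    # Assumed fallback: if no past case shares the emotion, rank all cases.
    pool = matching or list(cases)
    return sorted(pool, key=lambda c: dist(c.features, features))[:k]
```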
[0133] For example, if a user uploads a photo of themselves smiling and records a cheerful voice, the server will sense feelings of happiness and cheerfulness from this data and suggest hairstyles that were previously recommended and well-received by users with similar emotions. For instance, a style with glamorous curls might be generated and displayed on the user's device.
[0134] During the feedback phase, users can send their thoughts on the suggested styles from their device to the server. The feedback processing system records this information and uses it to improve the accuracy of future suggestions, thereby providing an even more personalized experience.
[0135] By utilizing an emotion engine in this way, it becomes possible to realize a next-generation system that leverages both visual and auditory information to provide optimal suggestions to users.
[0136] The following describes the processing flow.
[0137] Step 1:
[0138] The user selects their own photos and records audio through the application on their device. They then use the interface to upload this data and press the "Send" button.
[0139] Step 2:
[0140] The device transmits image and audio data provided by the user to the server. During this process, the data is encrypted to ensure security.
[0141] Step 3:
[0142] The server first passes the image data received from the terminal to an image analysis module, which uses a face recognition algorithm to detect facial features and expressions and identify the contours of the face.
[0143] Step 4:
[0144] The server simultaneously processes the audio data with a voice analysis module, analyzing the tone and intonation of the user's voice. Based on the analysis results, the emotion engine identifies the user's emotions from the facial expression data and audio data.
[0145] Step 5:
[0146] The server searches its past database for cases with similar emotions and features based on recognized emotion data and facial feature vectors. Based on these similar cases, it uses AI technology to generate the optimal hairstyle.
[0147] Step 6:
[0148] The server sends the generated hairstyle image to the terminal. When sending, it also includes the reason why the hairstyle was recommended based on the user's feelings.
[0149] Step 7:
[0150] The device displays received hairstyle images on its screen, providing visual suggestions to the user. Furthermore, it provides an interface for the user to input feedback on the suggestions.
[0151] Step 8:
[0152] The user reviews the displayed style and can send feedback from their device indicating whether they like it or what they would like changed.
[0153] Step 9:
[0154] The terminal sends user feedback to the server, which records and analyzes it using a feedback processing system. This data is then incorporated into future suggestions, contributing to improved accuracy.
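The combination of facial expression data and audio data in Step 4 can be illustrated with a minimal sketch. The source does not specify how the emotion engine combines the two modalities; the weighted average below (often called late fusion) is an assumption chosen for illustration, and the label set `EMOTIONS`, the function names, and the weight `face_weight` are all hypothetical.

```python
from typing import Dict

# Hypothetical label set; the source does not enumerate emotion classes.
EMOTIONS = ("happy", "sad", "angry", "neutral")


def normalize(scores: Dict[str, float]) -> Dict[str, float]:
    """Scale non-negative scores so that they sum to 1 over EMOTIONS."""
    clipped = {e: max(scores.get(e, 0.0), 0.0) for e in EMOTIONS}
    total = sum(clipped.values())
    if total == 0.0:
        # No evidence at all: fall back to a uniform distribution.
        return {e: 1.0 / len(EMOTIONS) for e in EMOTIONS}
    return {e: clipped[e] / total for e in EMOTIONS}


def fuse_emotions(face: Dict[str, float], voice: Dict[str, float],
                  face_weight: float = 0.6) -> Dict[str, float]:
    """Weighted average of the normalized face and voice distributions."""
    if not 0.0 <= face_weight <= 1.0:
        raise ValueError("face_weight must be between 0 and 1")
    f, v = normalize(face), normalize(voice)
    return {e: face_weight * f[e] + (1.0 - face_weight) * v[e] for e in EMOTIONS}


def dominant_emotion(fused: Dict[str, float]) -> str:
    """Return the label with the highest fused score."""
    return max(fused, key=fused.get)


# Illustrative scores only; real values would come from the analysis modules.
face_scores = {"happy": 0.7, "neutral": 0.2, "sad": 0.1}
voice_scores = {"happy": 0.5, "neutral": 0.4, "angry": 0.1}
fused = fuse_emotions(face_scores, voice_scores)
print(dominant_emotion(fused))
```

Keeping the fused result as a full distribution, rather than only the top label, would allow the similarity search in Step 5 to compare emotion profiles instead of single labels.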
[0155] (Example 2)
[0156] Next, we will describe Example 2. In the following description, the data processing device 12 will be referred to as the "server" and the smart device 14 as the "terminal".
[0157] Current style suggestion systems do not adequately provide personalized suggestions that take user emotions into account. Therefore, it is difficult to provide styles that suit the user's mindset and feelings, often resulting in low satisfaction with the suggestions.
[0158] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 2 is realized by the following means.
[0159] In this invention, the server includes data analysis means for analyzing facial and voice characteristics, similarity search means for referencing similar cases from a database based on the analyzed emotion data, and generation means for generating appropriate styles based on similar cases. This makes it possible to suggest the optimal style according to each user's emotions.
[0160] "Data analysis means" refers to devices or methods that analyze image and audio data provided by users to extract facial and voice features.
[0161] "Similarity search means" refers to a device or method that searches for similar cases within a database based on the analyzed emotion data.
[0162] "Generation means" refers to a device or method that uses the results of similarity searches to generate a style that is appropriate to the user's emotions.
[0163] "Presentation means" refers to a device or method for visually displaying generated style information to the user.
[0164] "Feedback processing means" refers to a device or method that receives feedback from users and uses it to improve the accuracy of future proposals.
[0165] The system of this invention is designed to suggest individual styles based on the user's emotions. The system, in collaboration with the user, terminal, and server, provides a next-generation personalized experience.
[0166] Users can input facial photos and voice data using their device. The device collects this data and sends it to the server. The device can be a standard smartphone or personal computer.
[0167] The server uses image and audio analysis software to analyze the user's data in detail. The image analysis software has an algorithm for extracting facial feature points, and the audio analysis software analyzes voice tone. This analysis allows the system to recognize the user's emotions.
[0168] Once emotion data is acquired, the server uses a similarity search function to search the database for past cases. The database records a history of various emotions and their corresponding styles. This allows the server to identify styles that were effective for users with similar emotions.
[0169] Next, the server uses a generative AI model to generate appropriate styles. The generated styles are customized based on the emotion data so that they suit the user as closely as possible. Input to the AI model is in the form of prompts, such as "Generate a style suitable for a user who is feeling happy."
[0170] Finally, the server sends the generated style to the terminal. The terminal presents it to the user, who can then provide feedback. This feedback is processed by the server and used to improve the quality of future suggestions.
[0171] This enables more sophisticated and personalized suggestions by integrating visual and auditory information.
[0172] The flow of the specific processing in Example 2 will be explained using Figure 13.
[0173] Step 1:
[0174] The user uses the device to take a photo of their face and record audio, and inputs them as data. This data includes a facial image file in image data format and an audio recording file in audio data format.
[0175] Step 2:
[0176] The terminal sends image and audio data obtained from the user to the server. The data is sent via the internet and received by the server. The output here is the raw image and audio files that arrived on the server.
[0177] Step 3:
[0178] The server starts analyzing the received data by running image analysis software and audio analysis software. The image analysis software identifies facial feature points (e.g., eyes, eyebrows, mouth, etc.) and extracts facial expression information. Meanwhile, the audio analysis software analyzes the tone from the audio and obtains indicators related to emotion. The output is the analyzed feature point data and tone data.
[0179] Step 4:
[0180] The server performs a similarity search based on the obtained emotion data. It searches the database for past user cases with similar emotion data and retrieves the results. The output is the similar emotion data and the associated past style history.
[0181] Step 5:
[0182] The server creates prompt statements for generating appropriate styles with a generative AI model and inputs them into the model. For example, it might use a prompt statement like, "Generate a style suitable for a user who is feeling happy." Here, the input is the emotion data, and the output is the prompt statement that is input into the AI model.
[0183] Step 6:
[0184] The generative AI model generates styles based on the prompt text. The output is style data for a customized design.
[0185] Step 7:
[0186] The server sends the generated style data to the terminal. The terminal displays the sent data to the user, presenting it as a visual suggestion.
[0187] Step 8:
[0188] The user enters feedback on the suggested style via a terminal and sends this feedback data to the server. This allows the system to obtain information necessary for improving the accuracy of future suggestions. The output is the feedback information stored on the server.
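The prompt creation in Step 5 can be sketched as follows. This is a minimal illustration, not the method of the disclosure: the function name `build_style_prompt`, the `max_styles` parameter, and the sentence that lists past styles are assumptions. Only the opening sentence follows the example prompt given in the text.

```python
from typing import List, Sequence


def build_style_prompt(emotion: str, past_styles: Sequence[str],
                       max_styles: int = 3) -> str:
    """Build a prompt from an emotion label and styles from similar past cases."""
    emotion = emotion.strip()
    if not emotion:
        raise ValueError("emotion label must not be empty")

    # Opening sentence follows the example prompt in the description.
    prompt = f"Generate a style suitable for a user who is feeling {emotion}."

    # Deduplicate retrieved styles while preserving their order.
    unique: List[str] = []
    for style in past_styles:
        style = style.strip()
        if style and style not in unique:
            unique.append(style)

    selected = unique[:max_styles]
    if selected:
        # Hypothetical extension: pass the retrieved style history as context.
        prompt += (" Styles well received by users with similar emotions: "
                   + ", ".join(selected) + ".")
    return prompt


print(build_style_prompt("happy", ["glamorous curls", "glamorous curls", "short bob"]))
```

Passing the retrieved style history into the prompt is one way to connect the similarity search of Step 4 to the generation of Step 6. Whether the actual system does so is not stated in the source.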
[0189] (Application Example 2)
[0190] Next, we will explain application example 2. In the following explanation, the data processing device 12 will be referred to as a "server" and the smart device 14 as a "terminal".
[0191] Traditional hair salons and style suggestion systems often struggled to provide personalized service based on the customer's emotions, resulting in customers mostly choosing from standardized options. This meant that customers couldn't obtain a style that best suited their individual feelings and moods, leading to a lack of a personalized experience.
[0192] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 2 is realized by the following means.
[0193] In this invention, the server includes analysis means for analyzing image data and audio data, similarity search means for referencing similar emotion cases from a past data set based on the analyzed emotion data, and generation means for generating an appropriate appearance style based on the referenced similar emotion cases. This makes it possible to suggest a personalized style based on the user's emotions.
[0194] "Image data" is a collection of digital information used to provide visual information.
[0195] "Audio data" refers to digital information used to provide auditory stimuli.
[0196] "Analysis means" refers to methods or devices for detecting data and identifying its features and patterns.
[0197] "Emotion data" refers to information about a person's mental state that can be inferred from their facial expressions and tone of voice.
[0198] A "past data set" is a reference database containing past data and cases.
[0199] "Similarity search means" refers to a method or device for finding items that have commonalities from existing data.
[0200] "Appearance style" refers to suggestions for hairstyles and clothing that determine how a person looks.
[0201] "Generation means" refers to methods and devices for creating new content or suggestions based on the results of analysis or retrieval.
[0202] A "generative AI model" is an algorithm or program that uses artificial intelligence to automatically generate results based on data.
[0203] "Feedback processing means" refers to a method or device for receiving opinions and feedback from users and using them to improve future services.
[0204] A "recognition algorithm" is a mathematical method used to analyze data and identify specific patterns or features from it.
[0205] In this invention, a server and a terminal work together to realize a system that integrates emotion recognition and style suggestion. The server utilizes analysis software using Python and libraries such as OpenCV and Librosa to analyze image and audio data. Based on the emotion data obtained from this data, the server searches for similar emotion cases in a past database. Here, a generative AI model is used to automatically generate an appearance style that is appropriate for the user's emotion.
[0206] The terminal acts as the user-facing interface, sending image and audio data entered by the user to the server and displaying the generated appearance style. This allows the user to receive style suggestions that match their mood.
[0207] For example, when a user smiles at the tablet's camera, the device sends the image to the server. At the same time, if the user says, "I want a hairstyle that makes me look cheerful today," the server analyzes this voice. It then determines that the user is expressing happiness and, by referring to related past data, suggests a glamorous curly hairstyle, for instance. The generative AI model used in this process leverages the relationship between trained emotion data and appearance styles.
[0208] An example of a prompt message is, "Suggest the best hairstyle for a user who is smiling and cheerfully asking, 'What kind of hairstyle would suit me?'" This allows the user to receive personalized suggestions tailored to their individual emotions, resulting in a more satisfying experience.
[0209] The flow of a specific process in Application Example 2 will be explained using Figure 14.
[0210] Step 1:
[0211] The device acquires image and audio data entered by the user. This data is collected in real time using the camera and microphone. The input consists of user image and audio data, which are temporarily stored on the device.
[0212] Step 2:
[0213] The terminal transmits the acquired image and audio data to the server. The transmitted data is then input into the server's data analysis system. This process of data preparation and transmission enables real-time analysis.
[0214] Step 3:
[0215] The server processes the transmitted image data using the OpenCV library and performs analysis to extract the user's facial feature points from the image. This determines the user's facial expression and generates emotion data.
[0216] Step 4:
[0217] The server uses the Librosa library to analyze audio data. It estimates the user's emotions from the tone and pitch of the voice, and uses this to further reinforce the emotion data. The input is audio data, and the output is emotion data based on the audio.
[0218] Step 5:
[0219] The server integrates emotion data obtained from image and audio data, and searches for similar emotion cases by comparing them with past databases. Here, the integrated emotion data is the input, and the information of similar cases is the output.
[0220] Step 6:
[0221] The server uses a generative AI model to generate the most suitable appearance style based on the similar emotion cases. This model has learned the relationship between past emotion data and style suggestions, and outputs the best suggestion for the input emotion.
[0222] Step 7:
[0223] The server sends an image of the generated appearance style to the terminal. The terminal presents this proposed image to the user, allowing for visual evaluation. Here, the input is the generated appearance style, and the output is the image displayed to the user.
[0224] Step 8:
[0225] Users input feedback on the presented appearance style via a terminal. This feedback is sent back to the server and used to improve the accuracy of future suggestions in the database. Here, the user's reactions and impressions are input, and feedback information is output.
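The voice analysis in Step 4 estimates emotion from the tone and pitch of the voice. As a rough illustration of what such indicators can look like, the sketch below computes three simple quantities from raw audio samples using only the Python standard library. It does not use or reproduce the Librosa API named in the text. The function names are hypothetical, and the zero-crossing pitch estimate is a deliberately crude stand-in that is only meaningful for clean, nearly periodic signals.

```python
import math
from typing import Sequence


def rms_energy(samples: Sequence[float]) -> float:
    """Root-mean-square amplitude, a simple loudness indicator."""
    if not samples:
        raise ValueError("samples must not be empty")
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def zero_crossing_rate(samples: Sequence[float]) -> float:
    """Fraction of adjacent sample pairs whose signs differ."""
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    crossings = sum(1 for a, b in zip(samples, samples[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(samples) - 1)


def rough_pitch_hz(samples: Sequence[float], sample_rate: int) -> float:
    """Crude pitch estimate: a pure tone crosses zero twice per cycle."""
    return zero_crossing_rate(samples) * sample_rate / 2.0


# Synthetic one-second 220 Hz tone standing in for a recorded voice.
SAMPLE_RATE = 8000
tone = [0.5 * math.sin(2 * math.pi * 220 * n / SAMPLE_RATE) for n in range(SAMPLE_RATE)]

print(round(rms_energy(tone), 3), round(rough_pitch_hz(tone, SAMPLE_RATE), 1))
```

Indicators of this kind would still need to be mapped to emotion labels by a trained model. That mapping is outside the scope of this sketch.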
[0226] The specific processing unit 290 transmits the result of the specific processing to the smart device 14. In the smart device 14, the control unit 46A causes the output device 40 to output the result of the specific processing. The microphone 38B acquires audio indicating user input for the result of the specific processing. The control unit 46A transmits the audio data indicating user input acquired by the microphone 38B to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the audio data.
[0227] The data generation model 58 is a so-called generative AI (Artificial Intelligence). Examples of the data generation model 58 include generative AI such as ChatGPT (registered trademark) (Internet search <URL: https://openai.com/blog/chatgpt>) and Gemini (registered trademark) (Internet search <URL: https://gemini.google.com/?hl=ja>). The data generation model 58 is obtained by performing deep learning on a neural network. Prompts containing instructions, together with inference data such as audio data representing speech, text data representing text, and image data representing images, are input to the data generation model 58. The data generation model 58 performs inference on the input inference data according to the instructions indicated by the prompts, and outputs the inference results in data formats such as audio data and text data. Here, inference refers to, for example, analysis, classification, prediction, and/or summarization.
[0228] In the above embodiment, an example was given in which specific processing is performed by the data processing device 12, but the technology of this disclosure is not limited thereto, and the specific processing may also be performed by the smart device 14.
[0229] [Second Embodiment]
[0230] Figure 3 shows an example of the configuration of the data processing system 210 according to the second embodiment.
[0231] As shown in Figure 3, the data processing system 210 includes a data processing device 12 and smart glasses 214. An example of the data processing device 12 is a server.
[0232] The data processing device 12 comprises a computer 22, a database 24, and a communication interface 26. The computer 22 is an example of a "computer" related to the technology of this disclosure. The computer 22 comprises a processor 28, RAM 30, and storage 32. The processor 28, RAM 30, and storage 32 are connected to a bus 34. The database 24 and the communication interface 26 are also connected to the bus 34. The communication interface 26 is connected to a network 54. An example of the network 54 is a WAN (Wide Area Network) and / or a LAN (Local Area Network).
[0233] The smart glasses 214 include a computer 36, a microphone 238, a speaker 240, a camera 42, and a communication interface 44. The computer 36 includes a processor 46, RAM 48, and storage 50. The processor 46, RAM 48, and storage 50 are connected to a bus 52. The microphone 238, speaker 240, and camera 42 are also connected to the bus 52.
[0234] The microphone 238 receives instructions from the user 20 by receiving the voice uttered by the user 20. The microphone 238 captures the voice, converts it into audio data, and outputs the audio data to the processor 46. The speaker 240 outputs audio according to instructions from the processor 46.
[0235] Camera 42 is a small digital camera equipped with an optical system including a lens, aperture, and shutter, and an image sensor such as a CMOS (Complementary Metal-Oxide-Semiconductor) image sensor or a CCD (Charge Coupled Device) image sensor, and captures images of the area around the user 20 (for example, an imaging range defined by a field of view equivalent to the width of a typical healthy person's field of vision).
[0236] Communication interface 44 is connected to network 54. Communication interfaces 44 and 26 are responsible for the exchange of various information between processor 46 and processor 28 via network 54. The exchange of various information between processor 46 and processor 28 using communication interfaces 44 and 26 is performed in a secure manner.
[0237] Figure 4 shows an example of the main functions of the data processing device 12 and the smart glasses 214. As shown in Figure 4, the data processing device 12 performs specific processing using the processor 28. The storage 32 stores the specific processing program 56.
[0238] The specific processing program 56 is an example of a "program" relating to the technology of this disclosure. The processor 28 reads the specific processing program 56 from the storage 32 and executes the read specific processing program 56 on the RAM 30. The specific processing is realized by the processor 28 operating as a specific processing unit 290 in accordance with the specific processing program 56 executed on the RAM 30.
[0239] The storage 32 stores the data generation model 58 and the emotion identification model 59. The data generation model 58 and the emotion identification model 59 are used by the specific processing unit 290.
[0240] In the smart glasses 214, the processor 46 performs the reception output processing. The storage 50 stores the reception output program 60. The processor 46 reads the reception output program 60 from the storage 50 and executes the read reception output program 60 on the RAM 48. The reception output processing is realized by the processor 46 operating as a control unit 46A according to the reception output program 60 executed on the RAM 48.
[0241] Next, the specific processing performed by the specific processing unit 290 of the data processing device 12 will be described. In the following description, the data processing device 12 will be referred to as the "server" and the smart glasses 214 will be referred to as the "terminal".
[0242] The system of this invention allows users to upload their own photos, and the server then uses advanced image analysis and AI technology to suggest hairstyles that are suitable for the user.
[0243] When a user uploads their photo to the system via their device, the server receives the photo. After receiving the photo, image analysis tools are used to extract facial and body features, and facial recognition algorithms are used in particular to identify facial contours and feature points. The feature data obtained here is then compared with a past database using a similarity search tool to identify similar cases that closely resemble the user.
[0244] Next, the server uses a generation mechanism that applies AI to the similar cases to generate the optimal hairstyle for the user. The generated hairstyle is sent from the server to the terminal and visually presented to the user by a display mechanism.
[0245] Furthermore, users can send feedback about the suggested hairstyles to the server via their device. This feedback is then processed by a feedback processing system and used to improve the accuracy of future hairstyle suggestions.
[0246] For example, if a woman in her 20s with a round face and slightly plump cheeks uploads a photo, the server captures these features through image analysis and searches for past data with similar characteristics. Based on highly-rated styles, the AI suggests a bob haircut style, and that image is presented to the user. If the user likes this style and provides feedback, that opinion will be used to improve future suggestions.
[0247] This system allows users to efficiently find the hairstyle that best suits them, while simultaneously creating a feedback loop for continuous improvement.
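The bob-haircut example above refers to choosing among highly rated styles found in similar cases. One simple way to express that choice, shown here only as a sketch under stated assumptions, is a vote in which each similar case contributes its similarity multiplied by its rating. The scoring rule, the function name `pick_style`, and the sample values are hypothetical and do not appear in the source.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Each similar case: (style name, similarity in (0, 1], user rating from 1 to 5).
SimilarCase = Tuple[str, float, int]


def pick_style(similar_cases: List[SimilarCase]) -> str:
    """Return the style with the largest sum of similarity * rating."""
    if not similar_cases:
        raise ValueError("at least one similar case is required")
    totals: Dict[str, float] = defaultdict(float)
    for style, similarity, rating in similar_cases:
        totals[style] += similarity * rating
    return max(totals, key=totals.get)


# Illustrative values only.
cases = [
    ("bob", 0.90, 5),
    ("long layers", 0.95, 3),
    ("bob", 0.70, 4),
    ("pixie", 0.60, 5),
]
print(pick_style(cases))
```

In the system described, a style selected this way would serve as context for the generative step rather than as the final output.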
[0248] The following describes the processing flow.
[0249] Step 1:
[0250] The user launches the application on their device, selects a photo, and presses the upload button. It is recommended that the user choose a photo that clearly shows their face and hair features.
[0251] Step 2:
[0252] The device sends the photos selected by the user to the server. Before transmission, data encryption and other measures are taken to ensure data security.
[0253] Step 3:
[0254] The server receives photo data sent from the terminal. The received data is passed to an image analysis module, which uses a face recognition algorithm to detect the contours of faces and bodies.
[0255] Step 4:
[0256] The server generates feature vectors for the user's face and body from the facial contours and feature points, and then uses these vectors to perform a similarity search on the past database.
[0257] Step 5:
[0258] The server generates the optimal hairstyle using generation means based on the similar cases obtained through the similarity search. AI technology is used to select a style that suits the user by referencing past highly rated styles.
[0259] Step 6:
[0260] The server sends the generated hairstyle image to the terminal.
[0261] Step 7:
[0262] The terminal receives hairstyle images sent from the server and displays them to the user. The user can visually confirm the suggested style by viewing the images.
[0263] Step 8:
[0264] Users can provide feedback on the presented style. For example, they can select whether they like the suggested style.
[0265] Step 9:
[0266] The terminal sends user feedback to the server. This feedback is stored on the server to improve the quality of future suggestions.
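Step 4 generates feature vectors from facial contours and feature points. The source does not define the vector. As a minimal sketch, the code below assumes six hypothetical landmark names and forms two scale-invariant ratios from them, so that the same face photographed at different resolutions yields the same vector.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]

# Hypothetical landmark names; a real detector would supply its own set.
REQUIRED = ("forehead_top", "chin", "left_cheek", "right_cheek", "left_jaw", "right_jaw")


def distance(a: Point, b: Point) -> float:
    """Euclidean distance between two 2-D points."""
    return math.hypot(a[0] - b[0], a[1] - b[1])


def face_feature_vector(landmarks: Dict[str, Point]) -> List[float]:
    """Return [cheek width / face height, jaw width / face height]."""
    missing = [name for name in REQUIRED if name not in landmarks]
    if missing:
        raise KeyError(f"missing landmarks: {missing}")
    face_height = distance(landmarks["forehead_top"], landmarks["chin"])
    if face_height == 0.0:
        raise ValueError("face height must be non-zero")
    cheek_width = distance(landmarks["left_cheek"], landmarks["right_cheek"])
    jaw_width = distance(landmarks["left_jaw"], landmarks["right_jaw"])
    return [cheek_width / face_height, jaw_width / face_height]


# Illustrative pixel coordinates only.
sample = {
    "forehead_top": (50.0, 0.0), "chin": (50.0, 100.0),
    "left_cheek": (10.0, 50.0), "right_cheek": (90.0, 50.0),
    "left_jaw": (20.0, 80.0), "right_jaw": (80.0, 80.0),
}
print(face_feature_vector(sample))
```

Dividing by face height is a design choice that removes dependence on image size. A production feature vector would likely include many more measurements.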
[0267] (Example 1)
[0268] Next, we will describe Example 1. In the following description, the data processing device 12 will be referred to as the "server," and the smart glasses 214 will be referred to as the "terminal."
[0269] In modern society, there is a growing demand for technologies that reduce the effort required for individual users to select the style and appearance best suited to them, and that more efficiently meet specific needs. In particular, the challenge lies in automatically and accurately suggesting hairstyles that fit an individual's face shape and contours.
[0270] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 1 is realized by the following means.
[0271] In this invention, the server includes data reception processing means, image processing means, and similar information retrieval means. This enables personalized style suggestions based on images uploaded by the user.
[0272] A "data reception processing means" is a means equipped with the function of receiving image data provided by a user and temporarily storing it within the system.
[0273] "Image processing means" refers to means for analyzing features related to the face and its shape from received image data and extracting specific information.
[0274] "Similar information retrieval means" is a means of identifying similar cases from past cases or related information sets within a database, based on extracted feature information.
[0275] "Generation processing means" refers to means having an algorithm for generating suitable images and styles corresponding to individual characteristics based on similar cases.
[0276] An "information presentation means" is a means that has the function of transmitting data to an external display device or the like in order to visually present the generated image information to the user.
[0277] "Feedback processing means" refers to means of obtaining feedback and opinions from users and using that information to improve the accuracy and functionality of future system proposals.
[0278] A "facial feature recognition algorithm" is an algorithm used to identify the characteristic points of a face and to distinguish the details of its shape, including its contours and facial expressions.
[0279] This invention's system suggests the most suitable hairstyle for a user based on their uploaded photo. The user begins by uploading a photo of their face to the system using a terminal. The terminal then transfers this image data to a server.
[0280] The server first receives the uploaded image using the data reception processing means and stores it in the system's temporary file storage. Then, it uses image analysis software (e.g., OpenCV, dlib) as the image processing means to extract facial and shape features from the photograph. In image processing, the contours and feature points of the user's face are detected with high accuracy, yielding the feature information that forms the basis for database searches.
[0281] Next, the similar information retrieval means searches the stored database for similar cases based on the extracted feature information. Here, a machine learning library (e.g., FAISS) is used to perform a search based on similarity. The cases with high similarity become the basic data used by the generation processing means.
[0282] As the generation processing means, a generative AI model (e.g., GAN, Transformers) is used. A prompt sentence is constructed based on the similar cases and input to this AI model to create an optimal hairstyle. This prompt sentence includes specific requirements such as "Please propose a fashionable hairstyle suitable for a round face of a 20-year-old woman."
[0283] The generated hairstyle image is transmitted to the user's terminal by the information presentation means. The terminal visually presents this to the user, enabling the user to confirm it.
[0284] Finally, the user can provide feedback on the proposed hairstyle. This feedback is received by the server via the feedback processing means, reflected in the database to improve future proposal accuracy, and also utilized for the learning of the AI model.
[0285] Through the above process, the user can find a hairstyle suitable for their characteristics, and the system can continue to evolve based on the feedback from the users.
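The similarity-based search over extracted feature information can be illustrated with a brute-force nearest-neighbor search. The text names FAISS as an example library. The sketch below does not use FAISS and instead computes Euclidean distances directly in standard-library Python, which conveys the same idea for a small database. The case identifiers, vectors, and function names are hypothetical.

```python
import heapq
import math
from typing import List, Sequence, Tuple


def l2_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def top_k_similar(query: Sequence[float],
                  cases: List[Tuple[str, Sequence[float]]],
                  k: int = 2) -> List[Tuple[str, float]]:
    """Return the k cases closest to the query as (case_id, distance) pairs."""
    scored = ((l2_distance(query, vec), case_id) for case_id, vec in cases)
    return [(case_id, dist) for dist, case_id in heapq.nsmallest(k, scored)]


# Illustrative database of past cases keyed by hypothetical identifiers.
database = [
    ("case-001", [0.80, 0.60]),
    ("case-002", [0.65, 0.50]),
    ("case-003", [0.82, 0.63]),
]
print(top_k_similar([0.81, 0.61], database, k=2))
```

A linear scan like this costs time proportional to the database size per query. Dedicated similarity search libraries exist to keep such lookups fast as the number of stored cases grows.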
[0286] The flow of the specific process in Example 1 will be described using FIG. 11.
[0287] Step 1:
[0288] The user uses the terminal to upload their face photo to the system. The input data is a face photo image, and this data is transmitted from the terminal to the server. The server receives this data and stores it in temporary file storage. In this step, the user provides initial data.
[0289] Step 2:
[0290] The server uses the data reception processing means to organize uploaded images and prepare them for image analysis. The input data is a facial photograph sent by the user, and the output is image data ready for analysis. In this step, the necessary data processing is performed to ensure that the images are accurately analyzed.
[0291] Step 3:
[0292] The server uses image processing tools to extract facial and shape features from the received images. Image analysis software such as OpenCV or dlib is used. The input is stored image data, and the output is feature information including facial contours and feature points. In this step, a face recognition algorithm is executed to obtain facial shape data.
[0293] Step 4:
[0294] The server uses the similar information retrieval means to search the database for similar cases based on the extracted feature information. It calculates similarity using machine learning libraries such as FAISS. The input is feature information, and the output is a list of similar cases. In this step, highly relevant information is extracted based on the existing database.
[0295] Step 5:
[0296] The server uses a generative AI model as the generation processing means, constructing prompt sentences based on similar cases and using them as input to generate the optimal hairstyle. The prompt sentences are specific, such as "Please suggest a stylish hairstyle that suits a round face of a woman in her 20s." The input is a list of similar cases, and the output is an image of the generated hairstyle. In this step, the capabilities of AI are utilized to create a personalized style.
[0297] Step 6:
[0298] The server sends the generated hairstyle image to the terminal through the information presentation means, and the terminal displays it to the user. The input is the image information of the generated hairstyle, and the output is the visual information displayed on the terminal. In this step, processing is performed to allow the user to actually confirm the suggested result.
[0299] Step 7:
[0300] The user uses their device to send feedback on the suggested hairstyle to the server. The input is the user's feedback information, and the output is opinion data that is reflected on the server. In this step, the user's opinions and feedback are collected for future improvements.
[0301] Step 8:
[0302] The server uses the feedback processing means to store the received feedback in a database and uses it to train the AI model. The input is feedback data, and the output is a suggestion algorithm improved through training. In this step, the system processes information from the user to improve itself.
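The storage side of the feedback processing in Steps 7 and 8 can be sketched as a running average of ratings per style, which is then used to rank styles for later suggestions. This is an illustrative assumption only: the source does not describe the stored format, the 1-to-5 rating scale, or the ranking rule, and retraining of the AI model itself is not shown. The class names `StyleStats` and `FeedbackStore` are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class StyleStats:
    """Running mean of ratings for one style."""
    mean_rating: float = 0.0
    count: int = 0

    def add(self, rating: int) -> None:
        if not 1 <= rating <= 5:
            raise ValueError("rating must be between 1 and 5")
        self.count += 1
        # Incremental mean update; avoids storing every past rating.
        self.mean_rating += (rating - self.mean_rating) / self.count


class FeedbackStore:
    """In-memory stand-in for the feedback database."""

    def __init__(self) -> None:
        self._stats: Dict[str, StyleStats] = {}

    def record(self, style: str, rating: int) -> None:
        self._stats.setdefault(style, StyleStats()).add(rating)

    def mean(self, style: str) -> float:
        return self._stats[style].mean_rating

    def ranked_styles(self) -> List[str]:
        # Highest mean rating first; ties broken alphabetically.
        return sorted(self._stats, key=lambda s: (-self._stats[s].mean_rating, s))


store = FeedbackStore()
store.record("bob", 5)
store.record("bob", 4)
store.record("pixie", 3)
print(store.ranked_styles(), store.mean("bob"))
```

A ranking of this kind could feed back into the selection of highly rated styles during the similarity search, closing the loop described in the text.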
[0303] (Application Example 1)
[0304] Next, we will explain Application Example 1. In the following explanation, the data processing device 12 will be referred to as the "server," and the smart glasses 214 will be referred to as the "terminal."
[0305] In recent years, there has been a growing demand for personal assistant systems that provide optimal suggestions to individual users. However, conventional suggestion systems often lack convenience because users have to manually input information. Furthermore, real-time information processing and suggestions tailored to the user's environment have been difficult. This invention aims to solve these problems and enable a personal assistant robot to instantly provide users with optimal information within their homes.
[0306] The specific processing by the specific processing unit 290 of the data processing device 12 in Application Example 1 is realized by the following means.
[0307] In this invention, the server includes image processing means for analyzing the shape of the face and body, similarity search means for referring to similar cases from the past information set based on the analyzed features, and generation means for creating an appropriate hairstyle based on the similar cases. As a result, the user can obtain information suitable for themselves in real time simply by giving instructions to the robot.
[0308] "The shape of the face and body" refers to the basic outline and features in the appearance of a person, and based on this, the characteristics of each person can be analyzed.
[0309] "Image processing means" is a device or technology for extracting specific information from the input image data and performing analysis and recognition.
[0310] "Similar cases" refers to a set of cases extracted from the past information set, where the target characteristics and features are similar.
[0311] "Generation means" is a method or device for creating new data and information based on the analysis results.
[0312] "Visual information" refers to visual data for transmitting the analysis results and products to the user, that is, information presented in the form of images or videos.
[0313] "Shooting means" is a device or method for acquiring an image of the user or an object using an optical device or the like.
[0314] "Communication means" is a process or technology for transmitting data and information from one location to another location.
[0315] "Feedback processing means" is a device or process for analyzing the opinions received from the user and utilizing them for system improvement.
[0316] An "image recognition algorithm" is a set of computational procedures and logic used to extract and identify specific patterns or features from digital images.
[0317] The system required to realize this invention uses a robot that acts as a beauty advisor for the user in the home. The robot is equipped with a high-quality camera for taking pictures and network capabilities for data communication. When the user instructs the robot to take a picture, the robot uses its built-in camera to photograph the user's face and sends the image to a server in the cloud.
[0318] The server uses image processing software (e.g., OpenCV or TensorFlow) to analyze images and identify the shape of the user's face and body. Based on the feature data extracted through this analysis, a similarity search tool compares it with past databases to select similar cases. Based on the selected cases, the server uses AI to generate the optimal hairstyle. This generation uses a generative AI model, and the generation is based on a prompt message such as, "Analyze the user's face photo and suggest the best hairstyle for a party."
[0319] The generated visual information, i.e., the suggested hairstyle, is presented to the user through the robot. When the user provides feedback on the style to the robot, that feedback is sent back to the server and used by the feedback processing system to improve the accuracy of the next suggestion.
[0320] This system allows users to receive appropriate beauty advice in real time via a robot without having to perform complicated operations. Specific approaches include scenarios such as receiving suggestions for the best hairstyle for a weekend party.
[0321] The flow of a specific process in Application Example 1 will be explained using Figure 12.
[0322] Step 1:
[0323] The user gives instructions to the robot and requests that it take a picture of their face. The input is the user's instructions, and the output is an image of the user's face taken by the camera. The robot uses its built-in camera to take a high-resolution picture of the user's face.
[0324] Step 2:
[0325] The device transmits the captured facial image to the server via the internet. The input is the captured facial image, and the output is the completion of the transfer of the image data to the server. It uses network communication capabilities to ensure stable data transmission.
[0326] Step 3:
[0327] The server analyzes the received image data using image processing software. The input is a face image sent to the server. The output is shape data of the user's face and body. Data calculations are performed using OpenCV or TensorFlow to extract facial contours and feature points.
[0328] Step 4:
[0329] The server compares the analysis results with past data sets using a similarity search mechanism. The input is user shape data, and the output is the most similar past examples. This includes data processing operations that perform appropriate comparisons and selections from the database.
[0330] Step 5:
[0331] The server uses a generative AI model to generate the optimal hairstyle for the user. The input consists of information on similar cases and a prompt, and the output is visual information of the suggested hairstyle. Based on the prompt, "Analyze the user's facial photo and suggest the best hairstyle for a party," the AI generates the optimal style.
[0332] Step 6:
[0333] The terminal displays the visual information of the hairstyle received from the server to the user. The input is the visual information provided by the server, and the output is the hairstyle image shown on the robot's display for the user to check.
[0334] Step 7:
[0335] The user provides feedback on the presented hairstyle. The input is the user's evaluation of the hairstyle. The output is the feedback sent to the server. The user provides feedback to the robot via voice or touch input.
[0336] Step 8:
[0337] The server processes user feedback and uses it to improve the accuracy of future suggestions. Input is user-provided feedback information, and output is model updates or database optimization. The feedback processing mechanism performs data calculations to improve the AI model.
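Step 5 states that the input to the generative AI model consists of the similar cases and a prompt. The following is a minimal sketch of how the two might be combined into one prompt text. The case keys `face_shape` and `style` and the function name `build_prompt` are assumptions for illustration; only the base instruction sentence is taken from the description above.

```python
def build_prompt(similar_cases, occasion="a party"):
    """Combine the base instruction with the similar past cases.

    `similar_cases` is a list of dicts with the hypothetical keys
    "face_shape" and "style"; the disclosure does not define a schema.
    """
    lines = [
        "Analyze the user's face photo and suggest the best hairstyle "
        f"for {occasion}."
    ]
    if similar_cases:
        lines.append("Similar past cases:")
        for i, case in enumerate(similar_cases, start=1):
            lines.append(
                f"{i}. face shape: {case['face_shape']}, style: {case['style']}"
            )
    return "\n".join(lines)


print(build_prompt([{"face_shape": "round", "style": "bob"}]))
```

The resulting text would be passed to the generative AI model together with the face image; how the model is invoked is outside the scope of this sketch.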
[0338] Furthermore, an emotion engine that estimates the user's emotions may be incorporated. That is, the specific processing unit 290 may use the emotion identification model 59 to estimate the user's emotions and perform the specific processing using the estimated emotions.
[0339] The system incorporating the emotion engine of this invention analyzes facial and voice characteristics using image and audio data provided by the user to recognize the user's emotions. It then suggests the hairstyle most suitable for the user's current situation based on the acquired emotion data.
[0340] The user uploads photos using their device and inputs audio via the microphone as needed. The device sends this data to the server, which performs image and audio analysis in parallel. A facial recognition algorithm analyzes facial features and expressions, and an emotion engine infers the user's emotions based on their expressions and voice tone.
[0341] The server uses the recognized emotion data to search its past database for examples with similar emotions and characteristics. Based on these examples, it generates a hairstyle that matches the user's mood. The generated hairstyle is then sent to the terminal and presented to the user.
[0342] For example, if a user uploads a photo of themselves smiling and records a cheerful voice, the server will sense feelings of happiness and cheerfulness from this data and suggest hairstyles that were previously recommended and well-received by users with similar emotions. For instance, a style with glamorous curls might be generated and displayed on the user's device.
[0343] During the feedback phase, users can send their thoughts on the suggested styles from their device to the server. The feedback processing system records this information and uses it to improve the accuracy of future suggestions, thereby providing an even more personalized experience.
[0344] By utilizing an emotion engine in this way, it becomes possible to realize a next-generation system that leverages both visual and auditory information to provide optimal suggestions to users.
[0345] The following describes the processing flow.
[0346] Step 1:
[0347] The user selects their own photos and records audio through the application on their device. They then use the interface to upload this data and press the "Send" button.
[0348] Step 2:
[0349] The device transmits image and audio data provided by the user to the server. During this process, the data is encrypted to ensure security.
[0350] Step 3:
[0351] The server first passes the image data received from the terminal to an image analysis module, which uses a face recognition algorithm to detect facial features and expressions and identify the contours of the face.
[0352] Step 4:
[0353] The server simultaneously processes the audio data with a voice analysis module, analyzing the tone and intonation of the user's voice. Based on the analysis results, the emotion engine identifies the user's emotions from the facial expression data and audio data.
[0354] Step 5:
[0355] The server searches its past database for cases with similar emotions and features based on recognized emotion data and facial feature vectors. Based on these similar cases, it uses AI technology to generate the optimal hairstyle.
[0356] Step 6:
[0357] The server sends the generated hairstyle image to the terminal. When sending, it also includes the reason why the hairstyle was recommended based on the user's feelings.
[0358] Step 7:
[0359] The device displays received hairstyle images on its screen, providing visual suggestions to the user. Furthermore, it provides an interface for the user to input feedback on the suggestions.
[0360] Step 8:
[0361] Users can review the displayed style and send feedback from their device indicating whether they like it or would like changes.
[0362] Step 9:
[0363] The terminal sends user feedback to the server, which records and analyzes it using a feedback processing system. This data is then incorporated into future suggestions, contributing to improved accuracy.
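Steps 5 and 6 search for past cases using both the emotion data and the facial feature vector, and return the reason for the recommendation along with the result. A minimal sketch follows, assuming both are already available as numeric vectors of equal length per modality. The weighting parameter `emotion_weight`, the case fields, and the wording of the reason are illustrative assumptions and are not specified in the disclosure.

```python
import math


def combined_distance(query, case, emotion_weight=0.5):
    """Weighted sum of the facial-feature distance and the emotion distance."""
    face_d = math.dist(query["face"], case["face"])
    emotion_d = math.dist(query["emotion"], case["emotion"])
    return (1 - emotion_weight) * face_d + emotion_weight * emotion_d


def recommend(query, cases, emotion_weight=0.5):
    """Return the style of the closest past case together with a reason."""
    if not cases:
        raise ValueError("cases must not be empty")
    best = min(cases, key=lambda c: combined_distance(query, c, emotion_weight))
    reason = (
        f"This style matches past cases with a similar '{best['mood']}' mood "
        "and similar facial features."
    )
    return {"style": best["style"], "reason": reason}


# Hypothetical past cases; emotion vectors are (happy, calm) scores.
past_cases = [
    {"face": [0.8, 0.5], "emotion": [0.9, 0.1], "mood": "happy",
     "style": "glamorous curls"},
    {"face": [0.8, 0.5], "emotion": [0.1, 0.9], "mood": "calm",
     "style": "sleek straight"},
]
print(recommend({"face": [0.8, 0.5], "emotion": [0.8, 0.2]}, past_cases))
```

Setting `emotion_weight` to 0 reduces the search to facial features only, which makes it easy to compare suggestions with and without the emotion engine.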
[0364] (Example 2)
[0365] Next, we will describe Example 2. In the following description, the data processing device 12 will be referred to as the "server" and the smart glasses 214 will be referred to as the "terminal".
[0366] Current style suggestion systems do not adequately provide personalized suggestions that take user emotions into account. Therefore, it is difficult to provide styles that suit the user's mindset and feelings, often resulting in low satisfaction with the suggestions.
[0367] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 2 is realized by the following means.
[0368] In this invention, the server includes data analysis means for analyzing facial and voice characteristics, similarity search means for referencing similar cases from a database based on the analyzed emotion data, and generation means for generating appropriate styles based on similar cases. This makes it possible to suggest the optimal style according to each user's emotions.
[0369] "Data analysis means" refers to devices or methods that analyze image and audio data provided by users to extract facial and voice features.
[0370] "Similarity search means" refers to a device or method that searches for similar cases within a database based on the analyzed emotion data.
[0371] "Generation means" refers to a device or method that uses the results of similarity searches to generate a style that is appropriate to the user's emotions.
[0372] "Presentation means" refers to a device or method for visually displaying generated style information to the user.
[0373] "Feedback processing means" refers to a device or method that receives feedback from users and uses it to improve the accuracy of future suggestions.
[0374] The system of this invention is designed to suggest individual styles based on the user's emotions. Through cooperation between the terminal and the server, the system provides the user with a next-generation personalized experience.
[0375] Users can input facial photos and voice data using their device. The device collects this data and sends it to the server. The device can be a standard smartphone or personal computer.
[0376] The server uses image and audio analysis software to analyze the user's data in detail. The image analysis software has an algorithm for extracting facial feature points, and the audio analysis software analyzes voice tone. This analysis allows the system to recognize the user's emotions.
[0377] Once the emotion data is acquired, the server uses a similarity search function to search the database for past cases. The database records a history of various emotions and their corresponding styles. This allows the server to identify styles that were effective for users with similar emotions.
[0378] Next, the server uses a generative AI model to generate appropriate styles. The generated styles are customized based on the emotion data, resulting in the best possible suggestions for the user. Input to the AI model is in the form of prompts, such as "Generate a style suitable for a user who is feeling happy."
[0379] Finally, the server sends the generated style to the terminal. The terminal presents it to the user, who can then provide feedback. This feedback is processed by the server and used to improve the quality of future suggestions.
[0380] This enables more sophisticated and personalized suggestions by integrating visual and auditory information.
[0381] The flow of the specific processing in Example 2 will be explained using Figure 13.
[0382] Step 1:
[0383] The user uses the device to take a photo of their face and record audio, and inputs them as data. This data includes a facial image file in image data format and an audio recording file in audio data format.
[0384] Step 2:
[0385] The terminal sends image and audio data obtained from the user to the server. The data is sent via the internet and received by the server. The output here is the raw image and audio files that arrived on the server.
[0386] Step 3:
[0387] The server starts analyzing the received data by running image analysis software and audio analysis software. The image analysis software identifies facial feature points (e.g., eyes, eyebrows, and mouth) and extracts facial expression information. Meanwhile, the audio analysis software analyzes the tone of the audio and obtains indicators related to emotion. The output is the analyzed feature point data and tone data.
[0388] Step 4:
[0389] The server performs a similarity search based on the obtained emotion data. It refers to the database to search for past user cases with similar emotion data and retrieves the results. The output is the similar emotion data and its past style history.
[0390] Step 5:
[0391] The server creates prompt statements to generate appropriate styles using a generative AI model and inputs them into the model. For example, it might use a prompt statement like, "Generate a style suitable for a user who is feeling happy." The input here is the emotion data, and the output is the prompt statement that is passed to the AI model.
[0392] Step 6:
[0393] The generative AI model generates styles based on the prompt text. The output is style data for a customized design.
[0394] Step 7:
[0395] The server sends the generated style data to the terminal. The terminal displays the sent data to the user, presenting it as a visual suggestion.
[0396] Step 8:
[0397] The user enters feedback on the suggested style via a terminal and sends this feedback data to the server. This allows the system to obtain information necessary for improving the accuracy of future suggestions. The output is the feedback information stored on the server.
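Step 5 derives a prompt statement from the emotion data. As a minimal sketch, assuming the emotion data is a mapping from emotion labels to scores and the past style history is a list of style names (both hypothetical formats), the prompt could be produced as follows. The template sentence is the one given in the description; the appended history sentence is an illustrative addition.

```python
def emotion_prompt(emotion_scores, style_history=()):
    """Build a prompt statement from emotion scores and past style history.

    `emotion_scores` maps an emotion label to a score (assumed format).
    The label with the highest score is used in the prompt.
    """
    if not emotion_scores:
        raise ValueError("emotion_scores must not be empty")
    label = max(emotion_scores, key=emotion_scores.get)
    prompt = f"Generate a style suitable for a user who is feeling {label}."
    if style_history:
        prompt += (
            " Styles that worked for similar users: "
            + ", ".join(style_history)
            + "."
        )
    return prompt


print(emotion_prompt({"happy": 0.7, "sad": 0.1}))
```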
[0398] (Application Example 2)
[0399] Next, we will explain application example 2. In the following explanation, the data processing device 12 will be referred to as the "server," and the smart glasses 214 will be referred to as the "terminal."
[0400] Traditional hair salons and style suggestion systems often struggled to provide personalized service based on the customer's emotions, resulting in customers mostly choosing from standardized options. This meant that customers couldn't obtain a style that best suited their individual feelings and moods, leading to a lack of a personalized experience.
[0401] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 2 is realized by the following means.
[0402] In this invention, the server includes analysis means for analyzing image data and audio data, similarity search means for referencing similar emotion cases from a past data set based on the analyzed emotion data, and generation means for generating an appropriate appearance style based on the referenced similar emotion cases. This makes it possible to suggest a personalized style based on the user's emotions.
[0403] "Image data" is a collection of digital information used to provide visual information.
[0404] "Audio data" refers to digital information used to provide auditory stimuli.
[0405] "Analysis means" refers to methods or devices for detecting data and identifying its features and patterns.
[0406] "Emotion data" refers to information about a person's mental state that can be inferred from their facial expressions and tone of voice.
[0407] An "information collection" is a reference database containing past data and case studies.
[0408] "Similarity search means" refers to a method or device for finding items that have commonalities from existing data.
[0409] "Appearance style" refers to suggestions for hairstyles and clothing that determine how a person looks.
[0410] "Generation means" refers to methods and devices for creating new content or suggestions based on the results of analysis or retrieval.
[0411] A "generative AI model" is an algorithm or program that uses artificial intelligence to automatically generate results based on data.
[0412] "Feedback processing means" refers to a method or device for receiving opinions and feedback from users and using them to improve future services.
[0413] A "recognition algorithm" is a mathematical method used to analyze data and identify specific patterns or features from it.
[0414] In this invention, a server and a terminal work together to realize a system that integrates emotion recognition and style suggestion. The server utilizes analysis software using Python and libraries such as OpenCV and Librosa to analyze image and audio data. Based on the emotion data obtained from this data, the server searches for similar emotion cases in a past database. Here, a generative AI model is used to automatically generate an appearance style that is appropriate for the user's emotion.
[0415] The terminal serves as the user-facing interface: it sends the image and audio data entered by the user to the server and displays the generated appearance style. This allows the user to receive style suggestions that match their mood.
[0416] For example, when a user smiles at the tablet's camera, the device sends the image to the server. At the same time, if the user says, "I want a hairstyle that makes me look cheerful today," the server analyzes this voice. It then determines that the user is expressing happiness and, by referring to related past data, suggests a glamorous curly hairstyle, for instance. The generative AI model used in this process leverages the learned relationship between emotion data and appearance styles.
[0417] An example of a prompt message is, "Suggest the best hairstyle for a user who is smiling and cheerfully asking, 'What kind of hairstyle would suit me?'" This allows the user to receive personalized suggestions tailored to their individual emotions, resulting in a more satisfying experience.
[0418] The flow of a specific process in Application Example 2 will be explained using Figure 14.
[0419] Step 1:
[0420] The device acquires image and audio data entered by the user. This data is collected in real time using the camera and microphone. The input consists of user image and audio data, which are temporarily stored on the device.
[0421] Step 2:
[0422] The terminal transmits the acquired image and audio data to the server. The transmitted data is then input into the server's data analysis system. This process of data preparation and transmission enables real-time analysis.
[0423] Step 3:
[0424] The server processes the transmitted image data using the OpenCV library and performs analysis to extract the user's facial feature points from the image. This determines the user's facial expression and generates emotion data.
[0425] Step 4:
[0426] The server uses the Librosa library to analyze audio data. It estimates the user's emotions from the tone and pitch of the voice, and uses this to further reinforce the emotion data. The input is audio data, and the output is emotion data based on the audio.
[0427] Step 5:
[0428] The server integrates emotion data obtained from image and audio data, and searches for similar emotion cases by comparing them with past databases. Here, the integrated emotion data is the input, and the information of similar cases is the output.
[0429] Step 6:
[0430] The server uses a generative AI model to generate the most suitable appearance style based on the similar emotion cases. This model has learned the relationship between past emotion data and style suggestions, and outputs the best suggestion for the input emotion.
[0431] Step 7:
[0432] The server sends an image of the generated appearance style to the terminal. The terminal presents this proposed image to the user, allowing for visual evaluation. Here, the generated appearance style is sent to the terminal as input and displayed to the user.
[0433] Step 8:
[0434] The user inputs feedback on the presented appearance style via the terminal. This feedback is sent back to the server, stored in the database, and used to improve the accuracy of future suggestions. Here, the user's reactions and impressions are the input, and the feedback information is the output.
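Step 5 integrates the emotion data obtained from the image with the emotion data obtained from the audio, but the integration method is not specified. One simple possibility is late fusion, sketched below: each modality is assumed to have already produced a score per emotion label (for example, from the OpenCV-based and Librosa-based analyses of Steps 3 and 4), and the two are combined by a weighted average and renormalized. The weight `face_weight` and the label names are assumptions for illustration.

```python
def fuse_emotions(face_scores, voice_scores, face_weight=0.6):
    """Late fusion of two emotion score dictionaries.

    Returns a dictionary over the union of labels whose values sum to 1.
    A label missing from one modality is treated as a score of 0 there.
    """
    if not 0.0 <= face_weight <= 1.0:
        raise ValueError("face_weight must be between 0 and 1")
    labels = set(face_scores) | set(voice_scores)
    fused = {
        label: face_weight * face_scores.get(label, 0.0)
        + (1.0 - face_weight) * voice_scores.get(label, 0.0)
        for label in labels
    }
    total = sum(fused.values())
    if total <= 0:
        raise ValueError("at least one emotion score must be positive")
    return {label: value / total for label, value in fused.items()}


fused = fuse_emotions(
    {"happy": 0.8, "neutral": 0.2},  # hypothetical result of image analysis
    {"happy": 0.6, "sad": 0.4},      # hypothetical result of audio analysis
)
print(max(fused, key=fused.get))
```

The fused dictionary can then serve as the integrated emotion data that is compared with the past database.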
[0435] The specific processing unit 290 transmits the result of the specific processing to the smart glasses 214. In the smart glasses 214, the control unit 46A causes the speaker 240 to output the result of the specific processing. The microphone 238 acquires audio indicating user input for the result of the specific processing. The control unit 46A transmits the audio data indicating user input acquired by the microphone 238 to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the audio data.
[0436] The data generation model 58 is a type of so-called generative AI (Artificial Intelligence). Examples of the data generation model 58 include generative AI such as ChatGPT (Internet search <URL: https://openai.com/blog/chatgpt>) and Gemini (Internet search <URL: https://gemini.google.com/?hl=ja>). The data generation model 58 is obtained by performing deep learning on a neural network. The data generation model 58 receives prompts containing instructions, together with inference data such as audio data representing speech, text data representing text, and image data representing images. The data generation model 58 performs inference on the input inference data according to the instructions indicated by the prompts, and outputs the inference results in data formats such as audio data and text data. Here, inference refers to, for example, analysis, classification, prediction, and/or summarization.
[0437] In the above embodiment, an example was given in which specific processing is performed by the data processing device 12, but the technology of this disclosure is not limited thereto, and the specific processing may also be performed by the smart glasses 214.
[0438] [Third Embodiment]
[0439] Figure 5 shows an example of the configuration of the data processing system 310 according to the third embodiment.
[0440] As shown in Figure 5, the data processing system 310 includes a data processing device 12 and a headset terminal 314. An example of the data processing device 12 is a server.
[0441] The data processing device 12 comprises a computer 22, a database 24, and a communication interface 26. The computer 22 is an example of a "computer" related to the technology of this disclosure. The computer 22 comprises a processor 28, RAM 30, and storage 32. The processor 28, RAM 30, and storage 32 are connected to a bus 34. The database 24 and the communication interface 26 are also connected to the bus 34. The communication interface 26 is connected to a network 54. An example of the network 54 is a WAN (Wide Area Network) and / or a LAN (Local Area Network).
[0442] The headset terminal 314 includes a computer 36, a microphone 238, a speaker 240, a camera 42, a communication interface 44, and a display 343. The computer 36 includes a processor 46, RAM 48, and storage 50. The processor 46, RAM 48, and storage 50 are connected to a bus 52. The microphone 238, speaker 240, camera 42, and display 343 are also connected to the bus 52.
[0443] The microphone 238 receives voice signals from the user 20 and receives instructions from the user 20. The microphone 238 captures the voice signals from the user 20, converts the captured voice into audio data, and outputs it to the processor 46. The speaker 240 outputs audio according to the instructions from the processor 46.
[0444] Camera 42 is a small digital camera equipped with an optical system including a lens, aperture, and shutter, and an image sensor such as a CMOS (Complementary Metal-Oxide-Semiconductor) image sensor or a CCD (Charge Coupled Device) image sensor, and captures images of the area around the user 20 (for example, an imaging range defined by a field of view equivalent to the width of a typical healthy person's field of vision).
[0445] Communication interface 44 is connected to network 54. Communication interfaces 44 and 26 are responsible for the exchange of various information between processor 46 and processor 28 via network 54. The exchange of various information between processor 46 and processor 28 using communication interfaces 44 and 26 is performed in a secure manner.
[0446] Figure 6 shows an example of the main functions of the data processing device 12 and the headset terminal 314. As shown in Figure 6, the data processing device 12 performs specific processing using the processor 28. The storage 32 stores the specific processing program 56.
[0447] The specific processing program 56 is an example of a "program" relating to the technology of this disclosure. The processor 28 reads the specific processing program 56 from the storage 32 and executes the read specific processing program 56 on the RAM 30. The specific processing is realized by the processor 28 operating as a specific processing unit 290 in accordance with the specific processing program 56 executed on the RAM 30.
[0448] The storage 32 stores the data generation model 58 and the emotion identification model 59. The data generation model 58 and the emotion identification model 59 are used by the specific processing unit 290.
[0449] In the headset terminal 314, the processor 46 performs the reception output processing. The storage 50 stores the reception output program 60. The processor 46 reads the reception output program 60 from the storage 50 and executes the read reception output program 60 on the RAM 48. The reception output processing is realized by the processor 46 operating as a control unit 46A according to the reception output program 60 executed on the RAM 48.
[0450] Next, the specific processing performed by the specific processing unit 290 of the data processing device 12 will be described. In the following description, the data processing device 12 will be referred to as the "server" and the headset terminal 314 will be referred to as the "terminal".
[0451] The system of this invention allows users to upload their own photos, and the server then uses advanced image analysis and AI technology to suggest hairstyles that are suitable for the user.
[0452] When a user uploads their photo to the system via their device, the server receives the photo. After receiving the photo, image analysis tools are used to extract facial and body features, and facial recognition algorithms are used in particular to identify facial contours and feature points. The feature data obtained here is then compared with a past database using a similarity search tool to identify similar cases that closely resemble the user.
[0453] Next, the server uses a generation mechanism that applies AI to the similar cases to generate the optimal hairstyle for the user. The generated hairstyle is sent from the server to the terminal and visually presented to the user by a display mechanism.
[0454] Furthermore, users can send feedback about the suggested hairstyles to the server via their device. This feedback is then processed by a feedback processing system and used to improve the accuracy of future hairstyle suggestions.
[0455] For example, if a woman in her 20s with a round face and slightly plump cheeks uploads a photo, the server captures these features through image analysis and searches for past data with similar characteristics. Based on highly-rated styles, the AI suggests a bob haircut style, and that image is presented to the user. If the user likes this style and provides feedback, that opinion will be used to improve future suggestions.
[0456] This system allows users to efficiently find the hairstyle that best suits them, while simultaneously creating a feedback loop for continuous improvement.
[0457] The following describes the processing flow.
[0458] Step 1:
[0459] The user launches the application on their device, selects a photo, and presses the upload button. It is recommended that the user choose a photo that clearly shows their face and hair features.
[0460] Step 2:
[0461] The device sends the photos selected by the user to the server. Before transmission, data encryption and other measures are taken to ensure data security.
[0462] Step 3:
[0463] The server receives photo data sent from the terminal. The received data is passed to an image analysis module, which uses a face recognition algorithm to detect the contours of faces and bodies.
[0464] Step 4:
[0465] The server generates feature vectors for the user's face and body from the facial contours and feature points, and then uses these vectors to perform a similarity search on the past database.
[0466] Step 5:
[0467] The server generates the optimal hairstyle using a generation method based on similar cases obtained through similarity searches. AI technology is used to select a style that suits the user by referencing past highly-rated styles.
[0468] Step 6:
[0469] The server sends the generated hairstyle image to the terminal.
[0470] Step 7:
[0471] The terminal receives hairstyle images sent from the server and displays them to the user. The user can visually confirm the suggested style by viewing the images.
[0472] Step 8:
[0473] Users can provide feedback on the presented style. For example, they can select whether they like the suggested style.
[0474] Step 9:
[0475] The terminal sends user feedback to the server. This feedback is stored on the server to improve the quality of future suggestions.
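Step 5 selects a style by referencing highly-rated styles among the similar cases. The disclosure does not give a scoring rule; the following minimal sketch assumes that each similar case carries a similarity value and a past user rating, and scores each style by the sum of similarity multiplied by rating. The tuple layout and the function name `pick_style` are illustrative assumptions.

```python
from collections import defaultdict


def pick_style(similar_cases):
    """Choose the style with the highest similarity-weighted rating.

    `similar_cases` is a list of (style, similarity, rating) tuples, where
    similarity is assumed to lie in [0, 1] and rating in [1, 5].
    """
    scores = defaultdict(float)
    for style, similarity, rating in similar_cases:
        scores[style] += similarity * rating
    if not scores:
        raise ValueError("similar_cases must not be empty")
    return max(scores, key=scores.get)


cases = [
    ("bob", 0.90, 5),
    ("long layers", 0.95, 2),
    ("bob", 0.70, 4),
]
print(pick_style(cases))
```

In this hypothetical data the closest case ("long layers") was poorly rated, so the bob style is chosen instead, which illustrates why the rating is taken into account in addition to similarity.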
[0476] (Example 1)
[0477] Next, we will describe Example 1. In the following description, the data processing device 12 will be referred to as the "server," and the headset-type terminal 314 will be referred to as the "terminal."
[0478] In modern society, there is a growing demand for technologies that reduce the effort required for individual users to select the style and appearance best suited to them, and that more efficiently meet specific needs. In particular, the challenge lies in automatically and accurately suggesting hairstyles that fit an individual's face shape and contours.
[0479] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 1 is realized by the following means.
[0480] In this invention, the server includes data reception processing means, image processing means, and similar information retrieval means. This enables personalized style suggestions based on images uploaded by the user.
[0481] A "data reception processing means" is a means equipped with the function of receiving image data provided by a user and temporarily storing it within the system.
[0482] "Image processing means" refers to means for analyzing features related to the face and its shape from received image data and extracting specific information.
[0483] "Similar information retrieval means" refers to means for identifying similar cases from past cases or related information sets within a database, based on the extracted feature information.
[0484] "Generation processing means" refers to means having an algorithm for generating suitable images and styles corresponding to individual characteristics based on similar cases.
[0485] "Information presentation means" refers to means having the function of transmitting data to an external display device or the like in order to visually present the generated image information to the user.
[0486] "Feedback processing means" refers to means for obtaining feedback and opinions from users and using that information to improve the accuracy and functionality of the system's future suggestions.
[0487] A "facial feature recognition algorithm" is an algorithm used to identify the characteristic points of a face and to distinguish the details of its shape, including its contours and facial expressions.
[0488] This invention's system suggests the most suitable hairstyle for a user based on their uploaded photo. The user begins by uploading a photo of their face to the system using a terminal. The terminal then transfers this image data to a server.
[0489] The server first receives the uploaded image using the data reception processing means and stores it in the system's temporary file storage. Then, as the image processing means, it uses image analysis software (e.g., OpenCV, dlib) to extract facial and shape features from the photograph. In this image processing, the contours and feature points of the user's face are detected with high accuracy, yielding the feature information on which the database search is based.
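The feature-extraction step described here can be pictured with a minimal sketch. It assumes that a landmark detector (such as the dlib or OpenCV software named in the text) has already returned a list of (x, y) feature points; the function name `landmarks_to_feature_vector` and the normalization scheme are illustrative assumptions, not part of the disclosure.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def landmarks_to_feature_vector(landmarks: Sequence[Point]) -> List[float]:
    """Flatten facial feature points into a position- and scale-independent vector.

    Hypothetical helper: points are shifted so the bounding box starts at the
    origin and are divided by the bounding-box height, so the same face
    photographed at a different size or offset yields the same vector.
    """
    if not landmarks:
        raise ValueError("at least one landmark is required")
    xs = [p[0] for p in landmarks]
    ys = [p[1] for p in landmarks]
    min_x, min_y = min(xs), min(ys)
    height = max(ys) - min_y
    if height == 0:
        raise ValueError("landmarks must span a non-zero height")
    vector: List[float] = []
    for x, y in landmarks:
        vector.append((x - min_x) / height)
        vector.append((y - min_y) / height)
    # Append the width-to-height ratio as a coarse face-shape descriptor.
    vector.append((max(xs) - min_x) / height)
    return vector
```

Normalizing before the search is one possible design choice; it keeps the later similarity comparison from being dominated by image resolution or by where the face sits in the frame.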
[0490] Next, the similar information retrieval means searches the database for similar cases based on the extracted feature information. Here, a machine learning library (e.g., FAISS) is used to perform the similarity-based search. The cases with high similarity become the basic data used by the generation processing means.
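As a rough illustration of this similarity-based search, the following sketch ranks stored cases by cosine similarity to the user's feature vector. It is a brute-force stand-in written with the standard library only; a library such as FAISS would typically replace it for large databases. The names `cosine_similarity` and `find_similar_cases`, and the choice of cosine similarity as the metric, are assumptions made for illustration.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def find_similar_cases(
    query: Sequence[float],
    cases: Dict[str, Sequence[float]],
    top_k: int = 3,
) -> List[Tuple[str, float]]:
    """Return the top_k (case_id, similarity) pairs, most similar first."""
    scored = [(case_id, cosine_similarity(query, vec)) for case_id, vec in cases.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]
```

For example, `find_similar_cases(user_vector, past_cases, top_k=5)` would return the five past cases whose feature vectors point in the most similar direction to the user's.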
[0491] The generation processing means uses a generative AI model (e.g., GAN, Transformers). A prompt sentence is constructed based on the similar cases and input into this AI model, which then creates the optimal hairstyle. The prompt sentence includes a specific request such as, "Please suggest a stylish hairstyle that suits a round face of a woman in her 20s."
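The construction of a prompt sentence from similar cases might look like the sketch below. The `SimilarCase` structure, its `rating` field, and the wording appended after the request are hypothetical; only the opening request follows the example sentence given in the text.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SimilarCase:
    style_name: str
    rating: float  # hypothetical average user rating of this past style


def build_hairstyle_prompt(
    face_shape: str,
    subject: str,
    cases: List[SimilarCase],
    max_cases: int = 3,
) -> str:
    """Compose a prompt sentence that cites the best-rated similar cases."""
    prompt = f"Please suggest a stylish hairstyle that suits a {face_shape} face of {subject}."
    best = sorted(cases, key=lambda c: c.rating, reverse=True)[:max_cases]
    if best:
        names = ", ".join(c.style_name for c in best)
        prompt += f" Highly rated styles from similar past cases: {names}."
    return prompt
```

Calling `build_hairstyle_prompt("round", "a woman in her 20s", cases)` reproduces the request quoted in the text and appends the reference styles, so the generative model receives both the user's attributes and the retrieved cases in one input.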
[0492] The generated hairstyle image is transmitted to the user's terminal via the information presentation means. The terminal visually presents this image to the user, allowing the user to confirm it.
[0493] Finally, users can provide feedback on the suggested hairstyles. This feedback is received by the server via the feedback processing means, reflected in the database to improve the accuracy of future suggestions, and also used to train the AI model.
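One way to picture how feedback could be reflected in the database is a per-style tally of likes and dislikes, as sketched below. The `FeedbackStore` class and its approval score are assumptions for illustration; the sketch covers only the database side and does not attempt to show how the AI model itself would be trained.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class StyleStats:
    likes: int = 0
    dislikes: int = 0

    @property
    def approval(self) -> float:
        """Fraction of feedback that was positive (0.0 when there is none)."""
        total = self.likes + self.dislikes
        return self.likes / total if total else 0.0


class FeedbackStore:
    """In-memory stand-in for the feedback records kept on the server."""

    def __init__(self) -> None:
        self._stats: Dict[str, StyleStats] = {}

    def record(self, style_id: str, liked: bool) -> None:
        stats = self._stats.setdefault(style_id, StyleStats())
        if liked:
            stats.likes += 1
        else:
            stats.dislikes += 1

    def approval(self, style_id: str) -> float:
        return self._stats.get(style_id, StyleStats()).approval
```

An approval score of this kind could, for instance, serve as the rating used to prioritize past cases in later searches.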
[0494] Through the above process, users can find hairstyles that suit their own features, and the system can continue to evolve based on user feedback.
[0495] The flow of the specific processing in Example 1 will be explained using Figure 11.
[0496] Step 1:
[0497] The user uses the terminal to upload a photo of their face to the system. The input data is a facial image, which is sent from the terminal to the server. The server receives this data and stores it in temporary file storage. In this step, the user provides the initial data.
[0498] Step 2:
[0499] The server uses the data reception processing means to organize the uploaded image and prepare it for image analysis. The input data is the facial photograph sent by the user, and the output is image data ready for analysis. In this step, the data processing needed for the image to be analyzed accurately is performed.
[0500] Step 3:
[0501] The server uses the image processing means to extract facial and shape features from the received image. Image analysis software such as OpenCV or dlib is used. The input is the stored image data, and the output is feature information including facial contours and feature points. In this step, a face recognition algorithm is executed to obtain facial shape data.
[0502] Step 4:
[0503] The server uses the similar information retrieval means to search the database for similar cases based on the extracted feature information. It calculates similarity using machine learning libraries such as FAISS. The input is the feature information, and the output is a list of similar cases. In this step, highly relevant information is extracted from the existing database.
[0504] Step 5:
[0505] The server uses a generative AI model as the generation processing means, constructing a prompt sentence based on the similar cases and using it as input to generate the optimal hairstyle. The prompt sentence is specific, such as "Please suggest a stylish hairstyle that suits a round face of a woman in her 20s." The input is the list of similar cases, and the output is an image of the generated hairstyle. In this step, the capabilities of AI are utilized to create a personalized style.
[0506] Step 6:
[0507] The server sends the generated hairstyle image to the terminal through the information presentation means, and the terminal displays it to the user. The input is the image information of the generated hairstyle, and the output is the visual information displayed on the terminal. In this step, processing is performed to allow the user to actually confirm the suggested result.
[0508] Step 7:
[0509] The user uses the terminal to send feedback on the suggested hairstyle to the server. The input is the user's feedback information, and the output is feedback data that is reflected on the server. In this step, the user's opinions and feedback are collected for future improvements.
[0510] Step 8:
[0511] The server uses the feedback processing means to store the received feedback in the database and uses it to train the AI model. The input is the feedback data, and the output is an improved suggestion algorithm obtained through this training. In this step, the system processes information from the user to improve itself.
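The server-side portion of the flow above (Steps 2 to 5) can be summarized as a chain in which each step's output is the next step's input. The sketch below is only a skeleton under that reading: `PipelineStages` and `suggest_hairstyle` are hypothetical names, and each stage is a pluggable callable standing in for the corresponding means.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class PipelineStages:
    preprocess: Callable[[bytes], bytes]                    # Step 2: prepare the uploaded image
    extract_features: Callable[[bytes], Sequence[float]]    # Step 3: contours and feature points
    search_similar: Callable[[Sequence[float]], List[str]]  # Step 4: list of similar cases
    generate: Callable[[List[str]], bytes]                  # Step 5: generated hairstyle image


def suggest_hairstyle(photo: bytes, stages: PipelineStages) -> bytes:
    """Run Steps 2 to 5 in order and return the image to send in Step 6."""
    prepared = stages.preprocess(photo)
    features = stages.extract_features(prepared)
    similar_cases = stages.search_similar(features)
    return stages.generate(similar_cases)
```

Keeping the stages as separate callables mirrors the way the text defines each means independently, so any one of them could be swapped without touching the others.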
[0512] (Application Example 1)
[0513] Next, we will explain Application Example 1. In the following explanation, the data processing device 12 will be referred to as the "server," and the headset-type terminal 314 will be referred to as the "terminal."
[0514] In recent years, there has been a growing demand for personal assistant systems that provide optimal suggestions to individual users. However, conventional suggestion systems often lack convenience because users have to manually input information. Furthermore, real-time information processing and suggestions tailored to the user's environment have been difficult. This invention aims to solve these problems and enable a personal assistant robot to instantly provide users with optimal information within their homes.
[0515] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 1 is realized by the following means.
[0516] In this invention, the server includes image processing means for analyzing the shape of the face and body, similarity search means for referencing similar cases from a past information set based on the analyzed features, and generation means for creating an appropriate hairstyle based on the similar cases. As a result, users can obtain information tailored to them in real time simply by giving instructions to the robot.
[0517] "Facial and body shape" refers to the basic outlines and characteristics of a person's appearance, and these can be used to analyze each person's traits.
[0518] "Image processing means" refers to a device or technology for extracting specific information from input image data and performing analysis or recognition.
[0519] "Similar cases" refers to a collection of cases extracted from past information sets that share similar characteristics or features.
[0520] "Generation means" refers to methods or devices for creating new data or information based on analysis results.
[0521] "Visual information" refers to visual data used to convey analysis results and products to the user, i.e., information presented in the form of images and videos.
[0522] "Photography means" refers to devices or methods for acquiring images of a user or object using optical devices or the like.
[0523] "Communication means" refers to the processes and technologies used to transmit data or information from one point to another.
[0524] A "feedback processing means" is a device or process for analyzing opinions received from users and utilizing them to improve the system.
[0525] An "image recognition algorithm" is a set of computational procedures and logic used to extract and identify specific patterns or features from digital images.
[0526] The system required to realize this invention uses a robot that acts as a beauty advisor for the user in the home. The robot is equipped with a high-quality camera for taking pictures and network capabilities for data communication. When the user instructs the robot to take a picture, the robot uses its built-in camera to photograph the user's face and sends the image to a server in the cloud.
[0527] The server uses image processing software (e.g., OpenCV or TensorFlow) to analyze the image and identify the shape of the user's face and body. The similarity search means compares the feature data extracted through this analysis with the past database and selects similar cases. Based on the selected cases, the server uses a generative AI model to generate the optimal hairstyle. The generation is based on a prompt message such as, "Analyze the user's face photo and suggest the best hairstyle for a party."
[0528] The generated visual information, i.e., the suggested hairstyle, is presented to the user through the robot. When the user provides feedback on the style to the robot, that feedback is sent to the server and used by the feedback processing means to improve the accuracy of the next suggestion.
[0529] This system allows users to receive appropriate beauty advice in real time via a robot without having to perform complicated operations. A specific use case is receiving a suggestion for the best hairstyle for a weekend party.
[0530] The flow of a specific process in Application Example 1 will be explained using Figure 12.
[0531] Step 1:
[0532] The user gives instructions to the robot and requests that it take a picture of their face. The input is the user's instructions, and the output is an image of the user's face taken by the camera. The robot uses its built-in camera to take a high-resolution picture of the user's face.
[0533] Step 2:
[0534] The robot transmits the captured facial image to the server via the internet. The input is the captured facial image, and the output is the completed transfer of the image data to the server. The robot uses its network communication capabilities to ensure stable data transmission.
[0535] Step 3:
[0536] The server analyzes the received image data using image processing software. The input is a face image sent to the server. The output is shape data of the user's face and body. Data calculations are performed using OpenCV or TensorFlow to extract facial contours and feature points.
[0537] Step 4:
[0538] The server compares the analysis results with past data sets using the similarity search means. The input is the user's shape data, and the output is the most similar past cases. This step includes data processing operations that perform appropriate comparisons and selections from the database.
[0539] Step 5:
[0540] The server uses a generative AI model to generate the optimal hairstyle for the user. The input consists of information on similar cases and a prompt, and the output is visual information of the suggested hairstyle. Based on the prompt, "Analyze the user's facial photo and suggest the best hairstyle for a party," the AI generates the optimal style.
[0541] Step 6:
[0542] The terminal displays the visual information of the hairstyle received from the server to the user. The input is the visual information provided by the server, and the output is the hairstyle image displayed for the user to confirm. This includes the action of visually displaying the information on the robot's display.
[0543] Step 7:
[0544] The user provides feedback on the presented hairstyle. The input is the user's evaluation of the hairstyle. The output is the feedback sent to the server. The user provides feedback to the robot via voice or touch input.
[0545] Step 8:
[0546] The server processes the user feedback and uses it to improve the accuracy of future suggestions. The input is the feedback information provided by the user, and the output is a model update or database optimization. The feedback processing means performs data calculations to improve the AI model.
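For Steps 1 and 2 of the flow above, the message carrying the captured image from the robot to the server might be packaged as shown below. The JSON layout, the field names `instruction` and `image_b64`, and the two helper functions are assumptions for illustration; the text does not specify a wire format, and transport and encryption are outside this sketch.

```python
import base64
import json
from typing import Dict


def encode_capture(image_bytes: bytes, user_instruction: str) -> str:
    """Package a captured image and the user's instruction as a JSON message."""
    payload = {
        "instruction": user_instruction,
        # Binary image data is base64-encoded so it can travel inside JSON text.
        "image_b64": base64.b64encode(image_bytes).decode("ascii"),
    }
    return json.dumps(payload)


def decode_capture(message: str) -> Dict[str, object]:
    """Server-side counterpart: recover the instruction and the raw image bytes."""
    payload = json.loads(message)
    return {
        "instruction": payload["instruction"],
        "image": base64.b64decode(payload["image_b64"]),
    }
```

The decoded image bytes would then be handed to the image processing software in Step 3.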
[0547] Furthermore, an emotion engine that estimates the user's emotions may be incorporated. That is, the specific processing unit 290 may use the emotion identification model 59 to estimate the user's emotions and perform specific processing using the user's emotions.
[0548] The system incorporating the emotion engine of this invention analyzes facial and voice characteristics using image and audio data provided by the user to recognize the user's emotions. It then suggests the hairstyle most suitable for the user's current situation based on the acquired emotion data.
[0549] The user uploads photos using the terminal and inputs audio via the microphone as needed. The terminal sends this data to the server, which performs image and audio analysis in parallel. A face recognition algorithm analyzes facial features and expressions, and the emotion engine infers the user's emotions based on their expressions and voice tone.
[0550] The server uses the recognized emotion data to search its past database for cases with similar emotions and characteristics, and generates a hairstyle that matches the user's mood based on those cases. The generated hairstyle is then sent to the terminal and presented to the user.
[0551] For example, if a user uploads a photo of themselves smiling and records a cheerful voice, the server will sense feelings of happiness and cheerfulness from this data and suggest hairstyles that were previously recommended and well-received by users with similar emotions. For instance, a style with glamorous curls might be generated and displayed on the user's device.
[0552] During the feedback phase, users can send their thoughts on the suggested styles from their device to the server. The feedback processing system records this information and uses it to improve the accuracy of future suggestions, thereby providing an even more personalized experience.
[0553] By utilizing an emotion engine in this way, it becomes possible to realize a next-generation system that leverages both visual and auditory information to provide optimal suggestions to users.
[0554] The following describes the processing flow.
[0555] Step 1:
[0556] The user selects their own photos and records audio through the application on their device. They then use the interface to upload this data and press the "Send" button.
[0557] Step 2:
[0558] The device transmits image and audio data provided by the user to the server. During this process, the data is encrypted to ensure security.
[0559] Step 3:
[0560] The server first passes the image data received from the terminal to an image analysis module, which uses a face recognition algorithm to detect facial features and expressions and identify the contours of the face.
[0561] Step 4:
[0562] The server simultaneously processes the audio data with a voice analysis module, analyzing the tone and intonation of the user's voice. Based on the analysis results, the emotion engine identifies the user's emotions from the facial expression data and audio data.
[0563] Step 5:
[0564] The server searches its past database for cases with similar emotions and features based on recognized emotion data and facial feature vectors. Based on these similar cases, it uses AI technology to generate the optimal hairstyle.
[0565] Step 6:
[0566] The server sends the generated hairstyle image to the terminal. When sending, it also includes the reason why the hairstyle was recommended based on the user's feelings.
[0567] Step 7:
[0568] The device displays received hairstyle images on its screen, providing visual suggestions to the user. Furthermore, it provides an interface for the user to input feedback on the suggestions.
[0569] Step 8:
[0570] Users can review the displayed style and send feedback from the terminal indicating whether they like it or would like to see improvements.
[0571] Step 9:
[0572] The terminal sends user feedback to the server, which records and analyzes it using a feedback processing system. This data is then incorporated into future suggestions, contributing to improved accuracy.
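Step 4 above has the emotion engine identify the user's emotion from both facial expression data and audio data. One simple way to combine the two, sketched here as an assumption rather than as the disclosed method, is a weighted average of per-emotion scores from each modality. The function names, the score dictionaries, and the default weight are all hypothetical.

```python
from typing import Dict


def fuse_emotions(
    face_scores: Dict[str, float],
    voice_scores: Dict[str, float],
    face_weight: float = 0.5,
) -> Dict[str, float]:
    """Blend per-emotion scores from face and voice, then normalize to sum to 1."""
    if not 0.0 <= face_weight <= 1.0:
        raise ValueError("face_weight must be between 0 and 1")
    labels = set(face_scores) | set(voice_scores)
    fused = {
        label: face_weight * face_scores.get(label, 0.0)
        + (1.0 - face_weight) * voice_scores.get(label, 0.0)
        for label in labels
    }
    total = sum(fused.values())
    if total == 0:
        raise ValueError("no emotion evidence in either modality")
    return {label: score / total for label, score in fused.items()}


def dominant_emotion(scores: Dict[str, float]) -> str:
    """Return the label with the highest fused score."""
    return max(scores, key=lambda label: scores[label])
```

With a smiling photo and a cheerful voice, both modalities would score "happy" highly, and the fused result would carry that label into the similarity search of Step 5.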
[0573] (Example 2)
[0574] Next, we will describe Example 2. In the following description, the data processing device 12 will be referred to as the "server," and the headset-type terminal 314 will be referred to as the "terminal."
[0575] Current style suggestion systems do not adequately provide personalized suggestions that take user emotions into account. Therefore, it is difficult to provide styles that suit the user's mindset and feelings, often resulting in low satisfaction with the suggestions.
[0576] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 2 is realized by the following means.
[0577] In this invention, the server includes data analysis means for analyzing facial and voice characteristics, similarity search means for referencing similar cases from a database based on the analyzed emotion data, and generation means for generating appropriate styles based on similar cases. This makes it possible to suggest the optimal style according to each user's emotions.
[0578] "Data analysis means" refers to devices or methods that analyze image and audio data provided by users to extract facial and voice features.
[0579] "Similarity search means" refers to a device or method that searches for similar cases within a database based on the analyzed emotion data.
[0580] "Generation means" refers to a device or method that uses the results of similarity searches to generate a style that is appropriate to the user's emotions.
[0581] "Presentation means" refers to a device or method for visually displaying generated style information to the user.
[0582] "Feedback processing means" refers to a device or method that receives feedback from users and uses it to improve the accuracy of future suggestions.
[0583] The system of this invention is designed to suggest individual styles based on the user's emotions. Through cooperation among the user, the terminal, and the server, the system provides a personalized experience.
[0584] Users can input facial photos and voice data using the terminal. The terminal collects this data and sends it to the server. The terminal can be a standard smartphone or personal computer.
[0585] The server uses image and audio analysis software to analyze the user's data in detail. The image analysis software has an algorithm for extracting facial feature points, and the audio analysis software analyzes voice tone. This analysis allows the system to recognize the user's emotions.
[0586] Once the emotion data is acquired, the server uses the similarity search means to search the database for past cases. The database records a history of various emotions and their corresponding styles. This allows the server to identify styles that were effective for users with similar emotions.
[0587] Next, the server uses a generative AI model to generate appropriate styles. The generated styles are customized based on the emotion data so that they suit the user. Input to the AI model is in the form of prompts, such as "Generate a style suitable for a user who is feeling happy."
[0588] Finally, the server sends the generated style to the terminal. The terminal presents it to the user, who can then provide feedback. This feedback is processed by the server and used to improve the quality of future suggestions.
[0589] This enables more sophisticated and personalized suggestions by integrating visual and auditory information.
[0590] The flow of the specific processing in Example 2 will be explained using Figure 13.
[0591] Step 1:
[0592] The user uses the device to take a photo of their face and record audio, and inputs them as data. This data includes a facial image file in image data format and an audio recording file in audio data format.
[0593] Step 2:
[0594] The terminal sends image and audio data obtained from the user to the server. The data is sent via the internet and received by the server. The output here is the raw image and audio files that arrived on the server.
[0595] Step 3:
[0596] The server starts analyzing the received data by running image analysis software and audio analysis software. The image analysis software identifies facial feature points (e.g., eyes, eyebrows, mouth, etc.) and extracts facial expression information. Meanwhile, the audio analysis software analyzes the tone from the audio and obtains indicators related to emotion. The output is the analyzed feature point data and tone data.
[0597] Step 4:
[0598] The server performs a similarity search based on the obtained emotion data. It refers to the database to search for past user cases with similar emotion data and retrieves the results. The output is the similar emotion data and the corresponding past style history.
[0599] Step 5:
[0600] The server creates a prompt statement from the emotion data in order to generate an appropriate style using a generative AI model, and inputs it into the model. For example, it might use a prompt statement like, "Generate a style suitable for a user who is feeling happy." The input here is the emotion data, and the output is the prompt statement input into the AI model.
[0601] Step 6:
[0602] The generative AI model generates styles based on the prompt text. The output is style data for a customized design.
[0603] Step 7:
[0604] The server sends the generated style data to the terminal. The terminal displays the sent data to the user, presenting it as a visual suggestion.
[0605] Step 8:
[0606] The user enters feedback on the suggested style via a terminal and sends this feedback data to the server. This allows the system to obtain information necessary for improving the accuracy of future suggestions. The output is the feedback information stored on the server.
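Steps 4 and 5 of this flow, looking up past styles for a similar emotion and then building the prompt, could be sketched as follows. Exact matching on an emotion label is a deliberate simplification of the similarity search described in the text, and the `EmotionCase` record, its `rating` field, and the sentence appended to the prompt are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class EmotionCase:
    emotion: str   # emotion label recorded with the past case
    style: str     # style that was suggested at the time
    rating: float  # hypothetical user rating of that suggestion


def styles_for_emotion(emotion: str, history: List[EmotionCase], top_k: int = 2) -> List[str]:
    """Return the best-rated past styles recorded for the same emotion label."""
    matches = [case for case in history if case.emotion == emotion]
    matches.sort(key=lambda case: case.rating, reverse=True)
    return [case.style for case in matches[:top_k]]


def build_emotion_prompt(emotion: str, reference_styles: List[str]) -> str:
    """Compose the prompt statement, citing reference styles when any were found."""
    prompt = f"Generate a style suitable for a user who is feeling {emotion}."
    if reference_styles:
        prompt += (
            " Styles that were well received by users with a similar emotion: "
            + ", ".join(reference_styles)
            + "."
        )
    return prompt
```

The first sentence of the resulting prompt matches the example quoted in Step 5; the second sentence is one possible way of passing the retrieved style history to the model.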
[0607] (Application Example 2)
[0608] Next, we will explain application example 2. In the following explanation, the data processing device 12 will be referred to as the "server," and the headset-type terminal 314 will be referred to as the "terminal."
[0609] Traditional hair salons and style suggestion systems often struggled to provide personalized service based on the customer's emotions, resulting in customers mostly choosing from standardized options. This meant that customers couldn't obtain a style that best suited their individual feelings and moods, leading to a lack of a personalized experience.
[0610] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 2 is realized by the following means.
[0611] In this invention, the server includes analysis means for analyzing image data and audio data, similarity search means for referencing similar emotion cases from a past data set based on the analyzed emotion data, and generation means for generating an appropriate appearance style based on the referenced similar emotion cases. This makes it possible to suggest a personalized style based on the user's emotions.
[0612] "Image data" is a collection of digital information used to provide visual information.
[0613] "Audio data" refers to digital information used to provide auditory stimuli.
[0614] "Analysis means" refers to methods or devices for detecting data and identifying its features and patterns.
[0615] "Emotional data" refers to information about a person's mental state that can be inferred from their facial expressions and tone of voice.
[0616] An "information collection" is a reference database containing past data and case studies.
[0617] "Similarity search means" refers to a method or device for finding items that have commonalities from existing data.
[0618] "Appearance style" refers to suggestions for hairstyles and clothing that determine how a person looks.
[0619] "Generation means" refers to methods and devices for creating new content or suggestions based on the results of analysis or retrieval.
[0620] A "generative AI model" is an algorithm or program that uses artificial intelligence to automatically generate results based on data.
[0621] "Feedback processing means" refers to a method or device for receiving opinions and feedback from users and using them to improve future services.
[0622] A "recognition algorithm" is a mathematical method used to analyze data and identify specific patterns or features from it.
[0623] In this invention, a server and a terminal work together to realize a system that integrates emotion recognition and style suggestion. The server uses analysis software written in Python with libraries such as OpenCV and Librosa to analyze image and audio data. Based on the emotion data obtained from this analysis, the server searches for similar emotion cases in a past database. A generative AI model is then used to automatically generate an appearance style that is appropriate for the user's emotion.
[0624] The terminal sends the image and audio data entered by the user to the server, and receives and displays the generated appearance style. This allows the user to receive style suggestions that match their mood.
[0625] For example, when a user smiles at the tablet's camera, the device sends the image to the server. At the same time, if the user says, "I want a hairstyle that makes me look cheerful today," the server analyzes this voice. It then determines that the user is expressing happiness and, by referring to related past data, suggests a glamorous curly hairstyle, for instance. The generative AI model used in this process leverages the relationship between trained emotion data and appearance styles.
[0626] An example of a prompt message is, "Suggest the best hairstyle for a user who is smiling and cheerfully asking, 'What kind of hairstyle would suit me?'" This allows the user to receive personalized suggestions tailored to their individual emotions, resulting in a more satisfying experience.
[0627] The flow of a specific process in Application Example 2 will be explained using Figure 14.
[0628] Step 1:
[0629] The device acquires image and audio data entered by the user. This data is collected in real time using the camera and microphone. The input consists of user image and audio data, which are temporarily stored on the device.
[0630] Step 2:
[0631] The terminal transmits the acquired image and audio data to the server. The transmitted data is then input into the server's data analysis system. This process of data preparation and transmission enables real-time analysis.
[0632] Step 3:
[0633] The server processes the transmitted image data using the OpenCV library and performs analysis to extract the user's facial feature points from the image. This determines the user's facial expression and generates emotion data.
[0634] Step 4:
[0635] The server uses the Librosa library to analyze audio data. It estimates the user's emotions from the tone and pitch of the voice, and uses this to further reinforce the emotion data. The input is audio data, and the output is emotion data based on the audio.
[0636] Step 5:
[0637] The server integrates emotion data obtained from image and audio data, and searches for similar emotion cases by comparing them with past databases. Here, the integrated emotion data is the input, and the information of similar cases is the output.
[0638] Step 6:
[0639] The server uses a generative AI model to generate the most suitable appearance style based on the similar emotion cases. This model has learned the relationship between past emotion data and style suggestions, and outputs the best suggestion for the input emotion.
[0640] Step 7:
[0641] The server sends an image of the generated appearance style to the terminal. The terminal presents this proposed image to the user, allowing for visual evaluation. Here, the generated appearance style is sent to the terminal as input and displayed to the user.
[0642] Step 8:
[0643] The user inputs feedback on the presented appearance style via the terminal. This feedback is sent to the server, reflected in the database, and used to improve the accuracy of future suggestions. Here, the user's reactions and impressions are the input, and the feedback information is the output.
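Step 4 of this flow estimates emotion from the tone and pitch of the voice. As a rough, standard-library-only illustration of the kind of low-level indicators involved, the sketch below computes loudness (RMS energy) and a crude pitch estimate from the zero-crossing count. The function names are hypothetical, the zero-crossing method only suits clean, near-periodic signals, and a library such as Librosa, which the text names, offers more robust estimators. How such indicators map to emotions is not specified in the text and is not attempted here.

```python
import math
from typing import Sequence


def rms_energy(samples: Sequence[float]) -> float:
    """Root-mean-square amplitude, a simple proxy for loudness."""
    if not samples:
        raise ValueError("samples must not be empty")
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def estimate_pitch_hz(samples: Sequence[float], sample_rate: int) -> float:
    """Crude pitch estimate: a periodic tone crosses zero twice per cycle."""
    if len(samples) < 2:
        raise ValueError("at least two samples are required")
    crossings = sum(
        1 for a, b in zip(samples, samples[1:]) if (a >= 0) != (b >= 0)
    )
    duration_s = len(samples) / sample_rate
    return crossings / (2.0 * duration_s)
```

Indicators like these would be inputs to the emotion estimation, alongside the facial expression data from Step 3.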
[0644] The specific processing unit 290 transmits the result of the specific processing to the headset terminal 314. In the headset terminal 314, the control unit 46A causes the speaker 240 and display 343 to output the result of the specific processing. The microphone 238 acquires audio indicating user input for the result of the specific processing. The control unit 46A transmits the audio data indicating user input acquired by the microphone 238 to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the audio data.
[0645] Data generation model 58 is a type of so-called generative AI (Artificial Intelligence). Examples of data generation model 58 include generative AI such as ChatGPT (Internet search <URL: https://openai.com/blog/chatgpt>) and Gemini (Internet search <URL: https://gemini.google.com/?hl=ja>). The data generation model 58 is obtained by performing deep learning on a neural network. The data generation model 58 receives prompts containing instructions, together with inference data such as audio data representing speech, text data representing text, and image data representing images. The data generation model 58 performs inference on the input inference data according to the instructions indicated by the prompts, and outputs the inference results in data formats such as audio data and text data. Here, inference refers to, for example, analysis, classification, prediction, and / or summarization.
[0646] In the above embodiment, an example was given in which specific processing is performed by the data processing device 12, but the technology of this disclosure is not limited thereto, and specific processing may also be performed by the headset terminal 314.
[0647] [Fourth Embodiment]
[0648] Figure 7 shows an example of the configuration of the data processing system 410 according to the fourth embodiment.
[0649] As shown in Figure 7, the data processing system 410 includes a data processing device 12 and a robot 414. An example of the data processing device 12 is a server.
[0650] The data processing device 12 comprises a computer 22, a database 24, and a communication interface 26. The computer 22 is an example of a "computer" related to the technology of this disclosure. The computer 22 comprises a processor 28, RAM 30, and storage 32. The processor 28, RAM 30, and storage 32 are connected to a bus 34. The database 24 and the communication interface 26 are also connected to the bus 34. The communication interface 26 is connected to a network 54. An example of the network 54 is a WAN (Wide Area Network) and / or a LAN (Local Area Network).
[0651] The robot 414 includes a computer 36, a microphone 238, a speaker 240, a camera 42, a communication interface 44, and a controlled object 443. The computer 36 includes a processor 46, RAM 48, and storage 50. The processor 46, RAM 48, and storage 50 are connected to a bus 52. The microphone 238, speaker 240, camera 42, and controlled object 443 are also connected to the bus 52.
[0652] The microphone 238 receives instructions from the user 20 by picking up the voice of the user 20. The microphone 238 captures the voice uttered by the user 20, converts the captured voice into audio data, and outputs the audio data to the processor 46. The speaker 240 outputs audio according to instructions from the processor 46.
[0653] Camera 42 is a small digital camera equipped with an optical system including a lens, aperture, and shutter, and an image sensor such as a CMOS (Complementary Metal-Oxide-Semiconductor) image sensor or a CCD (Charge Coupled Device) image sensor, and captures images of the area around the user 20 (for example, an imaging range defined by a field of view equivalent to the width of a typical healthy person's field of vision).
[0654] Communication interface 44 is connected to network 54. Communication interfaces 44 and 26 are responsible for the exchange of various information between processor 46 and processor 28 via network 54. The exchange of various information between processor 46 and processor 28 using communication interfaces 44 and 26 is performed in a secure manner.
[0655] The controlled object 443 includes a display device, LEDs in the eyes, and motors that drive the arms, hands, and feet. The posture and gestures of the robot 414 are controlled by controlling the motors of the arms, hands, and feet. Some of the robot 414's emotions can be expressed by controlling these motors. Furthermore, the robot 414's facial expressions can also be expressed by controlling the illumination state of the LEDs in its eyes.
[0656] Figure 8 shows an example of the main functions of the data processing device 12 and the robot 414. As shown in Figure 8, the data processing device 12 performs specific processing using the processor 28. The storage 32 stores the specific processing program 56.
[0657] The specific processing program 56 is an example of a "program" relating to the technology of this disclosure. The processor 28 reads the specific processing program 56 from the storage 32 and executes the read specific processing program 56 on the RAM 30. The specific processing is realized by the processor 28 operating as a specific processing unit 290 in accordance with the specific processing program 56 executed on the RAM 30.
[0658] The storage 32 stores the data generation model 58 and the emotion identification model 59. The data generation model 58 and the emotion identification model 59 are used by the specific processing unit 290.
[0659] In robot 414, the processor 46 performs the reception output processing. The storage 50 stores the reception output program 60. The processor 46 reads the reception output program 60 from the storage 50 and executes the read reception output program 60 on the RAM 48. The reception output processing is realized by the processor 46 operating as a control unit 46A according to the reception output program 60 executed on the RAM 48.
[0660] Next, the specific processing performed by the specific processing unit 290 of the data processing device 12 will be described. In the following description, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".
[0661] The system of this invention allows users to upload their own photos, and the server then uses advanced image analysis and AI technology to suggest hairstyles that are suitable for the user.
[0662] When a user uploads their photo to the system via their device, the server receives the photo. After receiving the photo, image analysis tools are used to extract facial and body features, and facial recognition algorithms are used in particular to identify facial contours and feature points. The feature data obtained here is then compared with a past database using a similarity search tool to identify similar cases that closely resemble the user.
[0663] Next, the server uses generation means that leverages AI to generate the optimal hairstyle for the user based on the similar cases. The generated hairstyle is sent from the server to the terminal and visually presented to the user by a display mechanism.
[0664] Furthermore, users can send feedback about the suggested hairstyles to the server via their device. This feedback is then processed by a feedback processing system and used to improve the accuracy of future hairstyle suggestions.
[0665] For example, if a woman in her 20s with a round face and slightly plump cheeks uploads a photo, the server captures these features through image analysis and searches for past data with similar characteristics. Based on highly-rated styles, the AI suggests a bob haircut style, and that image is presented to the user. If the user likes this style and provides feedback, that opinion will be used to improve future suggestions.
[0666] This system allows users to efficiently find the hairstyle that best suits them, while simultaneously creating a feedback loop for continuous improvement.
[0667] The following describes the processing flow.
[0668] Step 1:
[0669] The user launches the application on their device, selects a photo, and presses the upload button. It is recommended that the user choose a photo that clearly shows their face and hair features.
[0670] Step 2:
[0671] The device sends the photos selected by the user to the server. Before transmission, data encryption and other measures are taken to ensure data security.
[0672] Step 3:
[0673] The server receives photo data sent from the terminal. The received data is passed to an image analysis module, which uses a face recognition algorithm to detect the contours of faces and bodies.
[0674] Step 4:
[0675] The server generates feature vectors for the user's face and body from the facial contours and feature points, and then uses these vectors to perform a similarity search on the past database.
[0676] Step 5:
[0677] The server generates the optimal hairstyle using a generation method based on similar cases obtained through similarity searches. AI technology is used to select a style that suits the user by referencing past highly-rated styles.
[0678] Step 6:
[0679] The server sends the generated hairstyle image to the terminal.
[0680] Step 7:
[0681] The terminal receives hairstyle images sent from the server and displays them to the user. The user can visually confirm the suggested style by viewing the images.
[0682] Step 8:
[0683] Users can provide feedback on the presented style. For example, they can select whether they like the suggested style.
[0684] Step 9:
[0685] The terminal sends user feedback to the server. This feedback is stored on the server to improve the quality of future suggestions.
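The feature vector generation in Step 4 can be illustrated with a minimal sketch. The source does not specify which measurements make up the vector, so the landmark names and the three width-to-height ratios below are assumptions chosen purely for illustration. Ratios are used instead of raw pixel distances so that the result does not depend on image resolution.

```python
import math


def face_feature_vector(landmarks):
    """Turn named 2D landmarks into a scale-invariant feature vector.

    `landmarks` maps hypothetical point names to (x, y) pixel coordinates.
    The names and the chosen ratios are illustrative assumptions.
    """
    def dist(a, b):
        (ax, ay), (bx, by) = landmarks[a], landmarks[b]
        return math.hypot(ax - bx, ay - by)

    face_height = dist("forehead_top", "chin")
    if face_height == 0:
        raise ValueError("degenerate landmarks: zero face height")

    # Each width is divided by the face height, so scaling the image
    # leaves the vector unchanged.
    return [
        dist("left_cheek", "right_cheek") / face_height,    # cheek width ratio
        dist("left_jaw", "right_jaw") / face_height,        # jaw width ratio
        dist("left_temple", "right_temple") / face_height,  # forehead width ratio
    ]
```

A vector of this kind could then serve as the query for the similarity search on the past database.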
[0686] (Example 1)
[0687] Next, we will describe Example 1. In the following description, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".
[0688] In modern society, there is a growing demand for technologies that reduce the effort required for individual users to select the style and appearance best suited to them, and that more efficiently meet specific needs. In particular, the challenge lies in automatically and accurately suggesting hairstyles that fit an individual's face shape and contours.
[0689] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 1 is realized by the following means.
[0690] In this invention, the server includes data reception processing means, image processing means, and similar information retrieval means. This enables personalized style suggestions based on images uploaded by the user.
[0691] A "data reception processing means" is a means equipped with the function of receiving image data provided by a user and temporarily storing it within the system.
[0692] "Image processing means" refers to means for analyzing features related to the face and its shape from received image data and extracting specific information.
[0693] "Similar information retrieval means" refers to means for identifying similar cases from past cases or related information sets within a database, based on extracted feature information.
[0694] "Generation processing means" refers to means having an algorithm for generating suitable images and styles corresponding to individual characteristics based on similar cases.
[0695] An "information presentation means" is a means that has the function of transmitting data to an external display device or the like in order to visually present the generated image information to the user.
[0696] "Feedback processing means" refers to means for obtaining feedback and opinions from users and using that information to improve the accuracy and functionality of future system suggestions.
[0697] A "facial feature recognition algorithm" is an algorithm used to identify the characteristic points of a face and to distinguish the details of its shape, including its contours and facial expressions.
[0698] This invention's system suggests the most suitable hairstyle for a user based on their uploaded photo. The user begins by uploading a photo of their face to the system using a terminal. The terminal then transfers this image data to a server.
[0699] The server first receives the uploaded image using the data reception processing means and stores it in the system's temporary file storage. Then, it uses image analysis software (e.g., OpenCV, dlib) as the image processing means to extract facial and shape features from the photograph. In image processing, the contours and feature points of the user's face are detected with high accuracy, producing the feature information that forms the basis for database searches.
[0700] Next, the similar information retrieval means searches the stored database for similar cases based on the extracted feature information. Here, a machine learning library (e.g., FAISS) is used to perform a search based on similarity. The cases with high similarity become the basic data used by the generation processing means.
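As a rough sketch of the similarity-based search described here, the following brute-force nearest-neighbor lookup ranks stored cases by squared Euclidean distance to the query vector. This is an illustrative stand-in written with the standard library only. It is not the FAISS API, and the case identifiers and vector layout are assumptions. At scale, a dedicated similarity search library would take the place of the loop.

```python
import heapq


def squared_l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def top_k_similar(query, cases, k=3):
    """Return the ids of the k cases closest to `query`.

    `cases` is a list of (case_id, vector) pairs; both the id format and
    the vector contents are hypothetical.
    """
    scored = ((squared_l2(query, vec), case_id) for case_id, vec in cases)
    # nsmallest keeps only the k best candidates while scanning all cases.
    return [case_id for _, case_id in heapq.nsmallest(k, scored)]
```

For example, `top_k_similar(user_vector, past_cases, k=5)` would yield the five stored cases that most closely resemble the user.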
[0701] The generation processing means uses a generative AI model (e.g., a GAN or a Transformer-based model). The server constructs prompt sentences based on the similar cases and inputs them into this AI model to create the optimal hairstyle. These prompt sentences include specific requests such as, "Please suggest a stylish hairstyle that suits a round face of a woman in her 20s."
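The construction of a prompt sentence from similar cases can be sketched as follows. The dictionary keys, the rating threshold of 4, and the wording of the appended sentence are assumptions made for illustration. Only the opening request sentence follows the example given in the text.

```python
from collections import Counter


def build_prompt(user_profile, similar_cases, max_styles=3):
    """Build a prompt sentence from a user profile and similar past cases.

    `user_profile` has hypothetical keys 'face_shape' and 'description';
    each case has hypothetical keys 'style' and 'rating' (1 to 5).
    """
    # Treat ratings of 4 or more as "highly rated" (an assumed threshold).
    counts = Counter(
        case["style"] for case in similar_cases if case["rating"] >= 4
    )
    styles = [style for style, _ in counts.most_common(max_styles)]

    prompt = (
        "Please suggest a stylish hairstyle that suits a "
        f"{user_profile['face_shape']} face of a {user_profile['description']}."
    )
    if styles:
        prompt += (
            " Styles rated highly in similar past cases: "
            + ", ".join(styles) + "."
        )
    return prompt
```

The resulting string would be passed to the generative AI model as its instruction, with the list of highly rated styles acting as a hint drawn from the past database.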
[0702] The generated hairstyle image is transmitted to the user's terminal via the information presentation means. The terminal visually presents this to the user, allowing the user to confirm it.
[0703] Finally, users can provide feedback on the suggested hairstyles. This feedback is received by the server via the feedback processing means, reflected in the database to improve the accuracy of future suggestions, and also used to train the AI model.
[0704] Through the above process, users can find hairstyles that suit their own features, and the system can continue to evolve based on user feedback.
[0705] The flow of the specific processing in Example 1 will be explained using Figure 11.
[0706] Step 1:
[0707] The user uses a device to upload a photo of their face to the system. The input data is a facial image, and this data is sent from the device to the server. The server receives this data and stores it in temporary file storage. This step involves the user providing initial data.
[0708] Step 2:
[0709] The server uses the data reception processing means to organize the uploaded images and prepare them for image analysis. The input data is a facial photograph sent by the user, and the output is image data ready for analysis. In this step, the data processing needed for the images to be analyzed accurately is performed.
[0710] Step 3:
[0711] The server uses image processing tools to extract facial and shape features from the received images. Image analysis software such as OpenCV or dlib is used. The input is stored image data, and the output is feature information including facial contours and feature points. In this step, a face recognition algorithm is executed to obtain facial shape data.
[0712] Step 4:
[0713] The server uses the similar information retrieval means to search the database for similar cases based on the extracted feature information. It calculates similarity using machine learning libraries such as FAISS. The input is feature information, and the output is a list of similar cases. In this step, highly relevant information is extracted from the existing database.
[0714] Step 5:
[0715] The server uses a generative AI model as the generation processing means, constructing prompt sentences based on similar cases and using them as input to generate the optimal hairstyle. The prompt sentences are specific, such as "Please suggest a stylish hairstyle that suits a round face of a woman in her 20s." The input is a list of similar cases, and the output is an image of the generated hairstyle. In this step, the capabilities of AI are utilized to create a personalized style.
[0716] Step 6:
[0717] The server sends the generated hairstyle image to the terminal through the information presentation means, and the terminal displays it to the user. The input is the image information of the generated hairstyle, and the output is the visual information displayed on the terminal. In this step, processing is performed to allow the user to actually confirm the suggested result.
[0718] Step 7:
[0719] The user uses their device to send feedback on the suggested hairstyle to the server. The input is the user's feedback information, and the output is opinion data that is reflected on the server. In this step, the user's opinions and feedback are collected for future improvements.
[0720] Step 8:
[0721] The server uses the feedback processing means to store the received feedback in a database and uses it to train the AI model. The input is feedback data, and the output is a suggestion algorithm improved through that training. In this step, the system processes information from the user to improve itself.
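The database side of the feedback handling in Steps 7 and 8 can be sketched as below. This is a minimal in-memory illustration under stated assumptions: the class names, the 1 to 5 rating scale, and the use of an incremental mean are not taken from the source, and retraining the AI model itself is outside the scope of the sketch.

```python
from dataclasses import dataclass


@dataclass
class StyleStats:
    count: int = 0
    mean_rating: float = 0.0

    def add(self, rating):
        # Incremental mean: no need to keep every past rating.
        self.count += 1
        self.mean_rating += (rating - self.mean_rating) / self.count


class FeedbackStore:
    """Hypothetical in-memory stand-in for the feedback database."""

    def __init__(self):
        self._stats = {}

    def record(self, style, rating):
        if not 1 <= rating <= 5:
            raise ValueError("rating must be between 1 and 5")
        self._stats.setdefault(style, StyleStats()).add(rating)

    def mean_rating(self, style):
        return self._stats[style].mean_rating

    def ranked_styles(self, min_count=1):
        """Styles ordered from highest to lowest mean rating."""
        eligible = [
            (stats.mean_rating, name)
            for name, stats in self._stats.items()
            if stats.count >= min_count
        ]
        return [name for _, name in sorted(eligible, reverse=True)]
```

A ranking produced this way could feed back into later suggestions, for instance by favoring styles with a high mean rating among similar cases.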
[0722] (Application Example 1)
[0723] Next, we will explain Application Example 1. In the following explanation, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".
[0724] In recent years, there has been a growing demand for personal assistant systems that provide optimal suggestions to individual users. However, conventional suggestion systems often lack convenience because users have to manually input information. Furthermore, real-time information processing and suggestions tailored to the user's environment have been difficult. This invention aims to solve these problems and enable a personal assistant robot to instantly provide users with optimal information within their homes.
[0725] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 1 is realized by the following means.
[0726] In this invention, the server includes image processing means for analyzing the shape of the face and body, similarity search means for referencing similar cases from a past information set based on the analyzed features, and generation means for creating an appropriate hairstyle based on the similar cases. As a result, users can obtain information tailored to them in real time simply by giving instructions to the robot.
[0727] "Facial and body shape" refers to the basic outlines and characteristics of a person's appearance, and these can be used to analyze each person's traits.
[0728] "Image processing means" refers to a device or technology for extracting specific information from input image data and performing analysis or recognition.
[0729] "Similar cases" refers to a collection of cases extracted from past information sets that share similar characteristics or features.
[0730] "Generative means" refers to methods or devices for creating new data or information based on analysis results.
[0731] "Visual information" refers to visual data used to convey analysis results and products to the user, i.e., information presented in the form of images and videos.
[0732] "Photography means" refers to devices or methods for acquiring images of a user or object using optical devices or the like.
[0733] "Communication methods" refer to the processes and technologies used to transmit data or information from one point to another.
[0734] A "feedback processing means" is a device or process for analyzing opinions received from users and utilizing them to improve the system.
[0735] An "image recognition algorithm" is a set of computational procedures and logic used to extract and identify specific patterns or features from digital images.
[0736] The system required to realize this invention uses a robot that acts as a beauty advisor for the user in the home. The robot is equipped with a high-quality camera for taking pictures and network capabilities for data communication. When the user instructs the robot to take a picture, the robot uses its built-in camera to photograph the user's face and sends the image to a server in the cloud.
[0737] The server uses image processing software (e.g., OpenCV or TensorFlow) to analyze images and identify the shape of the user's face and body. Based on the feature data extracted through this analysis, a similarity search tool compares it with past databases to select similar cases. Based on the selected cases, the server uses AI to generate the optimal hairstyle. This generation uses a generative AI model, and the generation is based on a prompt message such as, "Analyze the user's face photo and suggest the best hairstyle for a party."
[0738] The generated visual information, i.e., the suggested hairstyle, is presented to the user through the robot. When the user provides feedback on the style to the robot, that feedback is sent back to the server and used by the feedback processing system to improve the accuracy of the next suggestion.
[0739] This system allows users to receive appropriate beauty advice in real time via a robot without having to perform complicated operations. A specific use case is receiving suggestions for the best hairstyle for a weekend party.
[0740] The flow of a specific process in Application Example 1 will be explained using Figure 12.
[0741] Step 1:
[0742] The user gives instructions to the robot and requests that it take a picture of their face. The input is the user's instructions, and the output is an image of the user's face taken by the camera. The robot uses its built-in camera to take a high-resolution picture of the user's face.
[0743] Step 2:
[0744] The device transmits the captured facial image to the server via the internet. The input is the captured facial image, and the output is the completion of the transfer of the image data to the server. It uses network communication capabilities to ensure stable data transmission.
[0745] Step 3:
[0746] The server analyzes the received image data using image processing software. The input is a face image sent to the server. The output is shape data of the user's face and body. Data calculations are performed using OpenCV or TensorFlow to extract facial contours and feature points.
[0747] Step 4:
[0748] The server compares the analysis results with past data sets using a similarity search mechanism. The input is user shape data, and the output is the most similar past examples. This includes data processing operations that perform appropriate comparisons and selections from the database.
[0749] Step 5:
[0750] The server uses a generative AI model to generate the optimal hairstyle for the user. The input consists of information on similar cases and a prompt, and the output is visual information of the suggested hairstyle. Based on the prompt, "Analyze the user's facial photo and suggest the best hairstyle for a party," the AI generates the optimal style.
[0751] Step 6:
[0752] The terminal displays visual information of hairstyles received from the server to the user. The input is the visual information provided by the server, and the output is the user confirming the displayed hairstyle image. This includes the action of visually displaying the information on the robot's display.
[0753] Step 7:
[0754] The user provides feedback on the presented hairstyle. The input is the user's evaluation of the hairstyle. The output is the feedback sent to the server. The user provides feedback to the robot via voice or touch input.
[0755] Step 8:
[0756] The server processes user feedback and uses it to improve the accuracy of future suggestions. Input is user-provided feedback information, and output is model updates or database optimization. The feedback processing mechanism performs data calculations to improve the AI model.
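The transfer of the captured image from the robot to the server in Step 2 can be sketched as a simple message format. The source does not define a wire format, so the JSON envelope, the field names, and the optional `occasion` field below are all assumptions for illustration. Transport security is not shown.

```python
import base64
import json


def encode_capture_request(image_bytes, user_id, occasion=None):
    """Pack a captured image into a JSON message (hypothetical format)."""
    payload = {
        "user_id": user_id,
        # Binary image data is base64-encoded so it can travel as JSON text.
        "image_b64": base64.b64encode(image_bytes).decode("ascii"),
    }
    if occasion is not None:
        payload["occasion"] = occasion  # e.g. "party", used later in the prompt
    return json.dumps(payload)


def decode_capture_request(message):
    """Server side: recover the user id, image bytes, and optional occasion."""
    payload = json.loads(message)
    image_bytes = base64.b64decode(payload["image_b64"])
    return payload["user_id"], image_bytes, payload.get("occasion")
```

On the server, the decoded image bytes would be handed to the image processing software, and the occasion, if present, could be worked into the prompt.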
[0757] Furthermore, an emotion engine that estimates the user's emotions may be incorporated. That is, the specific processing unit 290 may use the emotion identification model 59 to estimate the user's emotions and perform the specific processing using the user's emotions.
[0758] The system incorporating the emotion engine of this invention analyzes facial and voice characteristics using image and audio data provided by the user to recognize the user's emotions. It then suggests the hairstyle most suitable for the user's current situation based on the acquired emotion data.
[0759] The user uploads photos using their device and inputs audio via the microphone as needed. The device sends this data to the server, which performs image and audio analysis in parallel. A facial recognition algorithm analyzes facial features and expressions, and an emotion engine infers the user's emotions based on their expressions and voice tone.
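One way to picture how the emotion engine might combine facial expression and voice tone is a weighted average of per-emotion scores from the two analyses. The source does not describe the internals of the emotion identification model 59, so this fusion rule, the default weight of 0.6, and the emotion labels are assumptions for illustration only.

```python
def fuse_emotions(face_scores, voice_scores, face_weight=0.6):
    """Combine per-emotion scores from face and voice analysis.

    Both inputs map emotion labels to scores in [0, 1]. A label missing
    from one input counts as 0.0 there. Returns (best_label, fused_scores).
    """
    if not 0.0 <= face_weight <= 1.0:
        raise ValueError("face_weight must be between 0 and 1")

    labels = set(face_scores) | set(voice_scores)
    fused = {
        label: face_weight * face_scores.get(label, 0.0)
        + (1.0 - face_weight) * voice_scores.get(label, 0.0)
        for label in labels
    }
    # Sorting the labels first makes tie-breaking deterministic.
    best = max(sorted(fused), key=fused.get)
    return best, fused
```

With a smiling photo and a cheerful voice, both inputs would score "happy" highly, and the fused result would carry that label into the database search.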
[0760] The server uses the recognized emotion data to search its past database for examples with similar emotions and characteristics. Based on these examples, it generates a hairstyle that matches the user's mood. The generated hairstyle is then sent to the terminal and presented to the user.
[0761] For example, if a user uploads a photo of themselves smiling and records a cheerful voice, the server will sense feelings of happiness and cheerfulness from this data and suggest hairstyles that were previously recommended and well-received by users with similar emotions. For instance, a style with glamorous curls might be generated and displayed on the user's device.
[0762] During the feedback phase, users can send their thoughts on the suggested styles from their device to the server. The feedback processing system records this information and uses it to improve the accuracy of future suggestions, thereby providing an even more personalized experience.
[0763] By utilizing an emotion engine in this way, it becomes possible to realize a next-generation system that leverages both visual and auditory information to provide optimal suggestions to users.
[0764] The following describes the processing flow.
[0765] Step 1:
[0766] The user selects their own photos and records audio through the application on their device. They then use the interface to upload this data and press the "Send" button.
[0767] Step 2:
[0768] The device transmits image and audio data provided by the user to the server. During this process, the data is encrypted to ensure security.
[0769] Step 3:
[0770] The server first passes the image data received from the terminal to an image analysis module, which uses a face recognition algorithm to detect facial features and expressions and identify the contours of the face.
[0771] Step 4:
[0772] The server simultaneously processes the audio data with a voice analysis module, analyzing the tone and intonation of the user's voice. Based on the analysis results, the emotion engine identifies the user's emotions from the facial expression data and audio data.
[0773] Step 5:
[0774] The server searches its past database for cases with similar emotions and features based on recognized emotion data and facial feature vectors. Based on these similar cases, it uses AI technology to generate the optimal hairstyle.
[0775] Step 6:
[0776] The server sends the generated hairstyle image to the terminal. When sending, it also includes the reason why the hairstyle was recommended based on the user's feelings.
[0777] Step 7:
[0778] The device displays received hairstyle images on its screen, providing visual suggestions to the user. Furthermore, it provides an interface for the user to input feedback on the suggestions.
[0779] Step 8:
[0780] Users can review the displayed style and send feedback from their device if they like it or would like to see improvements.
[0781] Step 9:
[0782] The terminal sends user feedback to the server, which records and analyzes it using a feedback processing system. This data is then incorporated into future suggestions, contributing to improved accuracy.
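The search in Step 5, which uses both the recognized emotion and the facial feature vector, can be sketched as a two-stage lookup: first narrow the past cases to those with the same emotion, then rank them by distance between feature vectors. The record layout and the fallback to the whole database when no case shares the emotion are assumptions, not details from the source.

```python
import math


def find_similar_cases(emotion, face_vector, cases, k=2):
    """Return up to k past cases matching the emotion and closest in features.

    Each case is a dict with hypothetical keys 'emotion', 'vector', 'style'.
    """
    same_emotion = [c for c in cases if c["emotion"] == emotion]
    # Assumed fallback: search every case if none shares this emotion.
    pool = same_emotion or cases
    ranked = sorted(pool, key=lambda c: math.dist(face_vector, c["vector"]))
    return ranked[:k]
```

The styles attached to the returned cases would then inform the hairstyle generation, and could also supply the recommendation reason sent along with the image in Step 6.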
[0783] (Example 2)
[0784] Next, we will describe Example 2. In the following description, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".
[0785] Current style suggestion systems do not adequately provide personalized suggestions that take user emotions into account. Therefore, it is difficult to provide styles that suit the user's mindset and feelings, often resulting in low satisfaction with the suggestions.
[0786] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 2 is realized by the following means.
[0787] In this invention, the server includes data analysis means for analyzing facial and voice characteristics, similarity search means for referencing similar cases from a database based on the analyzed emotion data, and generation means for generating appropriate styles based on similar cases. This makes it possible to suggest the optimal style according to each user's emotions.
[0788] "Data analysis means" refers to devices or methods that analyze image and audio data provided by users to extract facial and voice features.
[0789] "Similarity search means" refers to a device or method that searches for similar cases within a database based on analyzed emotion data.
[0790] "Generation means" refers to a device or method that uses the results of similarity searches to generate a style that is appropriate to the user's emotions.
[0791] "Presentation means" refers to a device or method for visually displaying generated style information to the user.
[0792] "Feedback processing means" refers to a device or method that receives feedback from users and uses it to improve the accuracy of future suggestions.
[0793] The system of this invention is designed to suggest individual styles based on the user's emotions. Through the cooperation of the user, the terminal, and the server, the system provides a next-generation personalized experience.
[0794] Users can input facial photos and voice data using their device. The device collects this data and sends it to the server. The device can be a standard smartphone or personal computer.
[0795] The server uses image and audio analysis software to analyze the user's data in detail. The image analysis software has an algorithm for extracting facial feature points, and the audio analysis software analyzes voice tone. This analysis allows the system to recognize the user's emotions.
[0796] Once emotion data is acquired, the server uses a similarity search function to search the database for past cases. The database records a history of various emotions and their corresponding styles. This allows the server to identify styles that were effective for users with similar emotions.
[0797] Next, the server uses a generative AI model to generate appropriate styles. The generated styles are customized based on emotion data, resulting in the best possible suggestions for the user. Input to the AI model is in the form of prompts, such as "Generate a style suitable for a user who is feeling happy."
[0798] Finally, the server sends the generated style to the terminal. The terminal presents it to the user, who can then provide feedback. This feedback is processed by the server and used to improve the quality of future suggestions.
[0799] This enables more sophisticated and personalized suggestions by integrating visual and auditory information.
[0800] The flow of the specific processing in Example 2 will be explained using Figure 13.
[0801] Step 1:
[0802] The user uses the device to take a photo of their face and record audio, and inputs them as data. This data includes a facial image file in image data format and an audio recording file in audio data format.
[0803] Step 2:
[0804] The terminal sends image and audio data obtained from the user to the server. The data is sent via the internet and received by the server. The output here is the raw image and audio files that arrived on the server.
[0805] Step 3:
[0806] The server starts analyzing the received data by running image analysis software and audio analysis software. The image analysis software identifies facial feature points (e.g., eyes, eyebrows, mouth, etc.) and extracts facial expression information. Meanwhile, the audio analysis software analyzes the tone from the audio and obtains indicators related to emotion. The output is the analyzed feature point data and tone data.
[0807] Step 4:
[0808] The server performs a similarity search based on the obtained emotion data. It refers to the database to search for past user cases with similar emotion data and retrieves the results. The output is the similar emotion data and the associated past style history.
[0809] Step 5:
[0810] The server creates prompt statements for generating appropriate styles with a generative AI model and inputs them into the model. For example, it might use a prompt statement like, "Generate a style suitable for a user who is feeling happy." The input here is the emotion data, and the output is the prompt statement that is input into the AI model.
[0811] Step 6:
[0812] The generative AI model generates styles based on the prompt text. The output is style data for a customized design.
[0813] Step 7:
[0814] The server sends the generated style data to the terminal. The terminal displays the sent data to the user, presenting it as a visual suggestion.
[0815] Step 8:
[0816] The user enters feedback on the suggested style via a terminal and sends this feedback data to the server. This allows the system to obtain information necessary for improving the accuracy of future suggestions. The output is the feedback information stored on the server.
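The prompt creation in Step 5 can be sketched as a small template function that turns an emotion label into a prompt statement. The base sentence follows the example in the text. The optional sentence listing styles from similar past cases, and its wording, are assumptions added for illustration.

```python
def emotion_prompt(emotion, past_styles=()):
    """Build a prompt statement from an emotion label.

    `past_styles` is an optional sequence of style names taken from similar
    past cases (a hypothetical input).
    """
    prompt = f"Generate a style suitable for a user who is feeling {emotion}."
    # dict.fromkeys removes duplicates while keeping first-seen order.
    unique_styles = list(dict.fromkeys(past_styles))
    if unique_styles:
        prompt += (
            " Styles that worked for users with similar emotions: "
            + ", ".join(unique_styles) + "."
        )
    return prompt
```

Calling `emotion_prompt("happy")` reproduces the example prompt from the text, and passing the style history from Step 4 extends it with past results.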
[0817] (Application Example 2)
[0818] Next, we will explain application example 2. In the following explanation, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".
[0819] Traditional hair salons and style suggestion systems often struggled to provide personalized service based on the customer's emotions, resulting in customers mostly choosing from standardized options. This meant that customers couldn't obtain a style that best suited their individual feelings and moods, leading to a lack of a personalized experience.
[0820] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 2 is realized by the following means.
[0821] In this invention, the server includes analysis means for analyzing image data and audio data, similarity search means for referencing similar emotion cases from a past data set based on the analyzed emotion data, and generation means for generating an appropriate appearance style based on the referenced similar emotion cases. This makes it possible to suggest a personalized style based on the user's emotions.
[0822] "Image data" is a collection of digital information used to provide visual information.
[0823] "Audio data" refers to digital information used to provide auditory stimuli.
[0824] "Analysis means" refers to methods or devices for detecting data and identifying its features and patterns.
[0825] "Emotional data" refers to information about a person's mental state that can be inferred from their facial expressions and tone of voice.
[0826] An "information collection" is a reference database containing past data and case studies.
[0827] "Similarity search means" refers to a method or device for finding items that have commonalities from existing data.
[0828] "Appearance style" refers to suggestions for hairstyles and clothing that determine how a person looks.
[0829] "Generation means" refers to methods and devices for creating new content or suggestions based on the results of analysis or retrieval.
[0830] A "generative AI model" is an algorithm or program that uses artificial intelligence to automatically generate results based on data.
[0831] "Feedback processing means" refers to a method or device for receiving opinions and feedback from users and using them to improve future services.
[0832] A "recognition algorithm" is a mathematical method used to analyze data and identify specific patterns or features from it.
[0833] In this invention, a server and a terminal work together to realize a system that integrates emotion recognition and style suggestion. The server utilizes analysis software using Python and libraries such as OpenCV and Librosa to analyze image and audio data. Based on the emotion data obtained from this data, the server searches for similar emotion cases in a past database. Here, a generative AI model is used to automatically generate an appearance style that is appropriate for the user's emotion.
[0834] The terminal serves as the user's input and output point: it sends the image and audio data entered by the user to the server and displays the generated appearance style. This allows the user to receive style suggestions that match their mood.
[0835] For example, when a user smiles at the tablet's camera, the device sends the image to the server. At the same time, if the user says, "I want a hairstyle that makes me look cheerful today," the server analyzes this voice. It then determines that the user is expressing happiness and, by referring to related past data, suggests a glamorous curly hairstyle, for instance. The generative AI model used in this process leverages the relationship between trained emotion data and appearance styles.
[0836] An example of a prompt message is, "Suggest the best hairstyle for a user who is smiling and cheerfully asking, 'What kind of hairstyle would suit me?'" This allows the user to receive personalized suggestions tailored to their individual emotions, resulting in a more satisfying experience.
[0837] The flow of a specific process in Application Example 2 will be explained using Figure 14.
[0838] Step 1:
[0839] The terminal acquires image and audio data entered by the user. This data is collected in real time using the camera and microphone. The input consists of user image and audio data, which are temporarily stored on the terminal.
[0840] Step 2:
[0841] The terminal transmits the acquired image and audio data to the server. The transmitted data is then input into the server's data analysis system. This process of data preparation and transmission enables real-time analysis.
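The transmission in Step 2 can be illustrated with a minimal sketch. The source does not specify a wire format, so the JSON envelope, the Base64 encoding, and the field names `user_id`, `image_b64`, and `audio_b64` are all assumptions made for illustration only.

```python
import base64
import json


def build_payload(image_bytes: bytes, audio_bytes: bytes, user_id: str) -> str:
    """Terminal side: pack raw image and audio bytes into one JSON string."""
    return json.dumps({
        "user_id": user_id,  # hypothetical field, not named in the source
        "image_b64": base64.b64encode(image_bytes).decode("ascii"),
        "audio_b64": base64.b64encode(audio_bytes).decode("ascii"),
    })


def parse_payload(payload: str):
    """Server side: recover the original image and audio bytes."""
    body = json.loads(payload)
    image = base64.b64decode(body["image_b64"])
    audio = base64.b64decode(body["audio_b64"])
    return image, audio
```

In practice the payload would be sent over whatever transport connects the terminal and the server; that transport is outside the scope of this sketch.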
[0842] Step 3:
[0843] The server processes the transmitted image data using the OpenCV library and performs analysis to extract the user's facial feature points from the image. This determines the user's facial expression and generates emotion data.
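As a rough illustration of how facial feature points might be turned into emotion data, the following sketch scores mouth curvature from three landmark coordinates. It assumes the landmarks have already been extracted upstream (for example with OpenCV, as the text states); the landmark names, the scoring rule, and the threshold are hypothetical and are not taken from the source.

```python
def smile_score(landmarks: dict) -> float:
    """Return a positive value when the mouth corners sit above the mouth centre."""
    left = landmarks["mouth_left"]
    right = landmarks["mouth_right"]
    center = landmarks["mouth_center"]
    width = abs(right[0] - left[0])
    if width == 0:
        return 0.0
    corner_y = (left[1] + right[1]) / 2.0
    # Image y grows downward, so raised corners have a smaller y than the centre.
    return (center[1] - corner_y) / width


def expression_emotion(landmarks: dict, threshold: float = 0.05) -> str:
    """Map the curvature score to a coarse label (assumed label set)."""
    score = smile_score(landmarks)
    if score > threshold:
        return "happy"
    if score < -threshold:
        return "sad"
    return "neutral"
```

A production system would likely use many more feature points and a trained classifier; this rule only shows the shape of the input and output.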
[0844] Step 4:
[0845] The server uses the Librosa library to analyze audio data. It estimates the user's emotions from the tone and pitch of the voice, and uses this to further reinforce the emotion data. The input is audio data, and the output is emotion data based on the audio.
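The tone and pitch analysis in Step 4 can be sketched as follows. The text names the Librosa library; to keep the example free of dependencies, this sketch instead uses a plain-Python autocorrelation pitch estimate and an RMS energy measure as stand-ins. The function names and the 80 to 400 Hz search range are assumptions, and how these features map to emotions is not specified in the source.

```python
import math


def rms_energy(samples):
    """Root-mean-square amplitude, a simple proxy for loudness."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def estimate_pitch_hz(samples, sample_rate, fmin=80.0, fmax=400.0):
    """Estimate the fundamental frequency from the autocorrelation peak."""
    min_lag = int(sample_rate / fmax)
    max_lag = int(sample_rate / fmin)
    best_lag, best_value = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        # Correlate the signal with a copy of itself shifted by `lag` samples.
        value = sum(samples[i] * samples[i + lag]
                    for i in range(len(samples) - lag))
        if value > best_value:
            best_lag, best_value = lag, value
    return sample_rate / best_lag


# Usage: a synthetic 200 Hz tone sampled at 8 kHz for 0.1 s.
rate = 8000
tone = [math.sin(2 * math.pi * 200 * n / rate) for n in range(800)]
pitch = estimate_pitch_hz(tone, rate)
energy = rms_energy(tone)
```

Real speech is far less regular than a pure tone, so a dedicated pitch tracker would normally replace this estimate.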
[0846] Step 5:
[0847] The server integrates emotion data obtained from image and audio data, and searches for similar emotion cases by comparing them with past databases. Here, the integrated emotion data is the input, and the information of similar cases is the output.
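One possible way to realize Step 5 is a weighted average of the per-emotion scores from the two modalities, followed by a cosine-similarity ranking over stored cases. This is a sketch under stated assumptions: the label set, the 0.6 image weight, the case record layout, and the choice of cosine similarity are illustrative and do not come from the source.

```python
import math

EMOTIONS = ("happy", "sad", "angry", "calm")  # hypothetical label set


def fuse(image_scores, audio_scores, image_weight=0.6):
    """Weighted average of per-emotion scores from image and audio analysis."""
    return [image_weight * image_scores[e] + (1.0 - image_weight) * audio_scores[e]
            for e in EMOTIONS]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def find_similar_cases(query, cases, top_k=2):
    """Return the top_k stored cases whose emotion vectors best match the query."""
    ranked = sorted(cases, key=lambda c: cosine(query, c["emotion"]), reverse=True)
    return ranked[:top_k]


# Usage with three made-up past cases.
cases = [
    {"id": "case-1", "emotion": [0.9, 0.0, 0.0, 0.1], "style": "bright curls"},
    {"id": "case-2", "emotion": [0.0, 0.9, 0.1, 0.0], "style": "soft straight"},
    {"id": "case-3", "emotion": [0.5, 0.0, 0.0, 0.5], "style": "relaxed waves"},
]
image = {"happy": 0.8, "sad": 0.0, "angry": 0.0, "calm": 0.2}
audio = {"happy": 0.7, "sad": 0.1, "angry": 0.0, "calm": 0.2}
fused = fuse(image, audio)
similar = find_similar_cases(fused, cases)
```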
[0848] Step 6:
[0849] The server uses a generative AI model to generate the most suitable appearance style based on the similar emotion cases. This model learns the relationship between past emotion data and style suggestions, and outputs the best suggestion for the input emotion.
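The input to the generative AI model in Step 6 could be assembled along the following lines. The sketch only builds the prompt text; it does not call any model. The function name, the parameter names, and the prompt wording are assumptions for illustration.

```python
def build_style_prompt(emotion, utterance, similar_styles):
    """Combine the estimated emotion, the user's words, and past styles into a prompt."""
    references = "; ".join(similar_styles) if similar_styles else "none"
    return (
        f'The user appears {emotion} and said: "{utterance}". '
        f"Styles chosen in similar past cases: {references}. "
        "Suggest one hairstyle that suits this mood."
    )


# Usage with made-up values.
prompt = build_style_prompt(
    "happy",
    "I want a hairstyle that makes me look cheerful today",
    ["bright curls", "relaxed waves"],
)
```

Keeping the prompt construction separate from the model call makes it straightforward to log or adjust the wording without touching the rest of the pipeline.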
[0850] Step 7:
[0851] The server sends an image of the generated appearance style to the terminal. The terminal presents this proposed image to the user, allowing for visual evaluation. Here, the generated appearance style is sent to the terminal as input and displayed to the user.
[0852] Step 8:
[0853] The user inputs feedback on the presented appearance style via the terminal. This feedback is sent back to the server, stored in the database, and used to improve the accuracy of future suggestions. Here, the user's reactions and impressions are the input, and feedback information is the output.
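How the stored feedback improves later suggestions is not detailed in the text. One simple possibility, shown here as a sketch, is to keep per-emotion ratings for each style and rank candidate styles by their mean rating. The class name, the 1 to 5 rating scale, and the neutral default of 3.0 are hypothetical.

```python
from collections import defaultdict


class FeedbackStore:
    """In-memory record of user ratings, keyed by (emotion, style)."""

    def __init__(self):
        self._ratings = defaultdict(list)

    def record(self, emotion, style, rating):
        """Store one rating on an assumed 1 to 5 scale."""
        if not 1 <= rating <= 5:
            raise ValueError("rating must be between 1 and 5")
        self._ratings[(emotion, style)].append(rating)

    def mean_rating(self, emotion, style, default=3.0):
        """Average rating so far, or a neutral default for unrated styles."""
        ratings = self._ratings.get((emotion, style))
        return sum(ratings) / len(ratings) if ratings else default

    def rank_styles(self, emotion, styles):
        """Order candidate styles from best to worst rated for this emotion."""
        return sorted(styles, key=lambda s: self.mean_rating(emotion, s), reverse=True)


# Usage with made-up feedback.
store = FeedbackStore()
store.record("happy", "bright curls", 5)
store.record("happy", "bright curls", 4)
store.record("happy", "soft straight", 2)
ranking = store.rank_styles("happy", ["soft straight", "relaxed waves", "bright curls"])
```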
[0854] The specific processing unit 290 transmits the result of the specific processing to the robot 414. In the robot 414, the control unit 46A causes the speaker 240 and the controlled object 443 to output the result of the specific processing. The microphone 238 acquires audio indicating user input for the result of the specific processing. The control unit 46A transmits the audio data indicating user input acquired by the microphone 238 to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the audio data.
[0855] Data generation model 58 is a type of so-called generative AI (Artificial Intelligence). Examples of the data generation model 58 include generative AI such as ChatGPT (Internet search <URL: https://openai.com/blog/chatgpt>) and Gemini (Internet search <URL: https://gemini.google.com/?hl=ja>). The data generation model 58 is obtained by performing deep learning on a neural network. The data generation model 58 receives prompts containing instructions, together with inference data such as audio data representing speech, text data representing text, and image data representing images. The data generation model 58 performs inference on the input inference data according to the instructions indicated by the prompts, and outputs the inference results in data formats such as audio data and text data. Here, inference refers to, for example, analysis, classification, prediction, and/or summarization.
[0856] In the above embodiment, an example was given in which the specific processing is performed by the data processing device 12, but the technology of this disclosure is not limited thereto, and the specific processing may also be performed by the robot 414.
[0857] Furthermore, the emotion identification model 59, acting as an emotion engine, may determine the user's emotion according to a specific mapping, namely an emotion map (see Figure 9). Similarly, the emotion identification model 59 may also determine the robot's emotion, and the specific processing unit 290 may perform the specific processing using the robot's emotion.
[0858] Figure 9 shows an emotion map 400 in which multiple emotions are mapped. In the emotion map 400, emotions are arranged in concentric circles radiating from the center. The closer to the center of the concentric circles, the more primitive the emotions are located. Further out of the concentric circles, emotions representing states and actions arising from mental states are located. Emotion is a concept that includes feelings and mental states. On the left side of the concentric circles, emotions that are generally generated from reactions occurring in the brain are located. On the right side of the concentric circles, emotions that are generally induced by situational judgment are located. Above and below the concentric circles, emotions that are generally generated from reactions occurring in the brain and induced by situational judgment are located. In addition, the emotion of "pleasure" is located on the upper side of the concentric circles, and the emotion of "displeasure" is located on the lower side. Thus, in the emotion map 400, multiple emotions are mapped based on the structure in which emotions arise, and emotions that are likely to occur simultaneously are mapped close together.
[0859] These emotions are distributed at the 3 o'clock position on the Emotion Map 400, and usually fluctuate between feelings of security and anxiety. In the right half of the Emotion Map 400, situational awareness takes precedence over internal feelings, resulting in a calm impression.
[0860] The inside of the Emotion Map 400 represents inner thoughts, while the outside represents actions. Therefore, the further an emotion lies toward the outside of the Emotion Map 400, the more visible (expressed in actions) it becomes.
[0861] Here, human emotions are based on various balances, such as posture and blood sugar levels. When these balances deviate from the ideal, it results in discomfort, and when they approach the ideal, it results in pleasure. Similarly, in robots, cars, motorcycles, etc., emotions can be created based on various balances, such as posture and battery level. When these balances deviate from the ideal, it results in discomfort, and when they approach the ideal, it results in pleasure. The emotion map can be generated, for example, based on Dr. Mitsuyoshi's emotion map (Research on a system for analyzing brain physiological signals of speech emotion recognition and emotion, Tokushima University, doctoral dissertation: https://ci.nii.ac.jp/naid/500000375379). The left half of the emotion map contains emotions belonging to a region called "response," where sensation is dominant. The right half of the emotion map contains emotions belonging to a region called "situation," where situational awareness is dominant.
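The layout of the emotion map described above can be summarized in a small sketch that labels a point on the map by side, vertical half, and layer. The coordinate system (origin at the centre, x to the right, y upward), the function name, and the inner radius of 0.5 are assumptions introduced purely for illustration; the source defines no numeric coordinates.

```python
import math


def describe_position(x, y, inner_radius=0.5):
    """Label a point on the emotion map by side, valence, and layer."""
    if x > 0:
        side = "situation"   # right half: situational awareness dominant
    elif x < 0:
        side = "reaction"    # left half: the "response" region, sensation dominant
    else:
        side = "both"        # vertical axis: simplified here as a mix of the two
    if y > 0:
        valence = "pleasure"      # upper side
    elif y < 0:
        valence = "displeasure"   # lower side
    else:
        valence = "neutral"
    # Inner rings hold inner states; outer rings hold states expressed in action.
    layer = "inner" if math.hypot(x, y) <= inner_radius else "outer"
    return {"side": side, "valence": valence, "layer": layer}
```

For example, the 3 o'clock position on the outer ring corresponds to `describe_position(1.0, 0.0)`.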
[0862] The emotion map defines two emotions that promote learning. One is the emotion around the middle of the negative "repentance" and "reflection" on the situation side. In other words, it is when the robot experiences negative emotions such as "I never want to feel this way again" or "I don't want to be scolded again." The other is the emotion around the positive "desire" on the reaction side. In other words, it is when the robot has positive feelings such as "I want more" or "I want to know more."
[0863] The emotion identification model 59 inputs user input into a pre-trained neural network, obtains emotion values representing each emotion shown in the emotion map 400, and determines the user's emotion. This neural network is pre-trained based on multiple training data sets, which are combinations of user input and emotion values representing each emotion shown in the emotion map 400. Furthermore, this neural network is trained so that emotions located close together have similar values, as shown in the emotion map 900 in Figure 10. Figure 10 shows an example where multiple emotions such as "reassured," "calm," and "confident" have similar emotion values.
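Once the neural network has produced an emotion value for each emotion, the final determination can be as simple as selecting the largest value. The sketch below assumes the values arrive as a name-to-score mapping; the function names, the tolerance, and the sample scores are hypothetical, and the network itself is not implemented here.

```python
def dominant_emotion(emotion_values):
    """Return the emotion with the highest value."""
    if not emotion_values:
        raise ValueError("emotion_values must not be empty")
    return max(emotion_values, key=emotion_values.get)


def emotions_near(emotion_values, tolerance=0.1):
    """List emotions whose value lies within `tolerance` of the top value."""
    top = emotion_values[dominant_emotion(emotion_values)]
    return sorted(name for name, value in emotion_values.items()
                  if top - value <= tolerance)


# Usage with made-up values in which neighbouring emotions score similarly.
values = {"reassured": 0.82, "calm": 0.78, "confident": 0.75, "anxious": 0.10}
top_emotion = dominant_emotion(values)
close_group = emotions_near(values)
```

The second function reflects the property described above, where emotions located close together on the map receive similar values.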
[0864] The above description primarily focuses on the functions of the data processing device 12 in relation to this disclosure. However, the system related to this disclosure is not necessarily implemented on a server. The system related to this disclosure may be implemented as a general information processing system. This disclosure may be implemented, for example, as a software program that runs on a personal computer or as an application that runs on a smartphone. The method related to this disclosure may be provided to users in SaaS (Software as a Service) format.
[0865] In the above embodiment, an example was given in which a specific process is performed by a single computer 22. However, the technology of this disclosure is not limited thereto, and a distributed processing of the specific process may be performed by multiple computers, including computer 22. For example, a data generation model 58 may be provided in an external device of the data processing device 12, and the external device may generate data according to the input data.
[0866] In the above embodiment, an example was given in which the specific processing program 56 is stored in the storage 32, but the technology of this disclosure is not limited thereto. For example, the specific processing program 56 may be stored in a portable, computer-readable, non-transitory storage medium such as a USB (Universal Serial Bus) memory. The specific processing program 56 stored in the non-transitory storage medium is installed in the computer 22 of the data processing device 12. The processor 28 executes specific processing according to the specific processing program 56.
[0867] Alternatively, the specific processing program 56 may be stored in a storage device such as a server connected to the data processing device 12 via the network 54, and the specific processing program 56 may be downloaded and installed on the computer 22 in response to a request from the data processing device 12.
[0868] Furthermore, it is not necessary to store the entirety of the specific processing program 56 in a storage device such as a server connected to the data processing device 12 via the network 54, or to store the entirety of the specific processing program 56 in the storage 32; it is acceptable to store only a portion of the specific processing program 56.
[0869] The following types of processors can be used as hardware resources to perform specific processing. Examples of processors include a CPU, a general-purpose processor that functions as a hardware resource to perform specific processing by executing software, i.e., a program. Other examples of processors include dedicated electrical circuits, such as FPGAs (Field-Programmable Gate Arrays), PLDs (Programmable Logic Devices), or ASICs (Application Specific Integrated Circuits), which have circuit configurations specifically designed to perform specific processing. All of these processors have built-in or connected memory, and all of them perform specific processing by using memory.
[0870] The hardware resource that performs a specific process may consist of one of these various processors, or it may consist of a combination of two or more processors of the same or different types (for example, a combination of multiple FPGAs, or a combination of a CPU and an FPGA). Alternatively, the hardware resource that performs a specific process may consist of a single processor.
[0871] Examples of configurations using a single processor include, firstly, a configuration in which one or more CPUs and software are combined to form a single processor, and this processor functions as a hardware resource that performs a specific process. Secondly, there is a configuration using a processor that realizes the functions of the entire system, including multiple hardware resources that perform a specific process, on a single IC chip, as exemplified by SoCs (System-on-a-chip). In this way, a specific process is realized using one or more of the above types of processors as hardware resources.
[0872] Furthermore, the hardware structure of these various processors can more specifically utilize electrical circuits that combine circuit elements such as semiconductor devices. Also, the specific processing described above is merely an example. Therefore, it goes without saying that unnecessary steps can be deleted, new steps added, or the processing order rearranged, as long as it does not deviate from the main purpose.
[0873] The descriptions and illustrations presented above are detailed explanations of the technical aspects of this disclosure and are merely examples of the technical aspects. For example, the above descriptions of the structure, function, operation, and effect are examples of the structure, function, operation, and effect of the technical aspects of this disclosure. Therefore, it goes without saying that you may delete unnecessary parts, add new elements, or replace elements in the descriptions and illustrations presented above, as long as you do not deviate from the essence of the technical aspects of this disclosure. Furthermore, in order to avoid confusion and facilitate understanding of the technical aspects of this disclosure, explanations of common technical knowledge and the like that do not require special explanation to enable the implementation of the technical aspects of this disclosure have been omitted from the descriptions and illustrations presented above.
[0874] All documents, patent applications, and technical standards described herein are incorporated by reference to the same extent as if each individual document, patent application, and technical standard were specifically and individually noted to be incorporated by reference.
[0875] The following is further disclosed regarding the embodiments described above.
[0876] (Claim 1)
[0877] Image analysis means for analyzing the contours of the face and body,
[0878] A similarity search method that references similar cases from past databases based on the analyzed features,
[0879] A generation means for generating an appropriate hairstyle based on similar cases,
[0880] A display means for presenting an image of the generated hairstyle,
[0881] A system that includes this.
[0882] (Claim 2)
[0883] The system according to claim 1, comprising a feedback processing means for receiving user feedback and improving the next proposal based on that feedback.
[0884] (Claim 3)
[0885] The system according to claim 1, which uses a face recognition algorithm to identify facial feature points in facial and body contour analysis.
[0886]
[0887] "Example 1"
[0888] (Claim 1)
[0889] A data reception processing means for receiving and storing image data,
[0890] Image processing means for extracting facial and shape features,
[0891] A similar information search means for identifying similar cases from a set of related information based on extracted feature information,
[0892] A generation processing means having a generation algorithm for generating a suitable image based on similar cases,
[0893] Information presentation means for transmitting and displaying generated image information to an external device,
[0894] A system that includes this.
[0895] (Claim 2)
[0896] The system according to claim 1, comprising a means for processing user opinions and for refining future proposal information based on that information.
[0897] (Claim 3)
[0898] The system according to claim 1, which uses a facial feature recognition algorithm to identify characteristic points of the face and shape.
[0899] "Application Example 1"
[0900] (Claim 1)
[0901] Image processing means for analyzing the shape of the face and body,
[0902] A similarity search method that references similar cases from past information sets based on analyzed features,
[0903] A generation means for creating an appropriate hairstyle based on similar cases,
[0904] A display means for presenting visual information of the created hairstyle,
[0905] A means of taking pictures according to the user's instructions,
[0906] A communication means for transmitting captured images and analysis results,
[0907] A system that includes this.
[0908] (Claim 2)
[0909] The system according to claim 1, comprising a feedback processing means for receiving user feedback and improving the next creation based on that feedback.
[0910] (Claim 3)
[0911] The system according to claim 1, which uses an image recognition algorithm to identify feature points of a face in the analysis of the shape of a face or body.
[0912] "Example 2 of combining an emotion engine"
[0913] (Claim 1)
[0914] Data analysis means for analyzing facial and vocal characteristics,
[0915] A similarity search means that references similar cases from a database based on analyzed emotion data,
[0916] A generation means for generating an appropriate style based on similar cases,
[0917] A presentation means for displaying information about the generated style,
[0918] A system that includes this.
[0919] (Claim 2)
[0920] The system according to claim 1, comprising a feedback processing means for receiving user feedback and improving future proposals based on that feedback.
[0921] (Claim 3)
[0922] The system according to claim 1, which uses an algorithm for identifying facial feature points in facial and voice feature analysis.
[0923] "Application example 2 when combining with an emotional engine"
[0924] (Claim 1)
[0925] An analysis means for analyzing image data and audio data,
[0926] A similarity search method that references similar emotional cases from past data sets based on analyzed emotional data,
[0927] A generation means for generating an appropriate appearance style based on referenced similar emotion cases,
[0928] A display means for presenting an image of the generated appearance style,
[0929] A method using a generative AI model to generate appearance styles based on emotional data,
[0930] A system that includes this.
[0931] (Claim 2)
[0932] The system according to claim 1, comprising a feedback processing means for receiving user feedback and improving the next suggestion based on emotion data.
[0933] (Claim 3)
[0934] The system according to claim 1, which uses a recognition algorithm for identifying facial feature points and voice tone.
[Explanation of symbols]
[0935]
10, 210, 310, 410 Data processing system
12 Data processing device
14 Smart device
214 Smart glasses
314 Headset-type terminal
414 Robot
Claims
1. A system comprising: image analysis means for analyzing the contours of the face and body; similarity search means for referencing similar cases from a past database based on the analyzed features; generation means for generating an appropriate hairstyle based on the similar cases; and display means for presenting an image of the generated hairstyle.
2. The system according to claim 1, further comprising a feedback processing means for receiving user feedback and improving the next proposal based on that feedback.
3. The system according to claim 1, which uses a face recognition algorithm to identify facial feature points in facial and body contour analysis.