system

The system addresses inefficiencies in dirt recognition and cleaning method selection by using image acquisition, recognition, and learning mechanisms, providing tailored and emotionally adaptive cleaning solutions.

JP2026100610A · Pending · Publication Date: 2026-06-19 · SOFTBANK GROUP CORP


Patent Information

Authority / Receiving Office: JP · JP
Patent Type: Applications
Current Assignee / Owner: SOFTBANK GROUP CORP
Filing Date: 2024-12-09
Publication Date: 2026-06-19

AI Technical Summary

Technical Problem

Current technologies fail to accurately recognize dirt types and provide appropriate cleaning methods, leading to inefficient and burdensome cleaning processes.

Method used

A system that uses image acquisition, dirt recognition, suggestion, and learning mechanisms to identify dirt types and recommend optimal cleaning methods, with real-time feedback integration for improved accuracy.

Benefits of technology

Enables efficient and user-friendly cleaning by accurately identifying dirt and suggesting appropriate methods, adapting to user feedback and emotional states for enhanced cleaning experience.

✦ Generated by Eureka AI based on patent content.


Abstract

Provides a system. [Solution] The system includes: a video acquisition means that acquires video; a dirt recognition means that analyzes the video information received from the video acquisition means and recognizes dirt; a suggestion means that, based on the dirt recognized by the dirt recognition means, proposes the optimal cleaning method and the cleaning tools to be used; and an output means that outputs the proposal generated by the suggestion means as audio.


Description

Technical Field

[0001] The technology of the present disclosure relates to a system.

Background Art

[0002] Patent Document 1 discloses a persona chatbot control method performed by at least one processor, including steps of receiving a user utterance, adding the user utterance to a prompt including an instruction sentence related to an explanation of a chatbot character, encoding the prompt, and inputting the encoded prompt into a language model to generate a chatbot utterance as a response to the user utterance.

Prior Art Documents

Patent Documents

[0003]

Patent Document 1

Summary of the Invention

Problems to be Solved by the Invention

[0004] To improve users' quality of life, it is important to reduce the labor of household and personal cleaning and to make it easy to select the optimal cleaning method for each type of dirt. This requires real-time dirt recognition and appropriate cleaning suggestions based on that recognition, but current technologies have not sufficiently solved this problem.

Means for Solving the Problems

[0005] This invention provides a technology for automatically determining the type of dirt by acquiring image information of dirt using an image acquisition means and analyzing the image with a dirt recognition means. Furthermore, it generates the optimal cleaning method and tools to be used according to the determined dirt using a suggestion means, and outputs the results in audio, enabling the user to easily perform appropriate cleaning activities. In addition, by collecting and analyzing user feedback using a learning means and improving the accuracy of future suggestions, this invention solves problems related to cleaning.

[0006] "Image acquisition means" refers to a function that uses cameras or sensors to acquire video data of a specific area in real time.

[0007] A "dirt recognition means" is a function that analyzes acquired video data to identify the presence and type of dirt.

[0008] "Suggestion means" refers to a function that selects and suggests the optimal cleaning method and tools to use based on the type of dirt that has been identified.

[0009] "Output means" refers to a function that communicates proposed cleaning methods and tools to the user through voice or screen.

[0010] "Learning means" refers to a function that updates the parameters of the generative model, based on feedback information collected from users, to improve the content and accuracy of suggestions.

Brief Description of the Drawings

[0011]

[Figure 1] This is a conceptual diagram showing an example of the configuration of a data processing system according to the first embodiment.

[Figure 2] This is a conceptual diagram showing an example of the main functions of a data processing device and a smart device according to the first embodiment.

[Figure 3] This is a conceptual diagram showing an example of the configuration of a data processing system according to the second embodiment.

[Figure 4] This is a conceptual diagram showing an example of the main functions of a data processing device and smart glasses according to the second embodiment.

[Figure 5] This is a conceptual diagram showing an example of the configuration of a data processing system according to the third embodiment.

[Figure 6] This is a conceptual diagram showing an example of the main functions of a data processing device and a headset-type terminal according to the third embodiment.

[Figure 7] This is a conceptual diagram showing an example of the configuration of a data processing system according to the fourth embodiment.

[Figure 8] This is a conceptual diagram showing an example of the main functions of a data processing device and a robot according to the fourth embodiment.

[Figure 9] This shows an emotion map where multiple emotions are mapped.

[Figure 10] This shows an emotion map where multiple emotions are mapped.

[Figure 11] This is a sequence diagram showing the processing flow of the data processing system in Example 1.

[Figure 12] This is a sequence diagram showing the processing flow of the data processing system in Application Example 1.

[Figure 13] This is a sequence diagram showing the processing flow of the data processing system in Example 2, which incorporates an emotion engine.

[Figure 14] This is a sequence diagram showing the processing flow of the data processing system in Application Example 2, which incorporates an emotion engine.

Modes for Carrying Out the Invention

[0012] Hereinafter, an example of an embodiment of the system relating to the technology of this disclosure will be described with reference to the attached drawings.

[0013] First, the terminology used in the following description is explained.

[0014] In the following embodiments, a processor denoted by a reference numeral (hereinafter simply referred to as "processor") may be a single arithmetic unit or a combination of multiple arithmetic units. The processor may also be a single type of arithmetic unit or a combination of multiple types of arithmetic units. Examples of arithmetic units include a CPU (Central Processing Unit), a GPU (Graphics Processing Unit), a GPGPU (General-Purpose computing on Graphics Processing Units), and an APU (Accelerated Processing Unit).

[0015] In the following embodiments, a RAM (Random Access Memory) denoted by a reference numeral is a memory in which information is temporarily stored and which the processor uses as work memory.

[0016] In the following embodiments, a storage denoted by a reference numeral is one or more non-volatile storage devices that store various programs, various parameters, and the like. Examples of non-volatile storage devices include flash memory (SSD (Solid State Drive)), magnetic disks (e.g., hard disks), and magnetic tapes.

[0017] In the following embodiments, a communication I/F (Interface) denoted by a reference numeral is an interface that includes a communication processor, an antenna, and the like. The communication I/F manages communication between multiple computers. Examples of communication standards applied to the communication I/F include wireless communication standards such as 5G (5th Generation Mobile Communication System), Wi-Fi (registered trademark), and Bluetooth (registered trademark).

[0018] In the following embodiments, "A and / or B" is synonymous with "at least one of A and B." That is, "A and / or B" means that it may be A alone, or B alone, or a combination of A and B. Furthermore, in this specification, the same concept as "A and / or B" applies when expressing three or more things linked by "and / or."

[0019] [First Embodiment]

[0020] Figure 1 shows an example of the configuration of the data processing system 10 according to the first embodiment.

[0021] As shown in Figure 1, the data processing system 10 includes a data processing device 12 and a smart device 14. An example of the data processing device 12 is a server.

[0022] The data processing device 12 comprises a computer 22, a database 24, and a communication interface 26. The computer 22 is an example of a "computer" related to the technology of this disclosure. The computer 22 comprises a processor 28, RAM 30, and storage 32. The processor 28, RAM 30, and storage 32 are connected to a bus 34. The database 24 and the communication interface 26 are also connected to the bus 34. The communication interface 26 is connected to a network 54. An example of the network 54 is a WAN (Wide Area Network) and / or a LAN (Local Area Network).

[0023] The smart device 14 comprises a computer 36, a reception device 38, an output device 40, a camera 42, and a communication interface 44. The computer 36 comprises a processor 46, RAM 48, and storage 50. The processor 46, RAM 48, and storage 50 are connected to a bus 52. The reception device 38, output device 40, and camera 42 are also connected to the bus 52.

[0024] The reception device 38 is equipped with a touch panel 38A and a microphone 38B, etc., and receives user input. The touch panel 38A receives user input by detecting contact with an object (e.g., a pen or finger). The microphone 38B receives user input by detecting the user's voice. The control unit 46A transmits data indicating the user input received by the touch panel 38A and microphone 38B to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the data indicating the user input.

[0025] The output device 40 includes a display 40A and a speaker 40B, and presents data to the user 20 by outputting the data in a form perceptible to the user 20 (e.g., audio and / or text). The display 40A displays visible information such as text and images according to instructions from the processor 46. The speaker 40B outputs audio according to instructions from the processor 46. The camera 42 is a small digital camera equipped with an optical system such as a lens, aperture, and shutter, and an image sensor such as a CMOS (Complementary Metal-Oxide-Semiconductor) image sensor or a CCD (Charge Coupled Device) image sensor.

[0026] Communication interface 44 is connected to network 54. Communication interfaces 44 and 26 are responsible for the exchange of various types of information between processor 46 and processor 28 via network 54.

[0027] Figure 2 shows an example of the main functions of the data processing device 12 and the smart device 14.

[0028] As shown in Figure 2, in the data processing device 12, a specific processing is performed by the processor 28. A specific processing program 56 is stored in the storage 32. The specific processing program 56 is an example of a "program" related to the technology of this disclosure. The processor 28 reads the specific processing program 56 from the storage 32 and executes the read specific processing program 56 on the RAM 30. The specific processing is realized by the processor 28 operating as a specific processing unit 290 according to the specific processing program 56 executed on the RAM 30.

[0029] The storage 32 stores the data generation model 58 and the emotion identification model 59. The data generation model 58 and the emotion identification model 59 are used by the specific processing unit 290.

[0030] In the smart device 14, the processor 46 performs the reception output processing. The storage 50 stores the reception output program 60. The reception output program 60 is used in conjunction with a specific processing program 56 by the data processing system 10. The processor 46 reads the reception output program 60 from the storage 50 and executes the read reception output program 60 on the RAM 48. The reception output processing is realized by the processor 46 operating as a control unit 46A according to the reception output program 60 executed on the RAM 48.

[0031] Next, the specific processing performed by the specific processing unit 290 of the data processing device 12 will be described. In the following description, the data processing device 12 will be referred to as the "server" and the smart device 14 as the "terminal".

[0032] The autonomous cleaning assistant system of the present invention includes a video acquisition means, a dirt recognition means, a suggestion means, an output means, and a learning means. First, the terminal acquires video through a camera installed in the room and transmits the image information of dirt in the room to the server in real time. The server analyzes the video using the dirt recognition means to identify the type and location of the dirt.

[0033] Based on the recognized dirt, the server's suggestion means consults a database to determine the most appropriate cleaning method and tools to use. The suggestion reflects what a learning algorithm has learned from previously collected user feedback. The terminal conveys the suggestion to the user by voice output. For example, the terminal might advise, "There is a coffee stain on the floor. Use a cloth to wipe it with water first, then use a specialized cleaner."

[0034] After trying the suggested cleaning method, the user can send feedback to the server via the terminal. The server's learning means accumulates this feedback and uses it to improve the accuracy of future suggestions. The system thus optimizes cleaning autonomously through dirt recognition, cleaning suggestions, and learning, making household cleaning more efficient and comfortable.

[0035] The following describes the processing flow.

[0036] Step 1:

[0037] The terminal activates the camera in the room and captures video of the room in real time. The captured video data is then sent directly to the server.

[0038] Step 2:

[0039] The server analyzes the received video data. Using the dirt recognition means, it extracts features from the video and performs image analysis to identify the location and type of dirt.

[0040] Step 3:

[0041] Based on the analysis results, the server refers to a database and uses the suggestion means to determine the appropriate cleaning method and tools for each type of dirt.

[0042] Step 4:

[0043] The server sends the suggestion to the terminal. The terminal uses a voice output device to give the user spoken instructions for the cleaning procedure.

[0044] Step 5:

[0045] The user cleans according to the instructions from the terminal, then enters feedback on the cleaning result and the suggestion via the terminal.

[0046] Step 6:

[0047] The terminal sends the collected feedback to the server. The server incorporates the feedback into its learning mechanism to update the generative model and improve the accuracy of its suggestions.
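The six steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation described in the specification: the names `Detection`, `Suggestion`, `CLEANING_DB`, `recognize_dirt`, `suggest`, `to_speech_text`, and `FeedbackLog` are hypothetical, the recognizer is a stub that reads a pre-attached label instead of analyzing pixels, and the feedback log only accumulates ratings rather than updating a generative model.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Detection:
    dirt_type: str  # e.g. "coffee_stain"
    location: str   # e.g. "floor"


@dataclass
class Suggestion:
    method: str
    tools: list[str]


# Hypothetical lookup table standing in for the server's database (Step 3).
CLEANING_DB = {
    "coffee_stain": Suggestion(
        "Wipe it with a damp cloth, then use a specialized cleaner.",
        ["cloth", "specialized cleaner"],
    ),
    "pet_hair": Suggestion("Vacuum the area thoroughly.", ["vacuum cleaner"]),
}


def recognize_dirt(frame: dict) -> Detection | None:
    """Step 2 stub: a real system would run an image model on pixel data."""
    label = frame.get("label")
    if label is None:
        return None
    return Detection(dirt_type=label, location=frame.get("location", "unknown"))


def suggest(detection: Detection) -> Suggestion | None:
    """Step 3: look up the method and tools for the recognized dirt type."""
    return CLEANING_DB.get(detection.dirt_type)


def to_speech_text(detection: Detection, suggestion: Suggestion) -> str:
    """Step 4: text the terminal would hand to a speech synthesizer."""
    dirt = detection.dirt_type.replace("_", " ")
    tools = ", ".join(suggestion.tools)
    return f"Detected {dirt} on the {detection.location}. {suggestion.method} Tools: {tools}."


@dataclass
class FeedbackLog:
    """Steps 5-6: accumulate user ratings per dirt type for later learning."""
    ratings: dict[str, list[int]] = field(default_factory=dict)

    def add(self, dirt_type: str, rating: int) -> None:
        self.ratings.setdefault(dirt_type, []).append(rating)

    def average(self, dirt_type: str) -> float | None:
        values = self.ratings.get(dirt_type)
        return sum(values) / len(values) if values else None
```

In this sketch the terminal would call `recognize_dirt` and `suggest` through the server, speak the result of `to_speech_text`, and pass the user's rating to `FeedbackLog.add`.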

[0048] (Example 1)

[0049] Next, we will describe Example 1. In the following description, the data processing device 12 will be referred to as the "server," and the smart device 14 will be referred to as the "terminal."

[0050] Traditional cleaning systems lacked the means to accurately assess the condition of a room and suggest the most appropriate cleaning method. As a result, cleaning was inefficient, and the burden on users was significant. Furthermore, because the appropriate cleaning method for each type of dirt was unknown, there was a possibility of delayed response or the selection of the wrong method. To solve these problems, there is a need for accurate dirt recognition and the suggestion of effective cleaning methods tailored to the situation.

[0051] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 1 is realized by the following means.

[0052] In this invention, the server includes a device for acquiring video, an analysis device for analyzing the image information and identifying contamination, and a suggestion device for suggesting appropriate cleaning techniques and tools to be used. This enables accurate identification of contamination and the suggestion of appropriate cleaning methods based on that identification.

[0053] A "device for acquiring video footage" is a device installed in a room to capture the surrounding environment in detail.

[0054] An "analysis device that analyzes image information to identify contamination" is a computer system that identifies dirt and abnormalities based on received video data.

[0055] A "proposal device that suggests appropriate cleaning techniques and tools to be used" is a mechanism that determines and presents effective cleaning methods and necessary equipment based on information from an analysis device.

[0056] A "learning algorithm, including a training model," is an artificial intelligence technology that uses past data and feedback to improve accuracy.

[0057] A "training device for collecting evaluation information" is a system that collects user feedback based on cleaning results to improve identification capabilities and suggestion accuracy.

[0058] This autonomous cleaning assistant system supports effective cleaning without user intervention. The system mainly consists of a terminal that acquires video, a server that analyzes the video, and a device that suggests cleaning methods and receives feedback.

[0059] First, the camera installed in the terminal acquires high-resolution video to capture the current situation inside the room. By using a wide-angle lens, for example, the camera can capture the entire room. This video data is transmitted to the server in real time.

[0060] Next, the server analyzes the received video data using an enhanced AI algorithm. This AI processing utilizes generative AI models and deep learning to identify dirt and obstacles within the video. During this analysis, the server uses a high-performance processor to rapidly process large amounts of data.

[0061] When dirt is detected, the server accesses a database to determine the most suitable cleaning method and tools for the identified dirt. This process utilizes machine learning algorithms based on historical data and user feedback.

[0062] Finally, the terminal gives the user specific cleaning instructions by voice or text. For example, "For coffee stains on the floor, wipe them with a damp cloth, then use a specialized cleaner." The user can then follow the instructions and clean efficiently.

[0063] An example of a prompt message might be: "Analyze the video footage of the room captured by the camera and identify the dirt. Based on the identified dirt, suggest the most suitable cleaning method and tools." A prompt of this kind directs the system to perform both the recognition step and the suggestion step, so that the user receives a cleaning suggestion matched to the identified dirt.
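As a rough illustration of how such a prompt might be assembled and its reply checked, the sketch below wraps the example instruction with some context and asks for a structured answer. The context line, the JSON reply format, and the names `build_prompt`, `parse_reply`, and `REQUIRED_KEYS` are assumptions made for this example; the specification does not define a reply format.

```python
import json

# The example instruction quoted in the text above.
BASE_INSTRUCTION = (
    "Analyze the video footage of the room captured by the camera and identify the dirt. "
    "Based on the identified dirt, suggest the most suitable cleaning method and tools."
)

# Hypothetical reply schema, assumed only for this sketch.
REQUIRED_KEYS = {"dirt_type", "location", "method", "tools"}


def build_prompt(room_name: str, frame_count: int) -> str:
    """Attach assumed context and an output-format request to the instruction."""
    return (
        f"{BASE_INSTRUCTION}\n"
        f"Context: {frame_count} frames from the {room_name}.\n"
        'Reply as JSON with keys "dirt_type", "location", "method", "tools".'
    )


def parse_reply(reply: str) -> dict:
    """Validate a model reply; raise ValueError if it is not the expected JSON."""
    try:
        data = json.loads(reply)
    except json.JSONDecodeError as exc:
        raise ValueError("reply is not valid JSON") from exc
    if not isinstance(data, dict):
        raise ValueError("reply must be a JSON object")
    missing = REQUIRED_KEYS - data.keys()
    if missing:
        raise ValueError(f"reply is missing keys: {sorted(missing)}")
    return data
```

Validating the reply before it reaches the output step keeps a malformed model answer from being read aloud to the user.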

[0064] The flow of the specific processing in Example 1 will be explained using Figure 11.

[0065] Step 1:

[0066] The terminal uses a camera installed in the room to acquire high-resolution video. The video shows the floor, furniture, and other elements of the environment, capturing the overall state of the room in detail. The input is video data showing the current state of the room. Specific operations include switching the camera on and off and adjusting the video frame rate.

[0067] Step 2:

[0068] The terminal transmits the acquired video data to the server in real time via the network. During this transmission process, the data is compressed to ensure efficient delivery to the server. The compressed video data is then provided to the server as output.

[0069] Step 3:

[0070] The server analyzes the received video data. The input is the compressed video data. Generative AI models and deep learning techniques are used to identify dirt and anomalies within the video. Specifically, image processing algorithms perform feature extraction to detect dirt, and the detected dirt is output.

[0071] Step 4:

[0072] Based on the identified dirt, the server determines the optimal cleaning method and tools to use from the database. At this stage, a learning algorithm based on historical data and user feedback is referenced. The input is the type and location of the dirt. The output is a set of specific cleaning instructions.

[0073] Step 5:

[0074] The device communicates the proposed cleaning method to the user. This communication utilizes speech synthesis and text display technologies to explain the cleaning method step-by-step. This allows the user to specifically understand which tools to use and how to use them. The output provides the user with instructions in either voice or text format.
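The frame-rate adjustment in Step 1 and the compression in Step 2 can be illustrated as follows. This is only a sketch under assumptions: frames are treated as opaque byte strings, `zlib` (lossless, general-purpose compression from the Python standard library) stands in for whatever video codec a real terminal would use, and the function names `subsample`, `pack`, and `unpack` are hypothetical.

```python
import zlib


def subsample(frames: list, source_fps: int, target_fps: int) -> list:
    """Keep roughly target_fps frames per second by taking every n-th frame."""
    if source_fps <= 0 or target_fps <= 0:
        raise ValueError("frame rates must be positive")
    step = max(1, round(source_fps / target_fps))
    return frames[::step]


def pack(frames: list) -> bytes:
    """Length-prefix each frame (4 bytes, big-endian) and compress the batch."""
    payload = b"".join(len(f).to_bytes(4, "big") + f for f in frames)
    return zlib.compress(payload)


def unpack(blob: bytes) -> list:
    """Server side: decompress the batch and split it back into frames."""
    payload = zlib.decompress(blob)
    frames, pos = [], 0
    while pos < len(payload):
        size = int.from_bytes(payload[pos:pos + 4], "big")
        pos += 4
        frames.append(payload[pos:pos + size])
        pos += size
    return frames
```

Lowering the frame rate before compression reduces the amount of data the terminal has to send, at the cost of temporal detail that dirt recognition on a mostly static room may not need.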

[0075] (Application Example 1)

[0076] Next, we will explain Application Example 1. In the following explanation, the data processing device 12 will be referred to as the "server," and the smart device 14 will be referred to as the "terminal."

[0077] In modern living environments, quickly and efficiently recognizing dirt and suggesting the most suitable cleaning method to the user is a crucial challenge. However, conventional methods often fail to accurately identify the type and location of dirt, resulting in the selection of inappropriate cleaning methods. A system is needed to solve this problem, reduce cleaning effort, and maintain cleanliness in the living environment.

[0078] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 1 is realized by the following means.

[0079] In this invention, the server includes a video acquisition means, a feature recognition means, a suggestion means, and a communication means. This makes it possible to accurately recognize dirt and suggest the optimal cleaning method to the user in real time.

[0080] "Video acquisition means" refers to devices or systems that use cameras or sensors to acquire video of the surrounding environment.

[0081] "Feature recognition means" refers to algorithms or devices that analyze acquired video information to identify specific objects or states.

[0082] "Suggestion means" refers to a function or system that presents the user with the optimal action or option based on recognized characteristics.

[0083] "Communication means" refers to networks and devices used to transmit data to other devices or terminals.

[0084] A "generative model" is an algorithm or machine learning model that learns from data and generates new information or results.

[0085] A "learning algorithm" is a series of procedures and calculation methods used to learn patterns from data and make judgments or predictions.

[0086] A "learning mechanism" is a system that uses collected data and feedback to improve the performance and accuracy of the system.

[0087] In embodiments of the present invention, the terminal is equipped with a camera as a video acquisition means to acquire video of the environment in real time. The server receives this video information and uses a feature recognition means to analyze the type and location of the dirt. A generative AI model and a learning algorithm are used for feature recognition, and these determine the characteristics of the dirt.

[0088] Based on the analysis results, the server selects the optimal cleaning method and tools to use via a suggestion mechanism. The suggested content is transmitted to the terminal via a communication mechanism, and the user is notified via voice or text message. During this process, user feedback is collected, and the learning mechanism improves the accuracy of suggestions for subsequent uses.

[0089] For example, if a pet leaves hair on the carpet, the system quickly recognizes the mess with its camera and sends a suggestion to the user such as, "There is pet hair scattered on the living room carpet. Please use a vacuum cleaner to thoroughly clean it up." In this scenario, the user solves the problem by following the instructions received.

[0090] An example of a prompt would be, "Please tell me how to identify pet hair scattered on the living room carpet, and also provide suggestions for cleaning it."

[0091] This allows users to clean efficiently and easily maintain a clean living environment.

[0092] The flow of a specific process in Application Example 1 will be explained using Figure 12.

[0093] Step 1:

[0094] The device uses a camera to capture the room environment in real time and acquire video data. This video data represents the overall view of the room or the state of a specific area. The acquired video data is then sent to the server.

[0095] Step 2:

[0096] The server analyzes the received video data and uses feature recognition to identify dirt in the video. Data processing here includes processes to remove noise and enhance relevant visual information. Specifically, image processing algorithms are used to determine the location and type of dirt and generate recognition results.

[0097] Step 3:

[0098] Based on the recognized dirt information, the server selects the optimal cleaning method using the suggestion means. This process refers to a pre-configured database that contains cleaning methods and tools categorized by type of dirt. The server generates suggestions based on this information.

[0099] Step 4:

[0100] The server sends the generated suggestions to the terminal via the communication means. The terminal notifies the user of the suggestions by voice or text message. For example, it might present specific instructions such as, "There is pet hair scattered on the living room carpet. Please use a vacuum cleaner to thoroughly clean it up."

[0101] Step 5:

[0102] The user receives a notification from their device and cleans the dirt using the suggested method. After cleaning, the user sends the results and feedback to the server via their device. This feedback is used by the server to improve the learning accuracy of the suggested method.

[0103] Throughout the process, the server utilizes generative AI models and learning algorithms to continuously improve the accuracy and usability of the system's suggestions based on feedback.
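One simple way to picture a database categorized by dirt type whose suggestions improve with feedback is a lookup table with per-method helpfulness counts. The sketch below is an assumption-laden stand-in: it ranks candidate methods by a smoothed helpful rate instead of updating a generative model, and the class `SuggestionStore` and its methods are hypothetical names.

```python
class SuggestionStore:
    """Cleaning methods keyed by dirt type, ranked by accumulated user feedback."""

    def __init__(self, methods_by_dirt: dict):
        self._methods = {k: list(v) for k, v in methods_by_dirt.items()}
        self._stats = {}  # (dirt_type, method) -> [helpful_count, total_count]

    def record_feedback(self, dirt_type: str, method: str, helpful: bool) -> None:
        stats = self._stats.setdefault((dirt_type, method), [0, 0])
        stats[0] += int(helpful)
        stats[1] += 1

    def _rate(self, dirt_type: str, method: str) -> float:
        helpful, total = self._stats.get((dirt_type, method), (0, 0))
        # Laplace smoothing: a method with no feedback yet scores 0.5.
        return (helpful + 1) / (total + 2)

    def best(self, dirt_type: str):
        candidates = self._methods.get(dirt_type)
        if not candidates:
            return None
        # max() keeps the earliest candidate on ties, so list order is the default.
        return max(candidates, key=lambda m: self._rate(dirt_type, m))


store = SuggestionStore({"pet_hair": ["vacuum cleaner", "lint roller"]})
first_choice = store.best("pet_hair")  # no feedback yet: first listed method
store.record_feedback("pet_hair", "vacuum cleaner", helpful=False)
store.record_feedback("pet_hair", "vacuum cleaner", helpful=False)
second_choice = store.best("pet_hair")  # the unrated alternative now ranks higher
```

The smoothing keeps a single negative rating from permanently burying a method, which matters when feedback is sparse.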

[0104] Furthermore, an emotion engine that estimates the user's emotions may be incorporated. That is, the specific processing unit 290 may use the emotion identification model 59 to estimate the user's emotions and perform the specific processing using the estimated emotions.

[0105] The system of the present invention begins by acquiring video footage of a room through a video acquisition means and transmitting it to a server. The server is equipped with a dirt recognition means that analyzes the video data to identify the type and location of dirt in the room. The suggestion means then selects the optimal cleaning method and tools for the identified dirt, the server transmits the suggestion to the terminal, and the terminal conveys it to the user via an output means.

[0106] The present invention further incorporates an emotion engine. The terminal sends the user's voice and facial expression data to the emotion engine, which recognizes emotions in real time. Based on this emotion information, the server adjusts the cleaning advice generated by the suggestion means. For example, if the server detects that the user is tired, it recommends a simplified cleaning method or a relaxing task. The terminal guides the user through voice output in a gentle tone, saying something like, "Let's try an easy cleaning method today. First, let's just use the vacuum cleaner to suck up the dust."

[0107] Furthermore, users can provide feedback to the server via the terminal after performing the suggested cleaning method. This feedback is analyzed by the server's learning means and reflected in future suggestions. Combined with learning by the emotion engine, this improves the accuracy of the suggestions. The system provides a cleaning experience customized to the user's current mood and emotional state, achieving both efficiency and comfort in cleaning work.

[0108] The following describes the processing flow.

[0109] Step 1:

[0110] The device uses cameras installed in the room to acquire video in real time and transmits that video data to the server.

[0111] Step 2:

[0112] The server analyzes the received video data and uses a dirt recognition system to identify the location and type of dirt.

[0113] Step 3:

[0114] Based on the results of the dirt identification, the server uses the suggestion means to determine the optimal cleaning method and tools to be used.

[0115] Step 4:

[0116] The terminal acquires the user's voice and facial expressions through its microphone and camera and sends this data to the emotion engine. The emotion engine analyzes the user's emotions and identifies, for example, levels of fatigue or stress.

[0117] Step 5:

[0118] The server considers the results of the emotion engine and adjusts the decision on suggested actions. For example, if the server detects that the user is tired, it will simplify cleaning methods and suggest less burdensome activities.

[0119] Step 6:

[0120] The device communicates tailored suggestions to the user via voice output. For example, it might say, "Let's keep it simple today. Vacuum up the dust, and if you have time, give it a light wipe."

[0121] Step 7:

[0122] Users carry out suggested cleaning activities and provide feedback on the results via their devices.

[0123] Step 8:

[0124] The terminal sends the user feedback to the server. The server incorporates the feedback and the emotion analysis results into its learning means to improve the accuracy and customization of its suggestions.
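The adjustment in Steps 5 and 6 can be pictured as a small rule applied to the suggested plan. This is a sketch under assumptions, not the emotion engine itself: the fatigue score in the range 0 to 1, the threshold of 0.7, and the names `Plan`, `adjust_for_emotion`, and `render` are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Plan:
    steps: tuple[str, ...]
    tone: str = "neutral"


def adjust_for_emotion(plan: Plan, fatigue: float,
                       threshold: float = 0.7, max_steps: int = 1) -> Plan:
    """Shorten the plan and soften the tone when estimated fatigue is high."""
    if not 0.0 <= fatigue <= 1.0:
        raise ValueError("fatigue must be between 0 and 1")
    if fatigue < threshold:
        return plan  # user seems fine: keep the full plan unchanged
    return Plan(steps=plan.steps[:max_steps], tone="gentle")


def render(plan: Plan) -> str:
    """Build the text the terminal would pass to voice output."""
    body = " ".join(plan.steps)
    if plan.tone == "gentle":
        return "Let's keep it simple today. " + body
    return body


full_plan = Plan((
    "Vacuum up the dust.",
    "Wipe the floor with a damp cloth.",
    "Apply a specialized cleaner.",
))
tired_plan = adjust_for_emotion(full_plan, fatigue=0.9)
```

Keeping the adjustment separate from the suggestion step means the same plan can be rendered differently as the estimated emotional state changes.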

[0125] (Example 2)

[0126] Next, we will describe Example 2. In the following description, the data processing device 12 will be referred to as the "server" and the smart device 14 as the "terminal".

[0127] In modern home and office environments, efficient and effective cleaning is crucial, but many people struggle with selecting the right cleaning methods and tools. Furthermore, providing uniform advice without considering the user's emotional state can compromise user satisfaction and effective cleaning results. Additionally, there's a problem with standardized cleaning methods lacking flexibility to adapt to individual situations.

[0128] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 2 is realized by the following means.

[0129] In this invention, the server includes an image acquisition means, a recognition means for recognizing contamination, and an adjustment means for adjusting suggestions. This allows for flexible provision of optimal cleaning suggestions while considering the user's feelings, and enables efficient cleaning tailored to individual environments.

[0130] "Image acquisition means" refers to a function or device that captures image data in the target environment and transmits it to a server as input information for analysis.

[0131] "Recognition means" refers to an analytical function that detects contamination within acquired image data and identifies its type and location.

[0132] The "proposal means" is a function that selects the optimal cleaning method and tools to use based on the contamination identified by the recognition means and proposes them to the user.

[0133] "Output means" refers to a function that transmits information about cleaning methods and tools generated by the proposal means to the user in voice or other forms.

[0134] "Emotion recognition means" refers to a function that analyzes the user's emotional state in real time from their voice and facial expressions.

[0135] The "adjustment means" is a function that adjusts the advice generated by the proposal means based on the user's emotional information obtained from the emotion recognition means, and provides cleaning suggestions suited to the user's condition.

[0136] The embodiments for carrying out the present invention are shown below.

[0137] This system is built on a client-server architecture. The terminal is, for example, a portable information terminal with a camera, which acquires images of the room. A standard digital camera can be used. The terminal with the image acquisition means transmits the captured image data to the server via a wireless network.

[0138] The server analyzes the received image data. Image processing libraries such as "OpenCV" are used for image analysis. Specifically, features are extracted from the image to identify the type and location of contamination. Based on the analysis results, the server uses machine learning libraries such as "Scikit-learn" and trained models to select the optimal cleaning method and tools. The cleaning method is optimized based on past data.

[0139] The selected proposals are sent to the terminal and communicated to the user using an audio output device. Text-to-speech software such as "Amazon Polly" is used for the audio output, providing clear and easy-to-understand guidance to the user.

[0140] Furthermore, the device uses a camera and microphone to recognize the user's voice and facial expressions. Emotion recognition incorporates emotion analysis services such as "Microsoft® Azure® Emotion API." This allows the server to generate cleaning suggestions that take the user's emotional state into account. For example, if the system detects that the user is tired, adjustments such as simplifying the cleaning method will be made.

[0141] After cleaning is complete, the user sends feedback on the suggested cleaning method to the server via their device. This feedback information is analyzed using the server's learning mechanisms and used to improve the accuracy of future suggestions.

[0142] A specific example of a prompt: "Analyze the video footage of the room captured by the camera, identify the level of dirt, and suggest the optimal cleaning method based on the user's current mood."

[0143] This format allows users to receive flexible cleaning suggestions tailored to their individual environment and emotional state, thereby improving the efficiency of cleaning work and user satisfaction.

[0144] The flow of the specific processing in Example 2 will be explained using Figure 13.

[0145] Step 1:

[0146] The device acquires images of the room using a camera and transmits that image data to a server via a wireless network. The input is the image data captured by the camera, and the output is the transmission of that image data to the server via the network. Specifically, the device is configured to capture images of the room at regular intervals.

[0147] Step 2:

[0148] The server analyzes the received image data. It receives image data as input, uses "OpenCV" to detect contamination within the image, and identifies its type and location. The output is information about the type and location of the contamination. Specifically, it converts the image to grayscale and performs edge detection and thresholding to highlight the contaminated areas.
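
The preprocessing named in Step 2 can be sketched as follows. This is a minimal, dependency-free illustration rather than the disclosed implementation: a real system would more likely call OpenCV routines such as cv2.cvtColor, cv2.threshold, and cv2.Canny. The function names, the cutoff values, and the assumption that dirt appears darker than its surroundings are all hypothetical.

```python
# Dependency-free sketch of grayscale conversion, thresholding, and a simple
# edge test. All names and cutoff values are illustrative assumptions.

def to_grayscale(rgb_image):
    """Convert rows of (R, G, B) tuples to rows of 0-255 luminance values."""
    return [
        [round(0.299 * r + 0.587 * g + 0.114 * b) for (r, g, b) in row]
        for row in rgb_image
    ]

def threshold_dark(gray, cutoff=80):
    """Mark pixels darker than `cutoff` as 1 (candidate dirt), others as 0."""
    return [[1 if v < cutoff else 0 for v in row] for row in gray]

def edge_mask(gray, min_step=50):
    """Mark pixels whose right or lower neighbour differs by at least `min_step`."""
    h, w = len(gray), len(gray[0])
    mask = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            dx = abs(gray[y][x] - gray[y][x + 1]) if x + 1 < w else 0
            dy = abs(gray[y][x] - gray[y + 1][x]) if y + 1 < h else 0
            if max(dx, dy) >= min_step:
                mask[y][x] = 1
    return mask

def bounding_box(mask):
    """Return (top, left, bottom, right) of the marked pixels, or None."""
    points = [(y, x) for y, row in enumerate(mask) for x, v in enumerate(row) if v]
    if not points:
        return None
    ys = [p[0] for p in points]
    xs = [p[1] for p in points]
    return (min(ys), min(xs), max(ys), max(xs))
```

The bounding box of the thresholded mask gives one possible representation of the "location" output of this step; classifying the type of contamination would need a separate model.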

[0149] Step 3:

[0150] The server generates suggestions based on the analysis results. The input is analyzed contamination information, and it uses machine learning models such as "Scikit-learn" to select the optimal cleaning method and tools. The output is suggested information regarding cleaning methods and tools. The specific operation includes determining the optimal cleaning method under specific conditions while referencing historical data in the model.
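
One way to picture "determining the optimal cleaning method under specific conditions while referencing historical data" is a lookup over rated past outcomes. The sketch below is a plain-Python stand-in for the trained model the text mentions, not a Scikit-learn implementation; the record fields, the sample data, and the choice of mean rating as the selection criterion are assumptions made for illustration.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical historical records; every field name and value is illustrative.
HISTORY = [
    {"dirt": "coffee_stain", "surface": "floor", "method": "wipe with damp cloth",
     "tool": "microfiber cloth", "rating": 4},
    {"dirt": "coffee_stain", "surface": "floor", "method": "wipe with damp cloth",
     "tool": "microfiber cloth", "rating": 5},
    {"dirt": "coffee_stain", "surface": "floor", "method": "dry vacuum",
     "tool": "vacuum cleaner", "rating": 1},
    {"dirt": "dust", "surface": "floor", "method": "dry vacuum",
     "tool": "vacuum cleaner", "rating": 5},
    {"dirt": "dust", "surface": "shelf", "method": "wipe with dry cloth",
     "tool": "duster", "rating": 4},
]

def select_method(dirt, surface, history=HISTORY):
    """Return the (method, tool) with the best mean rating for the given
    conditions, or None when no matching history exists."""
    ratings = defaultdict(list)
    for rec in history:
        if rec["dirt"] == dirt and rec["surface"] == surface:
            ratings[(rec["method"], rec["tool"])].append(rec["rating"])
    if not ratings:
        return None  # unseen condition: nothing to base a suggestion on
    return max(ratings, key=lambda key: mean(ratings[key]))
```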

[0151] Step 4:

[0152] The server sends the selected suggestions to the terminal. The input is suggested information about cleaning methods and tools, and the output is this information being sent to the terminal. Specifically, the data is pushed to the terminal via a message queue or API.
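
The push in Step 4 could look roughly like the following. The sketch uses Python's in-process queue.Queue as a stand-in for a real message queue or API, and the JSON field names are hypothetical; the source does not specify a message format.

```python
import json
import queue

def build_suggestion_message(terminal_id, dirt_type, location, method, tools):
    """Serialize one suggestion as a JSON string (field names are illustrative)."""
    return json.dumps({
        "terminal_id": terminal_id,
        "dirt": {"type": dirt_type, "location": location},
        "suggestion": {"method": method, "tools": list(tools)},
    })

# In-process stand-in for a message queue between server and terminal.
outbox = queue.Queue()

def push_suggestion(message, q=outbox):
    """Server side: enqueue a message for delivery."""
    q.put(message)

def receive_suggestion(q=outbox):
    """Terminal side: take the next message and decode it, or None if empty."""
    try:
        return json.loads(q.get_nowait())
    except queue.Empty:
        return None
```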

[0153] Step 5:

[0154] The terminal communicates suggestions to the user using an audio output device. The input is suggestion information received from the server, and the output is a notification to the user as an audio message. Specifically, it uses speech synthesis software such as "Amazon Polly" to convert the suggestion content into speech and play it back.

[0155] Step 6:

[0156] The device inputs the user's voice and facial expressions into the emotion recognition system. The input consists of user voice and facial expression data acquired by the camera and microphone, and the output is sending that data to an emotion analysis tool. Specifically, it uses the "Microsoft Azure Emotion API" to analyze the user's current emotions.

[0157] Step 7:

[0158] The server adjusts its suggestions based on the sentiment analysis results. The input is the user's sentiment information, and the output is an adjusted cleaning suggestion that takes that information into account. The specific operation includes processes that dynamically adjust the steps and load of the suggestions based on the sentiment information.
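
As a minimal sketch of adjusting "the steps and load" of a suggestion, the code below drops non-essential steps and softens the tone when the user is judged to be tired. The Suggestion structure, the "tired" label, and the essential-step flags are assumptions for illustration; the source does not define how the load is reduced.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class Suggestion:
    steps: tuple            # ordered cleaning steps
    essential: tuple        # parallel flags: True if the step cannot be skipped
    tone: str = "neutral"   # hint for the speech output

def adjust_for_emotion(suggestion, emotion):
    """Return a lighter suggestion when the user seems tired; otherwise unchanged."""
    if emotion == "tired":
        kept = tuple(
            step for step, must in zip(suggestion.steps, suggestion.essential) if must
        )
        return replace(
            suggestion, steps=kept, essential=(True,) * len(kept), tone="gentle"
        )
    return suggestion
```

For example, a three-step suggestion in which only vacuuming is flagged as essential would shrink to that single step for a tired user and stay intact for a relaxed one.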

[0159] Step 8:

[0160] The user performs the suggested cleaning and provides feedback to the server via a terminal. The input is the user's feedback on the results of the cleaning, and the output is the transmission of that information to the server. For example, the user fills out a feedback form displayed on the terminal and presses the submit button.

[0161] Step 9:

[0162] The server analyzes user feedback information and uses it to improve the accuracy of its suggestions. The input is the feedback information, and the output is the improved suggestion model. The specific operation here is to add the feedback to the dataset and retrain the machine learning model to improve the suggestion accuracy.
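
The "add the feedback to the dataset and retrain" loop in Step 9 can be illustrated with a toy model. This is only a sketch under stated assumptions: the "model" here is a table of the best-rated method per dirt type, rebuilt from scratch on each retrain, whereas the source refers to a machine learning model without specifying it. The class and method names are hypothetical.

```python
from collections import defaultdict

class SuggestionModel:
    """Toy model: for each dirt type, remember the method with the best mean rating."""

    def __init__(self):
        self.dataset = []   # accumulated (dirt_type, method, rating) records
        self.best = {}      # dirt_type -> method, rebuilt by retrain()

    def add_feedback(self, dirt_type, method, rating):
        """Append one feedback record; the model changes only after retrain()."""
        self.dataset.append((dirt_type, method, rating))

    def retrain(self):
        """Recompute the best method per dirt type from the whole dataset."""
        totals = defaultdict(lambda: [0, 0])  # (dirt_type, method) -> [sum, count]
        for dirt_type, method, rating in self.dataset:
            entry = totals[(dirt_type, method)]
            entry[0] += rating
            entry[1] += 1
        best_score = {}
        self.best = {}
        for (dirt_type, method), (total, count) in totals.items():
            score = total / count
            if score > best_score.get(dirt_type, float("-inf")):
                best_score[dirt_type] = score
                self.best[dirt_type] = method
```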

[0163] (Application Example 2)

[0164] Next, we will explain Application Example 2. In the following explanation, the data processing device 12 will be referred to as a "server" and the smart device 14 as a "terminal".

[0165] In daily life, cleaning tasks require considerable time and effort, yet they present challenges in providing flexible solutions tailored to individual circumstances and emotions. In particular, there is a lack of systems that effectively address room dirt in a way that suits the user's needs. Furthermore, there is a need for technology that provides automated cleaning suggestions that take the user's emotional state into consideration.

[0166] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 2 is realized by the following means.

[0167] In this invention, the server includes a device equipped with the function of acquiring video, means for analyzing the video information received from the device and identifying the degree of soiling, and a function for suggesting an appropriate cleaning method and cleaning tools to be used based on the identified degree of soiling. This makes it possible to clean the room efficiently and effectively while providing flexible cleaning suggestions tailored to the user's emotional state.

[0168] A "device equipped with the function of acquiring images" is a device that uses cameras and sensors to collect photographic and video data of a room or space.

[0169] "Means for identifying the degree of soiling" refers to a technology that analyzes the acquired video to determine the type and location of dirt within a room or space.

[0170] The "function that suggests appropriate cleaning methods and cleaning tools" refers to a technology that selects the optimal cleaning procedure and suitable cleaning tools for a given type of soiling and proposes them to the user.

[0171] A "mechanism for communicating proposed cleaning methods via voice output" refers to a device or technology that provides the proposed cleaning procedure to the user as a voice message, and provides instructions and guidance.

[0172] An "engine for recognizing user emotional information" is a technology that analyzes a user's voice tone and visual changes in their facial expressions to grasp their emotional state in real time.

[0173] The "feature that adjusts cleaning suggestions based on emotional information" is a technology that modifies or optimizes suggested cleaning methods and procedures by taking into account the user's current emotional state.

[0174] This system aims to streamline household cleaning and provide a customized experience tailored to the user's emotions. Implementation requires a device to acquire video, typically a consumer robot equipped with a camera. The robot captures real-time images of the room, collecting video data. A server then receives this video data via wireless communication and analyzes it.

[0175] The analysis utilizes the OpenCV video processing library and a learning algorithm that leverages a generative AI model. This allows the server to accurately identify the location and type of dirt from the video. Next, the AI model's suggestion function works to determine the most effective cleaning procedure and the cleaning tools to use. The determined suggestions are communicated to the user via voice through digital voice assistant technology.

[0176] Furthermore, the device incorporates an emotion engine that analyzes the user's voice tone and facial expressions to recognize their emotional state in real time. This emotional information is sent to a server, which adjusts cleaning suggestions to suit the user's situation. For example, if the server determines that the user is tired, it simplifies the suggestions to reduce the user's burden.

[0177] Users can provide feedback after cleaning, and this information is sent to the server. This feedback is analyzed by an algorithm and used to improve future cleaning advice. In this way, the system adapts to the user's current mood and state, pursuing not only cleaning efficiency but also comfort.

[0178] Specific example:

[0179] For example, a consumer robot installed in a living room uses its camera to photograph the floor and detect dust accumulated under the sofa. The server then gives the robot a voice command saying, "Use the vacuum cleaner to suck up the dust under the sofa." If the user is relaxed, after cleaning, it might suggest, "Shall we do a quick clean around the windows next?"

[0180] Examples of prompts for a generative AI model:

[0181] "The smart cleaning system acquires and analyzes video data of the room and suggests cleaning methods tailored to the user's situation. In particular, please provide specific methods for incorporating emotional information into the suggested cleaning methods."

[0182] The flow of a specific process in Application Example 2 will be explained using Figure 14.

[0183] Step 1:

[0184] The device uses a camera to capture video footage of the room. It receives real-time video data as input. This video data is transmitted to the server via Wi-Fi. The output is the video data stored on the server.

[0185] Step 2:

[0186] The server extracts specific frames from the received video data and uses video processing libraries such as OpenCV to identify the location and type of dirt for each frame. The input is the transmitted video data, and the output is data on the location and type of the identified dirt.
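
Extracting "specific frames" is not defined further in the text; one simple reading is sampling at a fixed time interval. The sketch below computes which frame indices to analyze under that assumption. The function name and the one-second default are hypothetical, and decoding the frames themselves (for example with OpenCV's cv2.VideoCapture) is left out.

```python
def sample_frame_indices(total_frames, fps, interval_seconds=1.0):
    """Return indices of frames spaced roughly `interval_seconds` apart."""
    if total_frames <= 0 or fps <= 0 or interval_seconds <= 0:
        return []
    # Never step by less than one frame, even for very short intervals.
    step = max(1, round(fps * interval_seconds))
    return list(range(0, total_frames, step))
```

With a 30 fps stream and the default interval, a three-second clip of 90 frames yields frames 0, 30, and 60 for dirt analysis.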

[0187] Step 3:

[0188] The server uses a generative AI model to calculate the optimal cleaning procedure based on identified dirt and suggests which cleaning tools are most suitable. The input is data on the location and type of dirt, and the output is a suggestion for the cleaning method and tools.
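
A minimal sketch of how the dirt data from Step 2 might be turned into a prompt for the generative AI model is shown below. The wording of the prompt, the dictionary keys, and the optional emotion argument are assumptions for illustration; the call to the model itself is omitted because the source does not specify an API.

```python
def build_cleaning_prompt(detections, emotion=None):
    """Compose a text prompt from dirt detections and an optional emotion label."""
    if not detections:
        return "No dirt was detected in the room. Confirm that no cleaning is needed."
    lines = ["The following dirt was detected in the room:"]
    for d in detections:
        lines.append(f"- {d['type']} at {d['location']}")
    lines.append("Suggest the most suitable cleaning procedure and tools for each item.")
    if emotion:
        lines.append(f"The user currently seems {emotion}; adapt the workload accordingly.")
    return "\n".join(lines)
```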

[0189] Step 4:

[0190] The device receives suggestions sent from the server and communicates them to the user as voice output using its digital voice assistant function. The input is the suggestion content, and the output is the voice message heard by the user.

[0191] Step 5:

[0192] The device uses a microphone and camera to collect the user's voice and facial expressions, and sends this data to an emotion engine to analyze the user's emotional state. The input is real-time audio and video data, and the output is data on the identified emotional state.

[0193] Step 6:

[0194] The server receives data on the user's emotional state and adjusts the suggested cleaning methods accordingly. If it determines that the user is tired, it will offer suggestions tailored to the user's condition, such as simplifying the cleaning process. The input is the identified emotional state, and the output is the adjusted suggestions.

[0195] Step 7:

[0196] After cleaning, the user enters feedback via a terminal, which is then sent to the server. The server analyzes the user's feedback using a learning algorithm and uses it as data to improve the accuracy of future suggestions. The input is the user's feedback information, and the output is the improved suggestion algorithm.

[0197] The specific processing unit 290 transmits the result of the specific processing to the smart device 14. In the smart device 14, the control unit 46A causes the output device 40 to output the result of the specific processing. The microphone 38B acquires audio indicating user input for the result of the specific processing. The control unit 46A transmits the audio data indicating user input acquired by the microphone 38B to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the audio data.

[0198] Data generation model 58 is a so-called generative AI (Artificial Intelligence). Examples of data generation model 58 include generative AI such as ChatGPT (registered trademark) (Internet search <URL: https://openai.com/blog/chatgpt>) and Gemini (registered trademark) (Internet search <URL: https://gemini.google.com/?hl=ja>). The data generation model 58 is obtained by performing deep learning on a neural network. The data generation model 58 receives as input a prompt containing instructions, together with inference data such as audio data representing speech, text data representing text, and image data representing images. The data generation model 58 performs inference on the input inference data according to the instructions indicated by the prompt, and outputs the inference results in data formats such as audio data and text data. Here, inference refers to, for example, analysis, classification, prediction, and/or summarization.

[0199] In the above embodiment, an example was given in which specific processing is performed by the data processing device 12, but the technology of this disclosure is not limited thereto, and the specific processing may also be performed by the smart device 14.

[0200] [Second Embodiment]

[0201] Figure 3 shows an example of the configuration of the data processing system 210 according to the second embodiment.

[0202] As shown in Figure 3, the data processing system 210 includes a data processing device 12 and smart glasses 214. An example of the data processing device 12 is a server.

[0203] The data processing device 12 comprises a computer 22, a database 24, and a communication interface 26. The computer 22 is an example of a "computer" related to the technology of this disclosure. The computer 22 comprises a processor 28, RAM 30, and storage 32. The processor 28, RAM 30, and storage 32 are connected to a bus 34. The database 24 and the communication interface 26 are also connected to the bus 34. The communication interface 26 is connected to a network 54. An example of the network 54 is a WAN (Wide Area Network) and/or a LAN (Local Area Network).

[0204] The smart glasses 214 include a computer 36, a microphone 238, a speaker 240, a camera 42, and a communication interface 44. The computer 36 includes a processor 46, RAM 48, and storage 50. The processor 46, RAM 48, and storage 50 are connected to a bus 52. The microphone 238, speaker 240, and camera 42 are also connected to the bus 52.

[0205] The microphone 238 receives voice signals from the user 20 and receives instructions from the user 20. The microphone 238 captures the voice signals from the user 20, converts the captured voice into audio data, and outputs it to the processor 46. The speaker 240 outputs audio according to the instructions from the processor 46.

[0206] Camera 42 is a small digital camera equipped with an optical system including a lens, aperture, and shutter, and an image sensor such as a CMOS (Complementary Metal-Oxide-Semiconductor) image sensor or a CCD (Charge Coupled Device) image sensor, and captures images of the area around the user 20 (for example, an imaging range defined by a field of view equivalent to the width of a typical healthy person's field of vision).

[0207] Communication interface 44 is connected to network 54. Communication interfaces 44 and 26 are responsible for the exchange of various information between processor 46 and processor 28 via network 54. The exchange of various information between processor 46 and processor 28 using communication interfaces 44 and 26 is performed in a secure manner.

[0208] Figure 4 shows an example of the main functions of the data processing device 12 and the smart glasses 214. As shown in Figure 4, the data processing device 12 performs specific processing using the processor 28. The storage 32 stores the specific processing program 56.

[0209] The specific processing program 56 is an example of a "program" relating to the technology of this disclosure. The processor 28 reads the specific processing program 56 from the storage 32 and executes the read specific processing program 56 on the RAM 30. The specific processing is realized by the processor 28 operating as a specific processing unit 290 in accordance with the specific processing program 56 executed on the RAM 30.

[0210] The storage 32 stores the data generation model 58 and the emotion identification model 59. The data generation model 58 and the emotion identification model 59 are used by the specific processing unit 290.

[0211] In the smart glasses 214, the processor 46 performs the reception output processing. The storage 50 stores the reception output program 60. The processor 46 reads the reception output program 60 from the storage 50 and executes the read reception output program 60 on the RAM 48. The reception output processing is realized by the processor 46 operating as a control unit 46A according to the reception output program 60 executed on the RAM 48.

[0212] Next, the specific processing performed by the specific processing unit 290 of the data processing device 12 will be described. In the following description, the data processing device 12 will be referred to as the "server" and the smart glasses 214 will be referred to as the "terminal".

[0213] The autonomous cleaning assistant system of the present invention includes a video acquisition means, a dirt recognition means, a suggestion means, an output means, and a learning means. First, the terminal acquires video through a camera installed in the room and transmits image information of dirt in the room to the server in real time. The server analyzes the video using the dirt recognition means to identify the type and location of the dirt.

[0214] Based on the recognized stain, the server's suggestion system consults a database to determine the most appropriate cleaning method and tools to use. This suggestion incorporates insights gained from previously collected user feedback, using a learning algorithm. The terminal communicates this suggestion to the user via voice output. For example, the terminal might advise, "There is a coffee stain on the floor. Use a cloth to wipe it with water first, then use a specialized cleaner."

[0215] After users try the suggested cleaning methods, they can send feedback to the server via their device. This feedback is accumulated by the server's learning mechanisms and helps improve the accuracy of future suggestions. This system is an intelligent system that autonomously optimizes cleaning through dirt recognition, cleaning activity suggestions, and learning. This will revolutionize household cleaning, enabling efficient and comfortable cleaning tasks.

[0216] The following describes the processing flow.

[0217] Step 1:

[0218] The device activates the room's camera and captures video footage of the room in real time. The captured video data is then sent directly to the server.

[0219] Step 2:

[0220] The server analyzes the received video data. Using a dirt recognition method, it extracts features from the video and performs image analysis to identify the location and type of dirt.

[0221] Step 3:

[0222] The server, based on the analysis results, refers to a database and uses a suggestion mechanism to determine the appropriate cleaning method and tools to be used for each type of stain.

[0223] Step 4:

[0224] The server sends the suggested results to the terminal. The terminal uses a voice output device to provide the user with voice instructions for the cleaning procedure.

[0225] Step 5:

[0226] The user cleans according to the instructions given by the terminal. Afterwards, the user enters an evaluation of the cleaning and feedback on the suggestion via the terminal.

[0227] Step 6:

[0228] The terminal sends the collected feedback to the server. The server incorporates the feedback into its learning mechanism to update the generative model and improve the accuracy of its suggestions.

[0229] (Example 1)

[0230] Next, we will describe Example 1. In the following description, the data processing device 12 will be referred to as the "server," and the smart glasses 214 will be referred to as the "terminal."

[0231] Traditional cleaning systems lacked the means to accurately assess the condition of a room and suggest the most appropriate cleaning method. As a result, cleaning was inefficient, and the burden on users was significant. Furthermore, because the appropriate cleaning method for each type of dirt was unknown, there was a possibility of delayed response or the selection of the wrong method. To solve these problems, there is a need for accurate dirt recognition and the suggestion of effective cleaning methods tailored to the situation.

[0232] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 1 is realized by the following means.

[0233] In this invention, the server includes a device for acquiring video, an analysis device for analyzing the image information and identifying contamination, and a suggestion device for suggesting appropriate cleaning techniques and tools to be used. This enables accurate identification of contamination and the suggestion of appropriate cleaning methods based on that identification.

[0234] A "device for acquiring video footage" is a device installed in a room to capture the surrounding environment in detail.

[0235] An "analysis device that analyzes image information to identify contamination" is a computer system that identifies dirt and abnormalities based on received video data.

[0236] A "proposal device that suggests appropriate cleaning techniques and tools to be used" is a mechanism that determines and presents effective cleaning methods and necessary equipment based on information from an analysis device.

[0237] A "learning algorithm, including a training model," is an artificial intelligence technology that uses past data and feedback to improve accuracy.

[0238] A "training device for collecting evaluation information" is a system that collects user feedback based on cleaning results to improve identification capabilities and suggestion accuracy.

[0239] This autonomous cleaning assistant system supports effective cleaning without user intervention. The system mainly consists of a terminal that acquires video, a server that analyzes the video, and a device that suggests cleaning methods and receives feedback.

[0240] First, the camera installed in the device acquires high-resolution video to capture the current situation inside the room. This camera can comprehensively capture the entire room by, for example, using a wide-angle lens. This video data is transmitted to the server in real time.

[0241] Next, the server analyzes the received video data using an enhanced AI algorithm. This AI processing utilizes generative AI models and deep learning to identify dirt and obstacles within the video. During this analysis, the server uses a high-performance processor to rapidly process large amounts of data.

[0242] When dirt is detected, the server accesses a database to determine the most suitable cleaning method and tools for the identified dirt. This process utilizes machine learning algorithms based on historical data and user feedback.

[0243] Finally, the terminal presents specific cleaning instructions to the user via voice or text, for example, "For coffee stains on the floor, wipe them with a damp cloth, then use a specialized cleaner." This allows the user to follow the instructions and clean efficiently.

[0244] An example of a prompt message might be: "Analyze the video footage of the room captured by the camera and identify the dirt. Based on the identified dirt, suggest the most suitable cleaning method and tools." This ensures that the system reliably executes the process and provides the user with the best possible cleaning solution.

[0245] The flow of the specific processing in Example 1 will be explained using Figure 11.

[0246] Step 1:

[0247] The device uses a camera installed in the room to acquire high-resolution video. This video shows the floor, furniture, and other environmental elements, providing a detailed capture of the overall room situation. The input is video data showing the current state of the room. Specific operations include controlling the camera's on/off state and adjusting the video's frame rate.

[0248] Step 2:

[0249] The terminal transmits the acquired video data to the server in real time via the network. During this transmission process, the data is compressed to ensure efficient delivery to the server. The compressed video data is then provided to the server as output.
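
The text does not say how the data is compressed. Video is normally compressed with a dedicated codec; as a simple stand-in, the sketch below applies general-purpose lossless compression from Python's zlib module to a chunk of bytes before sending and restores it on the server. The function names and the compression level are assumptions.

```python
import zlib

def pack_chunk(raw_bytes, level=6):
    """Terminal side: compress one chunk of captured data before sending it."""
    return zlib.compress(raw_bytes, level)

def unpack_chunk(payload):
    """Server side: restore the original bytes from a compressed chunk."""
    return zlib.decompress(payload)
```

Lossless compression of this kind guarantees that the server analyzes exactly what the camera captured, at the cost of much lower compression ratios than a video codec would achieve.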

[0250] Step 3:

[0251] The server analyzes the received video data. The input is the compressed video data. Here, generative AI models and deep learning techniques are used to identify dirt and anomalies within the video. Specifically, image processing algorithms perform feature extraction to detect dirt in the data, and the detected dirt is output.

[0252] Step 4:

[0253] Based on the identified dirt, the server determines the optimal cleaning method and tools to use from the database. At this stage, a learning algorithm based on historical data and user feedback is referenced. The input is the type and location of the dirt, and the output is specific cleaning instructions.

[0254] Step 5:

[0255] The device communicates the proposed cleaning method to the user. This communication utilizes speech synthesis and text display technologies to explain the cleaning method step-by-step. This allows the user to specifically understand which tools to use and how to use them. The output provides the user with instructions in either voice or text format.

[0256] (Application Example 1)

[0257] Next, we will explain Application Example 1. In the following explanation, the data processing device 12 will be referred to as the "server," and the smart glasses 214 will be referred to as the "terminal."

[0258] In modern living environments, quickly and efficiently recognizing dirt and suggesting the most suitable cleaning method to the user is a crucial challenge. However, conventional methods often fail to accurately identify the type and location of dirt, resulting in the selection of inappropriate cleaning methods. A system is needed to solve this problem, reduce cleaning effort, and maintain cleanliness in the living environment.

[0259] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 1 is realized by the following means.

[0260] In this invention, the server includes a video acquisition means, a feature recognition means, a suggestion means, and a communication means. This makes it possible to accurately recognize dirt and suggest the optimal cleaning method to the user in real time.

[0261] "Video acquisition means" refers to devices or systems that use cameras or sensors to acquire video of the surrounding environment.

[0262] "Feature recognition means" refers to algorithms or devices that analyze acquired video information to identify specific objects or states.

[0263] "Suggestion means" refers to a function or system that presents the user with the optimal action or option based on the recognized features.

[0264] "Communication means" refers to networks and devices used to transmit data to other devices or terminals.

[0265] A "generative model" is an algorithm or machine learning model that learns from data and generates new information or results.

[0266] A "learning algorithm" is a series of procedures and calculation methods used to learn patterns from data and make judgments or predictions.

[0267] A "learning mechanism" is a system that uses collected data and feedback to improve the performance and accuracy of the system.

[0268] In embodiments of the present invention, the terminal is equipped with a camera as a video acquisition means to acquire video of the environment in real time. The server receives this video information and uses a feature recognition means to analyze the type and location of the dirt. A generative AI model and a learning algorithm are used for feature recognition, and these determine the characteristics of the dirt.

[0269] Based on the analysis results, the server selects the optimal cleaning method and tools to use via a suggestion mechanism. The suggested content is transmitted to the terminal via a communication mechanism, and the user is notified via voice or text message. During this process, user feedback is collected, and the learning mechanism improves the accuracy of suggestions for subsequent uses.

[0270] For example, if a pet leaves hair on the carpet, the system quickly recognizes the mess with its camera and sends a suggestion to the user such as, "There is pet hair scattered on the living room carpet. Please use a vacuum cleaner to thoroughly clean it up." In this scenario, the user solves the problem by following the instructions received.

[0271] An example of a prompt would be, "Please tell me how to identify pet hair scattered on the living room carpet, and also provide suggestions for cleaning it."

[0272] This allows users to clean efficiently and easily maintain a clean living environment.

[0273] The flow of a specific process in Application Example 1 will be explained using Figure 12.

[0274] Step 1:

[0275] The device uses a camera to capture the room environment in real time and acquire video data. This video data represents the overall view of the room or the state of a specific area. The acquired video data is then sent to the server.

[0276] Step 2:

[0277] The server analyzes the received video data and uses feature recognition means to identify dirt in the video. The data processing here includes a process of removing noise and emphasizing appropriate visual information. Specifically, an image processing algorithm is used to determine the location and type of dirt and generate a recognition result.
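
"Removing noise and emphasizing appropriate visual information" can be pictured with two classic operations: a median filter and a linear contrast stretch. The sketch below is one possible reading, written without external libraries; the source does not name the algorithms, so both choices and all function names are assumptions.

```python
from statistics import median

def median_filter_3x3(gray):
    """Replace each pixel with the median of its 3x3 neighbourhood (edges clamped)."""
    h, w = len(gray), len(gray[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            window = [
                gray[min(max(y + dy, 0), h - 1)][min(max(x + dx, 0), w - 1)]
                for dy in (-1, 0, 1)
                for dx in (-1, 0, 1)
            ]
            out[y][x] = median(window)
    return out

def stretch_contrast(gray):
    """Linearly rescale values to span 0-255, emphasising faint differences."""
    lo = min(min(row) for row in gray)
    hi = max(max(row) for row in gray)
    if hi == lo:
        return [row[:] for row in gray]  # flat image: nothing to stretch
    return [[round((v - lo) * 255 / (hi - lo)) for v in row] for row in gray]
```

The median filter suppresses isolated outlier pixels such as sensor noise, while the stretch widens small brightness differences so that faint dirt stands out more in later analysis.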

[0278] Step 3:

[0279] Based on the recognized dirt information, the server uses proposal means to select an optimal cleaning method. In this process, a pre-set database is referenced. The database stores information on cleaning methods and tools to be used according to the type of dirt. The server generates a proposal based on this information.

[0280] Step 4:

[0281] The server transmits the generated proposal to the terminal via communication means. The terminal notifies the user of the proposal content by voice or text message. For example, a specific instruction such as "There is pet hair scattered on the living room carpet. Please use a vacuum cleaner to thoroughly clean it up." is presented.

[0282] Step 5:

[0283] The user receives the notification from the terminal and cleans the dirt by the proposed method. After cleaning, the user transmits the results and feedback of the implementation to the server via the terminal. This feedback is used by the server to improve the learning accuracy of the proposal means.

[0284] Throughout, the server utilizes the generative AI model and learning algorithm to continuously improve the proposal accuracy and convenience of the system based on the feedback.
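The database lookup in Step 3 can be sketched as follows. This is a minimal illustration only: the table contents, the `CleaningMethod` type, and the `make_proposal` function are hypothetical names introduced here, and the disclosure does not specify the database schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CleaningMethod:
    method: str              # what the user should do
    tools: tuple[str, ...]   # tools needed for the method


# Hypothetical pre-set database keyed by recognized dirt type.
CLEANING_DB = {
    "pet_hair": CleaningMethod("vacuum the area thoroughly", ("vacuum cleaner",)),
    "coffee_stain": CleaningMethod(
        "wipe with a damp cloth, then apply a cleaner", ("cloth", "cleaner")
    ),
    "dust": CleaningMethod("wipe the surface with a dry cloth", ("dry cloth",)),
}

# Fallback used when the recognized dirt type has no database entry.
DEFAULT_METHOD = CleaningMethod("inspect the area and wipe it gently", ("cloth",))


def make_proposal(dirt_type: str, location: str) -> str:
    """Build the proposal text from a recognition result (type and location)."""
    entry = CLEANING_DB.get(dirt_type, DEFAULT_METHOD)
    label = dirt_type.replace("_", " ")
    tools = ", ".join(entry.tools)
    return f"There is {label} on the {location}. Please {entry.method}. Tools: {tools}."
```

For instance, `make_proposal("pet_hair", "living room carpet")` produces a message of the kind shown in Step 4, and an unknown dirt type falls back to the default entry.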

[0285] Furthermore, an emotion engine for estimating the user's emotion may be combined. That is, the specific processing unit 290 may estimate the user's emotion using the emotion identification model 59 and perform the specific processing using the estimated emotion.

[0286] The system of the present invention starts with acquiring the video of a room through video acquisition means and transmitting it to a server. The server is equipped with dirt recognition means for analyzing the video data, and identifies the type and location of dirt in the room. Further, the proposal means selects an optimal cleaning method and the tools to be used for the identified dirt, and the selection is transmitted to the terminal and conveyed to the user by the output means.

[0287] Furthermore, an emotion engine is incorporated in the present invention. The terminal sends the voice and facial expression data of the user to the emotion engine, and emotions are recognized in real time therefrom. The server adjusts the cleaning advice generated by the proposal means based on this emotion information. For example, when the user is recognized as being tired, the server recommends a simplified cleaning method or a task with a relaxing effect. The terminal guides the user in a gentle tone through voice output, such as "Let's try a way to clean easily today. First, let's just use the vacuum cleaner to suck up the dust."

[0288] Furthermore, after the user executes the proposed cleaning method, the user can provide feedback to the server via the terminal. This feedback is analyzed by the learning means of the server and reflected in the next proposal. Together with the learning by the emotion engine, the proposal accuracy is improved. This system provides a customized cleaning experience according to the current mood and emotional state of the user, achieving both cleaning efficiency and comfort.

[0289] Hereinafter, the processing flow will be described.

[0290] Step 1:

[0291] The terminal uses the camera installed in the room to acquire video in real time and transmits the video data to the server.

[0292] Step 2:

[0293] The server analyzes the received video data and uses a dirt recognition system to identify the location and type of dirt.

[0294] Step 3:

[0295] Based on the results of the dirt identification, the server uses the suggestion means to determine the optimal cleaning method and tools to be used.

[0296] Step 4:

[0297] The device acquires the user's voice and facial expressions through voice and camera input and sends this data to the emotion engine. The emotion engine analyzes the user's emotions and identifies, for example, levels of fatigue or stress.

[0298] Step 5:

[0299] The server considers the results of the emotion engine and adjusts the decision on suggested actions. For example, if the server detects that the user is tired, it will simplify cleaning methods and suggest less burdensome activities.

[0300] Step 6:

[0301] The device communicates tailored suggestions to the user via voice output. For example, it might say, "Let's keep it simple today. Vacuum up the dust, and if you have time, give it a light wipe."

[0302] Step 7:

[0303] Users carry out suggested cleaning activities and provide feedback on the results via their devices.

[0304] Step 8:

[0305] The device sends user feedback to the server. The server incorporates the feedback and the emotion analysis results into its learning mechanism to improve the accuracy and customization of its suggestions.
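The adjustment in Step 5 can be illustrated with a short sketch. It assumes the emotion engine returns a fatigue score between 0 and 1; that score, the 0.7 threshold, and the names `Proposal` and `adjust_for_emotion` are assumptions made for illustration and are not specified in the disclosure.

```python
from dataclasses import dataclass


@dataclass
class Proposal:
    steps: list[str]        # ordered cleaning steps
    tone: str = "neutral"   # tone used for the voice output


def adjust_for_emotion(
    proposal: Proposal, fatigue: float, threshold: float = 0.7
) -> Proposal:
    """Simplify the proposal when the (hypothetical) fatigue score is high."""
    if fatigue >= threshold:
        # Keep only the first, most essential step and soften the tone.
        return Proposal(steps=proposal.steps[:1], tone="gentle")
    return proposal
```

With a high fatigue score the proposal shrinks to a single step delivered in a gentle tone, while a low score leaves it unchanged.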

[0306] (Example 2)

[0307] Next, Example 2 will be described. In the following description, the data processing device 12 is referred to as a "server", and the smart glasses 214 are referred to as a "terminal".

[0308] In modern home and office environments, it is important to perform efficient and effective cleaning, but many people have difficulty selecting appropriate cleaning methods and tools. In addition, providing uniform advice without considering the emotional state of the user performing the cleaning may reduce both user satisfaction and cleaning effectiveness. Furthermore, there is a problem that the cleaning method is fixed and lacks flexibility according to individual situations.

[0309] The specific processing by the specific processing unit 290 of the data processing device 12 in Example 2 is realized by the following means.

[0310] In this invention, the server includes an image acquisition means, a recognition means for recognizing contamination, and an adjustment means for adjusting the proposal. Thereby, while considering the user's feelings, an optimal cleaning proposal can be flexibly provided, and efficient cleaning according to the individual environment becomes possible.

[0311] The "image acquisition means" is a function or device that captures image data in the target environment and transmits it to the server as input information for analysis.

[0312] The "recognition means" is an analysis function for detecting contamination in the acquired image data and specifying its type and position.

[0313] The "proposal means" is a function that selects an optimal cleaning method and tools to be used based on the contamination specified by the recognition means and proposes them to the user.

[0314] The "output means" is a function that transmits the cleaning method and tool information generated by the proposal means to the user in voice or other forms.

[0315] "Emotion recognition means" refers to a function that analyzes the user's emotional state in real time from their voice and facial expressions.

[0316] The "adjustment means" is a function that appropriately adjusts the advice generated by the suggestion means based on the user's emotional information obtained from the emotion recognition means, and provides cleaning suggestions that are appropriate to the user's condition.
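One way to picture how the means defined above fit together is as functions composed in sequence. The sketch below is an assumption about structure, not the disclosed implementation: `Contamination`, `Proposal`, and `run_pipeline` are hypothetical names, and each means is passed in as a plain callable so that any concrete implementation can be substituted.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Contamination:
    kind: str                  # type of contamination
    position: tuple[int, int]  # position in the image


@dataclass(frozen=True)
class Proposal:
    method: str
    tools: tuple[str, ...]


def run_pipeline(
    image: bytes,                                          # from the image acquisition means
    recognize: Callable[[bytes], Contamination],           # recognition means
    propose: Callable[[Contamination], Proposal],          # proposal means
    estimate_emotion: Callable[[], str],                   # emotion recognition means
    adjust: Callable[[Proposal, str], Proposal],           # adjustment means
    output: Callable[[Proposal], None],                    # output means
) -> Proposal:
    """Run one pass: recognize, propose, adjust for emotion, then output."""
    contamination = recognize(image)
    proposal = propose(contamination)
    emotion = estimate_emotion()
    adjusted = adjust(proposal, emotion)
    output(adjusted)
    return adjusted
```

Keeping each means behind a callable boundary makes it straightforward to swap, for example, a rule-based recognizer for a learned one without touching the rest of the flow.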

[0317] The embodiments for carrying out the present invention are shown below.

[0318] This system is built on a client-server architecture. The terminal is, for example, a portable information terminal with a camera, which acquires images of the room. A standard digital camera can be used. The terminal with the image acquisition means transmits the captured image data to the server via a wireless network.

[0319] The server analyzes the received image data. Image processing libraries such as "OpenCV" are used for image analysis. Specifically, features are extracted from the image to identify the type and location of contamination. Based on the analysis results, the server uses machine learning libraries such as "Scikit-learn" and trained models to select the optimal cleaning method and tools. The cleaning method is optimized based on past data.

[0320] The selected proposals are sent to the terminal and communicated to the user using an audio output device. Text-to-speech software such as "Amazon Polly" is used for the audio output, providing clear and easy-to-understand guidance to the user.
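The hand-off from server to terminal can be sketched as a small JSON message. The message fields and the functions `encode_proposal` and `to_speech_text` are hypothetical; the sketch stops at producing the text that would be handed to a text-to-speech service and does not call any such service.

```python
import json


def encode_proposal(method: str, tools: list[str]) -> str:
    """Server side: serialize a proposal as a JSON message (assumed format)."""
    return json.dumps({"type": "proposal", "method": method, "tools": tools})


def to_speech_text(message: str) -> str:
    """Terminal side: turn the received message into text for speech synthesis."""
    data = json.loads(message)
    if data.get("type") != "proposal":
        raise ValueError("unexpected message type")
    tools = " and ".join(data["tools"])
    return f"Please {data['method']} using {tools}."
```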

[0321] Furthermore, the device uses a camera and microphone to recognize the user's voice and facial expressions. Emotion recognition incorporates emotion analysis services such as the "Microsoft Azure Emotion API." This allows the server to generate cleaning suggestions that take the user's emotional state into account. For example, if the system detects that the user is tired, adjustments such as simplifying the cleaning method will be made.

[0322] After cleaning is complete, the user sends feedback on the suggested cleaning method to the server via their device. This feedback information is analyzed using the server's learning mechanisms and used to improve the accuracy of future suggestions.

[0323] A specific example of a prompt: "Analyze the video footage of the room captured by the camera, identify the level of dirt, and suggest the optimal cleaning method based on the user's current mood."
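A prompt of this kind could be assembled from a template. The sketch below assumes the mood arrives as a short text label from the emotion recognition step; the appended "Current mood" sentence and the function `build_prompt` are illustrative additions, not part of the prompt quoted above.

```python
PROMPT_TEMPLATE = (
    "Analyze the video footage of the room captured by the camera, "
    "identify the level of dirt, and suggest the optimal cleaning method "
    "based on the user's current mood. Current mood: {mood}."
)


def build_prompt(mood: str) -> str:
    """Insert the recognized mood label into the prompt template."""
    mood = mood.strip()
    if not mood:
        raise ValueError("mood must not be empty")
    return PROMPT_TEMPLATE.format(mood=mood)
```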

[0324] This format allows users to receive flexible cleaning suggestions tailored to their individual environment and emotional state, thereby improving the efficiency of cleaning work and user satisfaction.

[0325] The flow of the specific processing in Example 2 will be explained using Figure 13.

[0326] Step 1:

[0327] The device acquires images of the room using a camera and transmits that image data to a server via a wireless network. The input is the image data captured by the camera, and the output is the transmission of that image data to the server via the network. Specifically, the device is configured to capture images of the room at regular intervals.

[0328] Step 2:

[0329] The server analyzes the received image data. It receives image data as input, uses "OpenCV" to detect contamination within the image, and identifies its type and location. The output is information about the type and location of the contamination. Specifically, it converts the image to grayscale and performs edge detection and thresholding to highlight the contaminated areas.

[0330] Step 3:

[0331] The server generates suggestions based on the analysis results. The input is the analyzed contamination information, and the server uses a machine learning model built with a library such as "Scikit-learn" to select the optimal cleaning method and tools. The output is suggestion information regarding the cleaning method and tools. Specifically, the model refers to historical data to determine the optimal cleaning method for the given conditions.

[0332] Step 4:

[0333] The server sends the selected suggestions to the terminal. The input is suggested information about cleaning methods and tools, and the output is this information being sent to the terminal. Specifically, the data is pushed to the terminal via a message queue or API.

[0334] Step 5:

[0335] The terminal communicates suggestions to the user using an audio output device. The input is suggestion information received from the server, and the output is a notification to the user as an audio message. Specifically, it uses speech synthesis software such as "Amazon Polly" to convert the suggestion content into speech and play it back.

[0336] Step 6:

[0337] The device inputs the user's voice and facial expressions into the emotion recognition system. The input consists of user voice and facial expression data acquired by the camera and microphone, and the output is sending that data to an emotion analysis tool. Specifically, it uses the "Microsoft Azure Emotion API" to analyze the user's current emotions.

[0338] Step 7:

[0339] The server adjusts its suggestions based on the emotion analysis results. The input is the user's emotion information, and the output is an adjusted cleaning suggestion that takes that information into account. Specifically, the server dynamically adjusts the number of steps and the workload of the suggestion based on the emotion information.

[0340] Step 8:

[0341] The user performs the suggested cleaning and provides feedback to the server via a terminal. The input is the user's feedback on the results of the cleaning, and the output is the transmission of that information to the server. For example, the user fills out a feedback form displayed on the terminal and presses the submit button.

[0342] Step 9:

[0343] The server analyzes user feedback information and uses it to improve the accuracy of its suggestions. The input is the feedback information, and the output is the improved suggestion model. The specific operation here is to add the feedback to the dataset and retrain the machine learning model to improve the suggestion accuracy.
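The grayscale conversion and thresholding mentioned in Step 2 can be illustrated without any external library. The sketch below is a pure-Python stand-in for what an image processing library such as OpenCV would do on real frames; it omits edge detection, assumes dirt appears darker than its surroundings, and uses hypothetical function names.

```python
def to_grayscale(rgb_image):
    """Convert rows of (r, g, b) pixels to gray values using BT.601 luma weights."""
    return [
        [round(0.299 * r + 0.587 * g + 0.114 * b) for (r, g, b) in row]
        for row in rgb_image
    ]


def threshold_dark(gray, cutoff):
    """Mark pixels darker than the cutoff as candidate contamination (1) or not (0)."""
    return [[1 if value < cutoff else 0 for value in row] for row in gray]


def bounding_box(mask):
    """Return (x_min, y_min, x_max, y_max) of marked pixels, or None if none are marked."""
    coords = [(x, y) for y, row in enumerate(mask) for x, v in enumerate(row) if v]
    if not coords:
        return None
    xs = [x for x, _ in coords]
    ys = [y for _, y in coords]
    return (min(xs), min(ys), max(xs), max(ys))
```

The bounding box gives the location of the contaminated area; identifying its type would require a separate classifier, which this sketch does not attempt.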

[0344] (Application Example 2)

[0345] Next, we will explain application example 2. In the following explanation, the data processing device 12 will be referred to as the "server," and the smart glasses 214 will be referred to as the "terminal."

[0346] In daily life, cleaning tasks require considerable time and effort, yet it is difficult to provide flexible solutions tailored to individual circumstances and emotions. In particular, there is a lack of systems that effectively address room dirt in a way that suits the user's needs. Furthermore, there is a need for technology that provides automated cleaning suggestions that take the user's emotional state into consideration.

[0347] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 2 is realized by the following means.

[0348] In this invention, the server includes a device equipped with the function of acquiring video, means for analyzing the video information received from the device and identifying the degree of soiling, and a function for suggesting an appropriate cleaning method and cleaning tools to be used based on the identified degree of soiling. This makes it possible to clean the room efficiently and effectively while providing flexible cleaning suggestions tailored to the user's emotional state.

[0349] A "device equipped with the function of acquiring video" is a device that uses cameras and sensors to collect photographic and video data of a room or space.

[0350] "Methods for identifying the state of dirt" refers to technologies that analyze acquired video footage to determine the type and location of dirt within a room or space.

[0351] The "function that suggests appropriate cleaning methods and cleaning tools" refers to a technology that selects the optimal cleaning procedure and suitable cleaning tools for a given type of soiling and proposes them to the user.

[0352] A "mechanism for communicating proposed cleaning methods via voice output" refers to a device or technology that provides the proposed cleaning procedure to the user as a voice message, and provides instructions and guidance.

[0353] An "engine for recognizing user emotional information" is a technology that analyzes a user's voice tone and visual changes in their facial expressions to grasp their emotional state in real time.

[0354] The "feature that adjusts cleaning suggestions based on emotional information" is a technology that modifies or optimizes suggested cleaning methods and procedures by taking into account the user's current emotional state.

[0355] This system aims to streamline household cleaning and provide a customized experience tailored to the user's emotions. Implementation requires a device to acquire video, typically a consumer robot equipped with a camera. The robot captures real-time images of the room, collecting video data. The server then receives this video data via wireless communication and analyzes it.

[0356] The analysis utilizes the OpenCV video processing library and a learning algorithm that leverages a generative AI model. This allows the server to accurately identify the location and type of dirt from the video. Next, the AI model's suggestion function works to determine the most effective cleaning procedure and the cleaning tools to use. The determined suggestions are communicated to the user via voice through digital voice assistant technology.

[0357] Furthermore, the device incorporates an emotion engine that analyzes the user's voice tone and facial expressions to recognize their emotional state in real time. This emotional information is sent to a server, which adjusts cleaning suggestions to suit the user's situation. For example, if the server determines that the user is tired, it simplifies the suggestions to reduce the user's burden.

[0358] Users can provide feedback after cleaning, and this information is sent to the server. This feedback is analyzed by an algorithm and used to improve future cleaning advice. In this way, the system adapts to the user's current mood and state, pursuing not only cleaning efficiency but also comfort.

[0359] Specific example:

[0360] For example, a consumer robot installed in a living room uses its camera to photograph the floor and detect dust accumulated under the sofa. The server then gives the robot a voice command saying, "Use the vacuum cleaner to suck up the dust under the sofa." If the user is relaxed, after cleaning, it might suggest, "Shall we do a quick clean around the windows next?"

[0361] Examples of prompts for a generative AI model:

[0362] "The smart cleaning system acquires and analyzes video data of the room and suggests cleaning methods tailored to the user's situation. In particular, please provide specific methods for incorporating emotional information into the suggested cleaning methods."

[0363] The flow of a specific process in Application Example 2 will be explained using Figure 14.

[0364] Step 1:

[0365] The device uses a camera to capture video footage of the room. It receives real-time video data as input. This video data is transmitted to the server via Wi-Fi. The output is the video data stored on the server.

[0366] Step 2:

[0367] The server extracts specific frames from the received video data and uses video processing libraries such as OpenCV to identify the location and type of dirt for each frame. The input is the transmitted video data, and the output is data on the location and type of the identified dirt.

[0368] Step 3:

[0369] The server uses a generative AI model to calculate the optimal cleaning procedure based on identified dirt and suggests which cleaning tools are most suitable. The input is data on the location and type of dirt, and the output is a suggestion for the cleaning method and tools.

[0370] Step 4:

[0371] The device receives suggestions sent from the server and communicates them to the user as voice output using its digital voice assistant function. The input is the suggestion content, and the output is the voice message heard by the user.

[0372] Step 5:

[0373] The device uses a microphone and camera to collect the user's voice and facial expressions, and sends this data to an emotion engine to analyze the user's emotional state. The input is real-time audio and video data, and the output is data on the identified emotional state.

[0374] Step 6:

[0375] The server receives data on the user's emotional state and adjusts the suggested cleaning methods accordingly. If it determines that the user is tired, it will offer suggestions tailored to the user's condition, such as simplifying the cleaning process. The input is the identified emotional state, and the output is the adjusted suggestions.

[0376] Step 7:

[0377] After cleaning, the user enters feedback via a terminal, which is then sent to the server. The server analyzes the user's feedback using a learning algorithm and uses it as data to improve the accuracy of future suggestions. The input is the user's feedback information, and the output is the improved suggestion algorithm.
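Step 2 extracts specific frames from the video before analysis. One simple policy, shown here as an assumption since the disclosure does not state how frames are chosen, is to sample at a fixed time interval. The function name and parameters are hypothetical.

```python
def sample_frame_indices(
    total_frames: int, fps: float, every_seconds: float
) -> list[int]:
    """Return the indices of frames to analyze, one per `every_seconds` of video."""
    if total_frames < 0 or fps <= 0 or every_seconds <= 0:
        raise ValueError("total_frames must be >= 0; fps and every_seconds must be > 0")
    # Never use a step below 1, so very short intervals select every frame.
    step = max(1, round(fps * every_seconds))
    return list(range(0, total_frames, step))
```

Sampling reduces the number of frames passed to the recognition stage, at the cost of possibly missing dirt that is visible only briefly.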

[0378] The specific processing unit 290 transmits the result of the specific processing to the smart glasses 214. In the smart glasses 214, the control unit 46A causes the speaker 240 to output the result of the specific processing. The microphone 238 acquires audio indicating user input for the result of the specific processing. The control unit 46A transmits the audio data indicating user input acquired by the microphone 238 to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the audio data.

[0379] Data generation model 58 is a type of so-called generative AI (Artificial Intelligence). Examples of the data generation model 58 include generative AI such as ChatGPT (Internet search <URL: https://openai.com/blog/chatgpt>) and Gemini (Internet search <URL: https://gemini.google.com/?hl=ja>). The data generation model 58 is obtained by performing deep learning on a neural network. Prompts containing instructions, and inference data such as audio data representing speech, text data representing text, and image data representing images, are input to the data generation model 58. The data generation model 58 performs inference on the input inference data according to the instructions indicated by the prompts, and outputs the inference results in data formats such as audio data and text data. Here, inference refers to, for example, analysis, classification, prediction, and/or summarization.

[0380] In the above embodiment, an example was given in which specific processing is performed by the data processing device 12, but the technology of this disclosure is not limited thereto, and the specific processing may also be performed by the smart glasses 214.

[0381] [Third Embodiment]

[0382] Figure 5 shows an example of the configuration of the data processing system 310 according to the third embodiment.

[0383] As shown in Figure 5, the data processing system 310 includes a data processing device 12 and a headset terminal 314. An example of the data processing device 12 is a server.

[0384] The data processing device 12 comprises a computer 22, a database 24, and a communication interface 26. The computer 22 is an example of a "computer" related to the technology of this disclosure. The computer 22 comprises a processor 28, RAM 30, and storage 32. The processor 28, RAM 30, and storage 32 are connected to a bus 34. The database 24 and the communication interface 26 are also connected to the bus 34. The communication interface 26 is connected to a network 54. An example of the network 54 is a WAN (Wide Area Network) and/or a LAN (Local Area Network).

[0385] The headset terminal 314 includes a computer 36, a microphone 238, a speaker 240, a camera 42, a communication interface 44, and a display 343. The computer 36 includes a processor 46, RAM 48, and storage 50. The processor 46, RAM 48, and storage 50 are connected to a bus 52. The microphone 238, speaker 240, camera 42, and display 343 are also connected to the bus 52.

[0386] The microphone 238 receives voice signals from the user 20 and receives instructions from the user 20. The microphone 238 captures the voice signals from the user 20, converts the captured voice into audio data, and outputs it to the processor 46. The speaker 240 outputs audio according to the instructions from the processor 46.

[0387] Camera 42 is a small digital camera equipped with an optical system including a lens, aperture, and shutter, and an image sensor such as a CMOS (Complementary Metal-Oxide-Semiconductor) image sensor or a CCD (Charge Coupled Device) image sensor, and captures images of the area around the user 20 (for example, an imaging range defined by a field of view equivalent to the width of a typical healthy person's field of vision).

[0388] Communication interface 44 is connected to network 54. Communication interfaces 44 and 26 are responsible for the exchange of various information between processor 46 and processor 28 via network 54. The exchange of various information between processor 46 and processor 28 using communication interfaces 44 and 26 is performed in a secure manner.

[0389] Figure 6 shows an example of the main functions of the data processing device 12 and the headset terminal 314. As shown in Figure 6, the data processing device 12 performs specific processing using the processor 28. The storage 32 stores the specific processing program 56.

[0390] The specific processing program 56 is an example of a "program" relating to the technology of this disclosure. The processor 28 reads the specific processing program 56 from the storage 32 and executes the read specific processing program 56 on the RAM 30. The specific processing is realized by the processor 28 operating as a specific processing unit 290 in accordance with the specific processing program 56 executed on the RAM 30.

[0391] The storage 32 stores the data generation model 58 and the emotion identification model 59. The data generation model 58 and the emotion identification model 59 are used by the specific processing unit 290.

[0392] In the headset terminal 314, the processor 46 performs the reception output processing. The storage 50 stores the reception output program 60. The processor 46 reads the reception output program 60 from the storage 50 and executes the read reception output program 60 on the RAM 48. The reception output processing is realized by the processor 46 operating as a control unit 46A according to the reception output program 60 executed on the RAM 48.

[0393] Next, the specific processing performed by the specific processing unit 290 of the data processing device 12 will be described. In the following description, the data processing device 12 will be referred to as the "server" and the headset terminal 314 will be referred to as the "terminal".

[0394] The autonomous cleaning assistant system of the present invention includes a video acquisition means, a dirt recognition means, a suggestion means, an output means, and a learning means. First, the terminal acquires video through a camera installed in the room and transmits image information of dirt in the room to the server in real time. The server analyzes the video using the dirt recognition means to identify the type and location of the dirt.

[0395] Based on the recognized stain, the server's suggestion system consults a database to determine the most appropriate cleaning method and tools to use. This suggestion incorporates insights gained from previously collected user feedback, using a learning algorithm. The terminal communicates this suggestion to the user via voice output. For example, the terminal might advise, "There is a coffee stain on the floor. Use a cloth to wipe it with water first, then use a specialized cleaner."

[0396] After users try the suggested cleaning methods, they can send feedback to the server via their device. This feedback is accumulated by the server's learning mechanism and helps improve the accuracy of future suggestions. The system thus autonomously optimizes its cleaning guidance through dirt recognition, cleaning suggestions, and learning, enabling efficient and comfortable household cleaning.

[0397] The following describes the processing flow.

[0398] Step 1:

[0399] The device activates the room's camera and captures video footage of the room in real time. The captured video data is then sent directly to the server.

[0400] Step 2:

[0401] The server analyzes the received video data. Using a dirt recognition method, it extracts features from the video and performs image analysis to identify the location and type of dirt.

[0402] Step 3:

[0403] The server, based on the analysis results, refers to a database and uses a suggestion mechanism to determine the appropriate cleaning method and tools to be used for each type of stain.

[0404] Step 4:

[0405] The server sends the suggested results to the terminal. The terminal uses a voice output device to provide the user with voice instructions for the cleaning procedure.

[0406] Step 5:

[0407] Users perform cleaning tasks according to instructions on their device. Afterwards, they input feedback on the cleaning evaluation and suggestions via the device.

[0408] Step 6:

[0409] The terminal sends the collected feedback to the server. The server incorporates the feedback into its learning mechanism to update the generative model and improve the accuracy of its suggestions.
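The feedback loop in Steps 5 and 6 can be illustrated with a deliberately simple stand-in for the learning mechanism: the server keeps an average user rating per dirt type and method, and prefers the best-rated method next time. This is not an update of a generative model; the 1 to 5 rating scale, the neutral prior of 3.0, and the `FeedbackStore` class are assumptions made for illustration.

```python
from collections import defaultdict


class FeedbackStore:
    """Accumulates user ratings per (dirt type, cleaning method) pair."""

    def __init__(self):
        self._sum = defaultdict(float)
        self._count = defaultdict(int)

    def add(self, dirt_type: str, method: str, rating: int) -> None:
        """Record one rating on an assumed 1 to 5 scale."""
        if not 1 <= rating <= 5:
            raise ValueError("rating must be between 1 and 5")
        key = (dirt_type, method)
        self._sum[key] += rating
        self._count[key] += 1

    def mean(self, dirt_type: str, method: str):
        """Return the average rating, or None if there is no feedback yet."""
        key = (dirt_type, method)
        if self._count.get(key, 0) == 0:
            return None
        return self._sum[key] / self._count[key]

    def best_method(self, dirt_type: str, candidates: list[str]) -> str:
        """Pick the best-rated candidate; unrated ones count as a neutral 3.0."""
        def score(method: str) -> float:
            value = self.mean(dirt_type, method)
            return 3.0 if value is None else value
        return max(candidates, key=score)
```

When no feedback exists for any candidate, the first candidate is returned, so the order of the candidate list acts as the default ranking.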

[0410] (Example 1)

[0411] Next, we will describe Example 1. In the following description, the data processing device 12 will be referred to as the "server," and the headset-type terminal 314 will be referred to as the "terminal."

[0412] Traditional cleaning systems lacked the means to accurately assess the condition of a room and suggest the most appropriate cleaning method. As a result, cleaning was inefficient, and the burden on users was significant. Furthermore, because the appropriate cleaning method for each type of dirt was unknown, there was a possibility of delayed response or the selection of the wrong method. To solve these problems, there is a need for accurate dirt recognition and the suggestion of effective cleaning methods tailored to the situation.

[0413] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 1 is realized by the following means.

[0414] In this invention, the server includes a device for acquiring video, an analysis device for analyzing the image information and identifying contamination, and a suggestion device for suggesting appropriate cleaning techniques and tools to be used. This enables accurate identification of contamination and the suggestion of appropriate cleaning methods based on that identification.

[0415] A "device for acquiring video footage" is a device installed in a room to capture the surrounding environment in detail.

[0416] An "analysis device that analyzes image information to identify contamination" is a computer system that identifies dirt and abnormalities based on received video data.

[0417] A "proposal device that suggests appropriate cleaning techniques and tools to be used" is a mechanism that determines and presents effective cleaning methods and necessary equipment based on information from an analysis device.

[0418] A "learning algorithm, including a training model," is an artificial intelligence technology that uses past data and feedback to improve accuracy.

[0419] A "training device for collecting evaluation information" is a system that collects user feedback based on cleaning results to improve identification capabilities and suggestion accuracy.

[0420] This autonomous cleaning assistant system supports effective cleaning without user intervention. The system mainly consists of a terminal that acquires video, a server that analyzes the video, and a device that suggests cleaning methods and receives feedback.

[0421] First, the camera installed in the device acquires high-resolution video to capture the current situation inside the room. This camera can comprehensively capture the entire room by, for example, using a wide-angle lens. This video data is transmitted to the server in real time.

[0422] Next, the server analyzes the received video data using an enhanced AI algorithm. This AI processing utilizes generative AI models and deep learning to identify dirt and obstacles within the video. During this analysis, the server uses a high-performance processor to rapidly process large amounts of data.

[0423] When dirt is detected, the server accesses a database to determine the most suitable cleaning method and tools for the identified dirt. This process utilizes machine learning algorithms based on historical data and user feedback.

[0424] Ultimately, the device will offer the user specific cleaning instructions via voice or text. For example, "For coffee stains on the floor, wipe them with a damp cloth, then use a specialized cleaner." This allows the user to follow the instructions and clean efficiently.

[0425] An example of a prompt message might be: "Analyze the video footage of the room captured by the camera and identify the dirt. Based on the identified dirt, suggest the most suitable cleaning method and tools." This ensures that the system reliably executes the process and provides the user with the best possible cleaning solution.

[0426] The flow of the specific processing in Example 1 will be explained using Figure 11.

[0427] Step 1:

[0428] The device uses a camera installed in the room to acquire high-resolution video. This video shows the floor, furniture, and other environmental elements, providing a detailed capture of the overall room situation. The input is video data showing the current state of the room. Specific operations include controlling the camera's on / off state and adjusting the video's frame rate.

[0429] Step 2:

[0430] The terminal transmits the acquired video data to the server in real time via the network. During this transmission process, the data is compressed to ensure efficient delivery to the server. The compressed video data is then provided to the server as output.

[0431] Step 3:

[0432] The server analyzes the received video data. Here, generative AI models and deep learning techniques are used to identify dirt and anomalies within the video. The input is the compressed video data. Specifically, image processing algorithms perform feature extraction to detect dirt in the data, and the detection results are the output.

[0433] Step 4:

[0434] Based on the identified dirt, the server determines the optimal cleaning method and tools to use from the database. At this stage, a learning algorithm based on historical data and user feedback is referenced. The input is the type and location of the dirt. The output is specific cleaning instructions.

[0435] Step 5:

[0436] The device communicates the proposed cleaning method to the user. This communication utilizes speech synthesis and text display technologies to explain the cleaning method step-by-step. This allows the user to specifically understand which tools to use and how to use them. The output provides the user with instructions in either voice or text format.
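Step 2 of this flow compresses the data before transmission but does not name a compression scheme. The sketch below uses lossless `zlib` compression from the Python standard library purely as a stand-in; a real video stream would more likely use a dedicated video codec. The function names and the synthetic frame are assumptions made for illustration.

```python
import zlib


def compress_frame(raw: bytes, level: int = 6) -> bytes:
    """Compress one frame's bytes before sending (terminal side)."""
    return zlib.compress(raw, level)


def decompress_frame(payload: bytes) -> bytes:
    """Restore the frame bytes after receipt (server side)."""
    return zlib.decompress(payload)


# Synthetic 64x64 single-channel frame: uniform floor with a small dark patch.
frame = bytearray([200] * (64 * 64))
for row in range(10, 14):
    for col in range(20, 26):
        frame[row * 64 + col] = 30

payload = compress_frame(bytes(frame))
restored = decompress_frame(payload)

# Prints the raw size, the compressed size, and whether the round trip is exact.
print(len(frame), len(payload), restored == bytes(frame))
```

Because the stand-in is lossless, the server sees exactly the pixels the camera produced. A lossy codec would trade some of that fidelity for bandwidth, which could affect how reliably small stains are detected.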

[0437] (Application Example 1)

[0438] Next, we will explain Application Example 1. In the following explanation, the data processing device 12 will be referred to as the "server," and the headset-type terminal 314 will be referred to as the "terminal."

[0439] In modern living environments, quickly and efficiently recognizing dirt and suggesting the most suitable cleaning method to the user is a crucial challenge. However, conventional methods often fail to accurately identify the type and location of dirt, resulting in the selection of inappropriate cleaning methods. A system is needed to solve this problem, reduce cleaning effort, and maintain cleanliness in the living environment.

[0440] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 1 is realized by the following means.

[0441] In this invention, the server includes a video acquisition means, a feature recognition means, a suggestion means, and a communication means. This makes it possible to accurately recognize dirt and suggest the optimal cleaning method to the user in real time.

[0442] "Video acquisition means" refers to devices or systems that use cameras or sensors to acquire video of the surrounding environment.

[0443] "Feature recognition means" refers to algorithms or devices that analyze acquired video information to identify specific objects or states.

[0444] "Suggestion means" refers to a function or system that presents the user with the optimal action or option based on the recognized features.

[0445] "Communication means" refers to networks and devices used to transmit data to other devices or terminals.

[0446] A "generative model" is an algorithm or machine learning model that learns from data and generates new information or results.

[0447] A "learning algorithm" is a series of procedures and calculation methods used to learn patterns from data and make judgments or predictions.

[0448] A "learning mechanism" is a system that uses collected data and feedback to improve the performance and accuracy of the system.

[0449] In embodiments of the present invention, the terminal is equipped with a camera as a video acquisition means to acquire video of the environment in real time. The server receives this video information and uses a feature recognition means to analyze the type and location of the dirt. A generative AI model and a learning algorithm are used for feature recognition, and these determine the characteristics of the dirt.

[0450] Based on the analysis results, the server selects the optimal cleaning method and tools to use via a suggestion mechanism. The suggested content is transmitted to the terminal via a communication mechanism, and the user is notified via voice or text message. During this process, user feedback is collected, and the learning mechanism improves the accuracy of suggestions for subsequent uses.

[0451] For example, if a pet leaves hair on the carpet, the system quickly recognizes the mess with its camera and sends a suggestion to the user such as, "There is pet hair scattered on the living room carpet. Please use a vacuum cleaner to thoroughly clean it up." In this scenario, the user solves the problem by following the instructions received.

[0452] An example of a prompt would be, "Please tell me how to identify pet hair scattered on the living room carpet, and also provide suggestions for cleaning it."

[0453] This allows users to clean efficiently and easily maintain a clean living environment.

[0454] The flow of a specific process in Application Example 1 will be explained using Figure 12.

[0455] Step 1:

[0456] The device uses a camera to capture the room environment in real time and acquire video data. This video data represents the overall view of the room or the state of a specific area. The acquired video data is then sent to the server.

[0457] Step 2:

[0458] The server analyzes the received video data and uses feature recognition to identify dirt in the video. Data processing here includes processes to remove noise and enhance relevant visual information. Specifically, image processing algorithms are used to determine the location and type of dirt and generate recognition results.

[0459] Step 3:

[0460] Based on the recognized dirt information, the server selects the optimal cleaning method using the suggestion means. This process involves referencing a pre-configured database. The database contains information on cleaning methods and tools to be used, categorized by the type of dirt. The server generates suggestions based on this information.

[0461] Step 4:

[0462] The server sends the generated suggestions to the terminal via the communication means. The terminal notifies the user of the suggestions by voice or text message. For example, it might present specific instructions such as, "There is pet hair scattered on the living room carpet. Please use a vacuum cleaner to thoroughly clean it up."

[0463] Step 5:

[0464] The user receives a notification from their device and cleans the dirt using the suggested method. After cleaning, the user sends the results and feedback to the server via their device. This feedback is used by the server to improve the learning accuracy of the suggested method.

[0465] Throughout the process, the server utilizes generative AI models and learning algorithms to continuously improve the accuracy and usability of the system's suggestions based on feedback.
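The text does not specify how feedback improves the suggestions. One simple possibility, shown here only as a sketch, is to keep a running mean of user ratings per dirt type and method and to prefer the best-rated candidate. The class `FeedbackRanker`, the numeric rating scale, and the "lint roller" candidate are hypothetical.

```python
from collections import defaultdict


class FeedbackRanker:
    """Rank candidate cleaning methods per dirt type by mean user rating."""

    def __init__(self) -> None:
        # (dirt_type, method) -> [sum of ratings, number of ratings]
        self._stats = defaultdict(lambda: [0.0, 0])

    def record(self, dirt_type: str, method: str, rating: float) -> None:
        """Store one piece of user feedback."""
        entry = self._stats[(dirt_type, method)]
        entry[0] += rating
        entry[1] += 1

    def mean_rating(self, dirt_type: str, method: str) -> float:
        """Mean rating so far, or 0.0 if the method has no feedback yet."""
        total, count = self._stats.get((dirt_type, method), (0.0, 0))
        return total / count if count else 0.0

    def best(self, dirt_type: str, candidates: list[str]) -> str:
        # max() keeps the first candidate on ties, so list order is the default.
        return max(candidates, key=lambda m: self.mean_rating(dirt_type, m))


ranker = FeedbackRanker()
candidates = ["vacuum cleaner", "lint roller"]  # second entry is hypothetical
ranker.record("pet hair", "vacuum cleaner", 3)
ranker.record("pet hair", "lint roller", 5)
ranker.record("pet hair", "lint roller", 4)
print(ranker.best("pet hair", candidates))  # → lint roller
```

A running mean is easy to inspect and update online. A trained model, as the text suggests, could additionally take context such as floor material or room into account.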

[0466] Furthermore, an emotion engine that estimates the user's emotions may be incorporated. That is, the specific processing unit 290 may use the emotion identification model 59 to estimate the user's emotions and perform the specific processing using the estimated emotions.

[0467] The system of the present invention begins by acquiring video footage of a room through a video acquisition means and transmitting it to a server. The server is equipped with a dirt recognition means that analyzes the video data to identify the type and location of dirt in the room. Furthermore, the suggestion means selects the optimal cleaning method and tools to be used for the identified dirt, and the server transmits the selection to a terminal, which communicates it to the user via an output means.

[0468] The present invention further incorporates an emotion engine. The terminal sends the user's voice and facial expression data to the emotion engine, which recognizes emotions in real time. Based on this emotion information, the server adjusts the cleaning advice generated by the suggestion means. For example, if the server detects that the user is tired, it recommends a simplified cleaning method or a relaxing task. The terminal guides the user through voice output in a gentle tone, saying something like, "Let's try an easy cleaning method today. First, let's just use the vacuum cleaner to suck up the dust."

[0469] Furthermore, users can provide feedback to the server via their device after performing the suggested cleaning method. This feedback is analyzed by the server's learning mechanisms and reflected in future suggestions. Combined with learning by the emotion engine, this improves the accuracy of the suggestions. The system provides a customized cleaning experience according to the user's current mood and emotional state, achieving both efficiency and comfort in cleaning work.

[0470] The following describes the processing flow.

[0471] Step 1:

[0472] The device uses cameras installed in the room to acquire video in real time and transmits that video data to the server.

[0473] Step 2:

[0474] The server analyzes the received video data and uses the dirt recognition means to identify the location and type of dirt.

[0475] Step 3:

[0476] Based on the results of the dirt identification, the server uses the suggestion means to determine the optimal cleaning method and tools to be used.

[0477] Step 4:

[0478] The device acquires the user's voice and facial expressions through its microphone and camera and sends this data to the emotion engine. The emotion engine analyzes the user's emotions and identifies, for example, levels of fatigue or stress.

[0479] Step 5:

[0480] The server considers the results of the emotion engine and adjusts the decision on suggested actions. For example, if the server detects that the user is tired, it will simplify cleaning methods and suggest less burdensome activities.

[0481] Step 6:

[0482] The device communicates tailored suggestions to the user via voice output. For example, it might say, "Let's keep it simple today. Vacuum up the dust, and if you have time, give it a light wipe."

[0483] Step 7:

[0484] Users carry out suggested cleaning activities and provide feedback on the results via their devices.

[0485] Step 8:

[0486] The device sends user feedback to the server. The server incorporates the feedback and the emotion analysis results into its learning mechanism to improve the accuracy of suggestions and customization capabilities.
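The emotion-based adjustment in Step 5 of this flow can be illustrated as follows. This is a sketch under assumptions, not the disclosed method: it assumes the emotion engine reports fatigue as a score between 0 and 1, and the `Suggestion` type, the 0.7 threshold, and the rule of keeping only the first step are all hypothetical choices.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Suggestion:
    steps: tuple[str, ...]   # ordered cleaning steps
    tone: str = "neutral"    # speaking style for voice output


def adjust_for_emotion(suggestion: Suggestion, fatigue: float,
                       threshold: float = 0.7) -> Suggestion:
    """Shorten the plan and soften the tone when estimated fatigue is high.

    `fatigue` is assumed to be a score in [0, 1] from the emotion engine.
    """
    if not 0.0 <= fatigue <= 1.0:
        raise ValueError("fatigue must be in [0, 1]")
    if fatigue < threshold:
        return suggestion
    # Keep only the first (assumed most important) step for a tired user.
    return Suggestion(steps=suggestion.steps[:1], tone="gentle")


plan = Suggestion(steps=("vacuum up the dust", "wipe the floor", "polish the table"))
print(adjust_for_emotion(plan, fatigue=0.9))
```

Returning a new `Suggestion` instead of mutating the original keeps the unadjusted plan available, for example to offer the remaining steps later if the user's state changes.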

[0487] (Example 2)

[0488] Next, we will describe Example 2. In the following description, the data processing device 12 will be referred to as the "server," and the headset-type terminal 314 will be referred to as the "terminal."

[0489] In modern home and office environments, efficient and effective cleaning is crucial, but many people struggle with selecting the right cleaning methods and tools. Furthermore, providing uniform advice without considering the user's emotional state can compromise user satisfaction and effective cleaning results. Additionally, there's a problem with standardized cleaning methods lacking flexibility to adapt to individual situations.

[0490] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 2 is realized by the following means.

[0491] In this invention, the server includes an image acquisition means, a recognition means for recognizing contamination, and an adjustment means for adjusting suggestions. This allows for flexible provision of optimal cleaning suggestions while considering the user's feelings, and enables efficient cleaning tailored to individual environments.

[0492] "Image acquisition means" refers to a function or device that captures image data in the target environment and transmits it to a server as input information for analysis.

[0493] "Recognition means" refers to an analytical function that detects contamination within acquired image data and identifies its type and location.

[0494] The "proposal means" is a function that selects the optimal cleaning method and tools to use based on the contamination identified by the recognition means and proposes them to the user.

[0495] "Output means" refers to a function that transmits information about cleaning methods and tools generated by the proposal means to the user in voice or other forms.

[0496] "Emotion recognition means" refers to a function that analyzes the user's emotional state in real time from their voice and facial expressions.

[0497] The "adjustment means" is a function that appropriately adjusts the advice generated by the suggestion means based on the user's emotional information obtained from the emotion recognition means, and provides cleaning suggestions that are appropriate to the user's condition.

[0498] The embodiments for carrying out the present invention are shown below.

[0499] This system is built on a client-server architecture. The terminal is, for example, a portable information terminal with a camera, which acquires images of the room. A standard digital camera can be used. The terminal with the image acquisition means transmits the captured image data to the server via a wireless network.

[0500] The server analyzes the received image data. Image processing libraries such as "OpenCV" are used for image analysis. Specifically, features are extracted from the image to identify the type and location of contamination. Based on the analysis results, the server uses machine learning libraries such as "Scikit-learn" and trained models to select the optimal cleaning method and tools. The cleaning method is optimized based on past data.
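As a rough illustration of locating contamination in an image, the sketch below applies the grayscale conversion and thresholding that Step 2 of this example's flow mentions. It is written in pure Python so that it runs without extra packages. With OpenCV itself, `cv2.cvtColor` and `cv2.threshold` would typically fill these roles. The assumption that dirt appears darker than its surroundings, the cutoff of 128, and all function names are illustrative only.

```python
def to_grayscale(rgb_image):
    """Convert rows of (R, G, B) tuples to rows of integer luminance values."""
    return [[(r * 299 + g * 587 + b * 114) // 1000 for (r, g, b) in row]
            for row in rgb_image]


def threshold_mask(gray, cutoff):
    """Mark pixels darker than `cutoff` as candidate contamination (True)."""
    return [[value < cutoff for value in row] for row in gray]


def bounding_box(mask):
    """Return (top, left, bottom, right) of the True pixels, or None if there are none."""
    coords = [(y, x) for y, row in enumerate(mask)
              for x, flagged in enumerate(row) if flagged]
    if not coords:
        return None
    ys = [y for y, _ in coords]
    xs = [x for _, x in coords]
    return (min(ys), min(xs), max(ys), max(xs))


# Synthetic 6x8 image: light floor (220, 220, 220) with a dark 2x3 stain.
image = [[(220, 220, 220)] * 8 for _ in range(6)]
for y in range(2, 4):
    for x in range(3, 6):
        image[y][x] = (60, 40, 20)

mask = threshold_mask(to_grayscale(image), cutoff=128)
print(bounding_box(mask))  # → (2, 3, 3, 5)
```

A fixed cutoff only gives the location of a dark region. Identifying the type of contamination, as the text requires, would need a further classification stage such as the trained model mentioned above.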

[0501] The selected proposals are sent to the terminal and communicated to the user using an audio output device. Text-to-speech software such as "Amazon Polly" is used for the audio output, providing clear and easy-to-understand guidance to the user.

[0502] Furthermore, the device uses a camera and microphone to recognize the user's voice and facial expressions. Emotion recognition incorporates emotion analysis services such as the "Microsoft Azure Emotion API." This allows the server to generate cleaning suggestions that take the user's emotional state into account. For example, if the system detects that the user is tired, adjustments such as simplifying the cleaning method will be made.

[0503] After cleaning is complete, the user sends feedback on the suggested cleaning method to the server via their device. This feedback information is analyzed using the server's learning mechanisms and used to improve the accuracy of future suggestions.

[0504] A specific example of a prompt: "Analyze the video footage of the room captured by the camera, identify the level of dirt, and suggest the optimal cleaning method based on the user's current mood."

[0505] This format allows users to receive flexible cleaning suggestions tailored to their individual environment and emotional state, thereby improving the efficiency of cleaning work and user satisfaction.

[0506] The flow of the specific processing in Example 2 will be explained using Figure 13.

[0507] Step 1:

[0508] The device acquires images of the room using a camera and transmits that image data to a server via a wireless network. The input is the image data captured by the camera, and the output is the transmission of that image data to the server via the network. Specifically, the device is configured to capture images of the room at regular intervals.

[0509] Step 2:

[0510] The server analyzes the received image data. It receives image data as input, uses "OpenCV" to detect contamination within the image, and identifies its type and location. The output is information about the type and location of the contamination. Specifically, it converts the image to grayscale and performs edge detection and thresholding to highlight the contaminated areas.

[0511] Step 3:

[0512] The server generates suggestions based on the analysis results. The input is the analyzed contamination information, and the server uses a machine learning library such as "Scikit-learn" with a trained model to select the optimal cleaning method and tools. The output is suggested information regarding cleaning methods and tools. The specific operation includes determining the optimal cleaning method under specific conditions while referencing historical data in the model.

[0513] Step 4:

[0514] The server sends the selected suggestions to the terminal. The input is suggested information about cleaning methods and tools, and the output is this information being sent to the terminal. Specifically, the data is pushed to the terminal via a message queue or API.

[0515] Step 5:

[0516] The terminal communicates suggestions to the user using an audio output device. The input is suggestion information received from the server, and the output is a notification to the user as an audio message. Specifically, it uses speech synthesis software such as "Amazon Polly" to convert the suggestion content into speech and play it back.

[0517] Step 6:

[0518] The device inputs the user's voice and facial expressions into the emotion recognition system. The input consists of user voice and facial expression data acquired by the camera and microphone, and the output is sending that data to an emotion analysis tool. Specifically, it uses the "Microsoft Azure Emotion API" to analyze the user's current emotions.

[0519] Step 7:

[0520] The server adjusts its suggestions based on the emotion analysis results. The input is the user's emotion information, and the output is an adjusted cleaning suggestion that takes that information into account. The specific operation includes processes that dynamically adjust the steps and load of the suggestions based on the emotion information.

[0521] Step 8:

[0522] The user performs the suggested cleaning and provides feedback to the server via a terminal. The input is the user's feedback on the results of the cleaning, and the output is the transmission of that information to the server. For example, the user fills out a feedback form displayed on the terminal and presses the submit button.

[0523] Step 9:

[0524] The server analyzes user feedback information and uses it to improve the accuracy of its suggestions. The input is the feedback information, and the output is the improved suggestion model. The specific operation here is to add the feedback to the dataset and retrain the machine learning model to improve the suggestion accuracy.
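Step 4 of this flow pushes the suggestion to the terminal via a message queue or API, without fixing a message format. The sketch below assumes a JSON payload. The `SuggestionMessage` fields and the `encode` and `decode` helpers are hypothetical and only show one way the server and terminal could agree on a structure.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class SuggestionMessage:
    dirt_type: str
    location: str
    method: str
    tools: list[str]


def encode(message: SuggestionMessage) -> str:
    """Serialize the suggestion for transport (queue entry or API body)."""
    return json.dumps(asdict(message), ensure_ascii=False)


def decode(payload: str) -> SuggestionMessage:
    """Rebuild the suggestion on the terminal side."""
    return SuggestionMessage(**json.loads(payload))


msg = SuggestionMessage(
    dirt_type="coffee stain",
    location="floor",
    method="wipe with a damp cloth, then use a specialized cleaner",
    tools=["damp cloth", "specialized cleaner"],
)
payload = encode(msg)
print(payload)
print(decode(payload) == msg)
```

A text format such as JSON keeps the payload independent of the transport, so the same message could be placed on a queue or returned from an API call.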

[0525] (Application Example 2)

[0526] Next, we will explain Application Example 2. In the following explanation, the data processing device 12 will be referred to as the "server," and the headset-type terminal 314 will be referred to as the "terminal."

[0527] In daily life, cleaning tasks require considerable time and effort, yet they present challenges in providing flexible solutions tailored to individual circumstances and emotions. In particular, there is a lack of systems that effectively address room dirt in a way that suits the user's needs. Furthermore, there is a need for technology that provides automated cleaning suggestions that take the user's emotional state into consideration.

[0528] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 2 is realized by the following means.

[0529] In this invention, the server includes a device equipped with the function of acquiring video, means for analyzing the video information received from the device and identifying the degree of soiling, and a function for suggesting an appropriate cleaning method and cleaning tools to be used based on the identified degree of soiling. This makes it possible to clean the room efficiently and effectively while providing flexible cleaning suggestions tailored to the user's emotional state.

[0530] A "device equipped with the function of acquiring images" is a device that uses cameras and sensors to collect photographic and video data of a room or space.

[0531] "Methods for identifying the state of dirt" refers to technologies that analyze acquired video footage to determine the type and location of dirt within a room or space.

[0532] The "function that suggests appropriate cleaning methods and cleaning tools" refers to a technology that selects the optimal cleaning procedure and suitable cleaning tools for a given type of soiling and proposes them to the user.

[0533] A "mechanism for communicating proposed cleaning methods via voice output" refers to a device or technology that provides the proposed cleaning procedure to the user as a voice message, and provides instructions and guidance.

[0534] An "engine for recognizing user emotional information" is a technology that analyzes a user's voice tone and visual changes in their facial expressions to grasp their emotional state in real time.

[0535] The "feature that adjusts cleaning suggestions based on emotional information" is a technology that modifies or optimizes suggested cleaning methods and procedures by taking into account the user's current emotional state.

[0536] This system aims to streamline household cleaning and provide a customized experience tailored to the user's emotions. Implementation requires a device to acquire video, typically a consumer robot equipped with a camera. The robot captures real-time images of the room, collecting video data. A server then receives this video data via wireless communication and analyzes it.

[0537] The analysis utilizes the OpenCV video processing library and a learning algorithm that leverages a generative AI model. This allows the server to accurately identify the location and type of dirt from the video. Next, the AI model's suggestion function works to determine the most effective cleaning procedure and the cleaning tools to use. The determined suggestions are communicated to the user via voice through digital voice assistant technology.

[0538] Furthermore, the device incorporates an emotion engine that analyzes the user's voice tone and facial expressions to recognize their emotional state in real time. This emotional information is sent to a server, which adjusts cleaning suggestions to suit the user's situation. For example, if the server determines that the user is tired, it simplifies the suggestions to reduce the user's burden.

[0539] Users can provide feedback after cleaning, and this information is sent to the server. This feedback is analyzed by an algorithm and used to improve future cleaning advice. In this way, the system adapts to the user's current mood and state, pursuing not only cleaning efficiency but also comfort.

[0540] Specific example:

[0541] For example, a consumer robot installed in a living room uses its camera to photograph the floor and detect dust accumulated under the sofa. The server then gives the robot a voice command saying, "Use the vacuum cleaner to suck up the dust under the sofa." If the user is relaxed, after cleaning, it might suggest, "Shall we do a quick clean around the windows next?"

[0542] Examples of prompts for a generative AI model:

[0543] "The smart cleaning system acquires and analyzes video data of the room and suggests cleaning methods tailored to the user's situation. In particular, please provide specific methods for incorporating emotional information into the suggested cleaning methods."
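One way to turn recognition results and the estimated emotion into a prompt for a generative AI model is sketched below. The function `build_prompt`, the structure of the detection dictionaries, and the prompt wording are assumptions for illustration. The source gives only free-text prompt examples.

```python
def build_prompt(detections: list[dict], emotion: str) -> str:
    """Compose a text prompt from recognized dirt and the estimated emotion."""
    if detections:
        findings = "; ".join(
            f"{d['type']} ({d['location']})" for d in detections
        )
    else:
        findings = "no dirt detected"
    return (
        "You are a cleaning assistant.\n"
        f"Detected dirt: {findings}.\n"
        f"Estimated user emotion: {emotion}.\n"
        "Suggest a cleaning method and tools suited to this situation, "
        "and keep the plan short if the user seems tired."
    )


prompt = build_prompt(
    [{"type": "dust", "location": "under the sofa"}],
    emotion="tired",
)
print(prompt)
```

Passing the recognition results as structured fields, rather than asking the model to re-describe the video, keeps the prompt short and makes the emotional context explicit.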

[0544] The flow of a specific process in Application Example 2 will be explained using Figure 14.

[0545] Step 1:

[0546] The device uses a camera to capture video footage of the room. It receives real-time video data as input. This video data is transmitted to the server via Wi-Fi. The output is the video data stored on the server.

[0547] Step 2:

[0548] The server extracts specific frames from the received video data and uses video processing libraries such as OpenCV to identify the location and type of dirt for each frame. The input is the transmitted video data, and the output is data on the location and type of the identified dirt.

[0549] Step 3:

[0550] The server uses a generative AI model to calculate the optimal cleaning procedure based on identified dirt and suggests which cleaning tools are most suitable. The input is data on the location and type of dirt, and the output is a suggestion for the cleaning method and tools.

[0551] Step 4:

[0552] The device receives suggestions sent from the server and communicates them to the user as voice output using its digital voice assistant function. The input is the suggestion content, and the output is the voice message heard by the user.

[0553] Step 5:

[0554] The device uses a microphone and camera to collect the user's voice and facial expressions, and sends this data to an emotion engine to analyze the user's emotional state. The input is real-time audio and video data, and the output is data on the identified emotional state.

[0555] Step 6:

[0556] The server receives data on the user's emotional state and adjusts the suggested cleaning methods accordingly. If it determines that the user is tired, it will offer suggestions tailored to the user's condition, such as simplifying the cleaning process. The input is the identified emotional state, and the output is the adjusted suggestions.

[0557] Step 7:

[0558] After cleaning, the user enters feedback via a terminal, which is then sent to the server. The server analyzes the user's feedback using a learning algorithm and uses it as data to improve the accuracy of future suggestions. The input is the user's feedback information, and the output is the improved suggestion algorithm.
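Step 2 of this flow has the server extract specific frames from the video before analysis, without stating how they are chosen. A simple possibility, shown here as a sketch, is to sample frames at a fixed time interval. The function name and its parameters are hypothetical.

```python
def sample_frame_indices(total_frames: int, fps: float,
                         interval_seconds: float) -> list[int]:
    """Return indices of frames spaced about `interval_seconds` apart."""
    if total_frames < 0 or fps <= 0 or interval_seconds <= 0:
        raise ValueError("total_frames must be >= 0; fps and interval must be > 0")
    # At least one frame per step, even for very short intervals.
    step = max(1, round(fps * interval_seconds))
    return list(range(0, total_frames, step))


print(sample_frame_indices(total_frames=300, fps=30, interval_seconds=2))
# → [0, 60, 120, 180, 240]
```

Analyzing one frame every few seconds rather than every frame reduces server load, at the cost of possibly missing dirt that is visible only briefly while the camera moves.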

[0559] The specific processing unit 290 transmits the result of the specific processing to the headset-type terminal 314. In the headset-type terminal 314, the control unit 46A causes the speaker 240 and display 343 to output the result of the specific processing. The microphone 238 acquires audio indicating user input for the result of the specific processing. The control unit 46A transmits the audio data indicating user input acquired by the microphone 238 to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the audio data.

[0560] Data generation model 58 is a type of so-called generative AI (Artificial Intelligence). Examples of the data generation model 58 include generative AI such as ChatGPT (Internet search <URL: https://openai.com/blog/chatgpt>) and Gemini (Internet search <URL: https://gemini.google.com/?hl=ja>). The data generation model 58 is obtained by performing deep learning on a neural network. The data generation model 58 receives prompts containing instructions, together with inference data such as audio data representing speech, text data representing text, and image data representing images. The data generation model 58 performs inference on the input inference data according to the instructions indicated by the prompts, and outputs the inference results in data formats such as audio data and text data. Here, inference refers to, for example, analysis, classification, prediction, and/or summarization.

[0561] In the above embodiment, an example was given in which specific processing is performed by the data processing device 12, but the technology of this disclosure is not limited thereto, and specific processing may also be performed by the headset terminal 314.

[0562] [Fourth Embodiment]

[0563] Figure 7 shows an example of the configuration of the data processing system 410 according to the fourth embodiment.

[0564] As shown in Figure 7, the data processing system 410 includes a data processing device 12 and a robot 414. An example of the data processing device 12 is a server.

[0565] The data processing device 12 comprises a computer 22, a database 24, and a communication interface 26. The computer 22 is an example of a "computer" related to the technology of this disclosure. The computer 22 comprises a processor 28, RAM 30, and storage 32. The processor 28, RAM 30, and storage 32 are connected to a bus 34. The database 24 and the communication interface 26 are also connected to the bus 34. The communication interface 26 is connected to a network 54. An example of the network 54 is a WAN (Wide Area Network) and / or a LAN (Local Area Network).

[0566] The robot 414 includes a computer 36, a microphone 238, a speaker 240, a camera 42, a communication interface 44, and a controlled object 443. The computer 36 includes a processor 46, RAM 48, and storage 50. The processor 46, RAM 48, and storage 50 are connected to a bus 52. The microphone 238, speaker 240, camera 42, and controlled object 443 are also connected to the bus 52.

[0567] The microphone 238 receives voice signals from the user 20 and receives instructions from the user 20. The microphone 238 captures the voice signals from the user 20, converts the captured voice into audio data, and outputs it to the processor 46. The speaker 240 outputs audio according to the instructions from the processor 46.

[0568] Camera 42 is a small digital camera equipped with an optical system including a lens, aperture, and shutter, and an image sensor such as a CMOS (Complementary Metal-Oxide-Semiconductor) image sensor or a CCD (Charge Coupled Device) image sensor, and captures images of the area around the user 20 (for example, an imaging range defined by a field of view equivalent to the width of a typical healthy person's field of vision).

[0569] Communication interface 44 is connected to network 54. Communication interfaces 44 and 26 are responsible for the exchange of various information between processor 46 and processor 28 via network 54. The exchange of various information between processor 46 and processor 28 using communication interfaces 44 and 26 is performed in a secure manner.

[0570] The controlled object 443 includes a display device, LEDs in the eyes, and motors that drive the arms, hands, and feet. The posture and gestures of the robot 414 are controlled by controlling the motors of the arms, hands, and feet. Some of the robot 414's emotions can be expressed by controlling these motors. Furthermore, the robot 414's facial expressions can also be expressed by controlling the illumination state of the LEDs in its eyes.
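How an emotion might be mapped to commands for the controlled object 443 can be sketched as a simple lookup. Everything in this example is hypothetical: the source states only that motors and eye LEDs can express emotions, and does not list emotions, LED patterns, or gestures.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Expression:
    led_pattern: str   # illumination state of the eye LEDs
    gesture: str       # motion realized by the arm, hand, and foot motors


# Hypothetical emotion-to-expression table; the source does not list entries.
EXPRESSIONS = {
    "happy": Expression(led_pattern="steady bright", gesture="raise both arms"),
    "sad": Expression(led_pattern="dim", gesture="lower both arms"),
}
NEUTRAL = Expression(led_pattern="steady", gesture="stand still")


def express(emotion: str) -> Expression:
    """Pick the LED and motor commands for an emotion, defaulting to neutral."""
    return EXPRESSIONS.get(emotion, NEUTRAL)


print(express("happy"))
print(express("unknown"))
```

Falling back to a neutral expression for unrecognized emotions avoids sending undefined commands to the motors.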

[0571] Figure 8 shows an example of the main functions of the data processing device 12 and the robot 414. As shown in Figure 8, the data processing device 12 performs specific processing using the processor 28. The storage 32 stores the specific processing program 56.

[0572] The specific processing program 56 is an example of a "program" relating to the technology of this disclosure. The processor 28 reads the specific processing program 56 from the storage 32 and executes the read specific processing program 56 on the RAM 30. The specific processing is realized by the processor 28 operating as a specific processing unit 290 in accordance with the specific processing program 56 executed on the RAM 30.

[0573] The storage 32 stores the data generation model 58 and the emotion identification model 59. The data generation model 58 and the emotion identification model 59 are used by the specific processing unit 290.

[0574] In robot 414, the processor 46 performs the reception output processing. The storage 50 stores the reception output program 60. The processor 46 reads the reception output program 60 from the storage 50 and executes the read reception output program 60 on the RAM 48. The reception output processing is realized by the processor 46 operating as a control unit 46A according to the reception output program 60 executed on the RAM 48.

[0575] Next, the specific processing performed by the specific processing unit 290 of the data processing device 12 will be described. In the following description, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".

[0576] The autonomous cleaning assistant system of the present invention includes a video acquisition means, a dirt recognition means, a suggestion means, an output means, and a learning means. First, the terminal acquires video through a camera installed in the room and transmits the image information showing dirt in the room to the server in real time. The server analyzes the video using the dirt recognition means to identify the type and location of the dirt.

[0577] Based on the recognized dirt, the server's suggestion means consults a database to determine the most appropriate cleaning method and the tools to use. Using a learning algorithm, the suggestion incorporates insights gained from previously collected user feedback. The terminal communicates this suggestion to the user via voice output. For example, the terminal might advise, "There is a coffee stain on the floor. Use a cloth to wipe it with water first, then use a specialized cleaner."

[0578] After the user tries the suggested cleaning method, the user can send feedback to the server via the terminal. This feedback is accumulated by the server's learning means and helps improve the accuracy of future suggestions. The system thus autonomously optimizes cleaning through dirt recognition, cleaning suggestions, and learning, which makes household cleaning tasks more efficient and comfortable.

[0579] The following describes the processing flow.

[0580] Step 1:

[0581] The terminal activates the room's camera and captures video footage of the room in real time. The captured video data is then sent directly to the server.

[0582] Step 2:

[0583] The server analyzes the received video data. Using a dirt recognition method, it extracts features from the video and performs image analysis to identify the location and type of dirt.

[0584] Step 3:

[0585] The server, based on the analysis results, refers to a database and uses a suggestion mechanism to determine the appropriate cleaning method and tools to be used for each type of stain.

[0586] Step 4:

[0587] The server sends the suggested results to the terminal. The terminal uses a voice output device to provide the user with voice instructions for the cleaning procedure.

[0588] Step 5:

[0589] The user performs the cleaning according to the instructions from the terminal. Afterwards, the user inputs an evaluation of the cleaning and feedback on the suggestion via the terminal.

[0590] Step 6:

[0591] The terminal sends the collected feedback to the server. The server incorporates the feedback into its learning mechanism to update the generative model and improve the accuracy of its suggestions.
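
The six steps above form a feedback loop. The following is a minimal Python sketch of that loop under stated assumptions: the contents of `CLEANING_DB`, the 1-to-5 rating scale, and the `SuggestionLearner` class are hypothetical, and a running average of ratings merely stands in for the learning means (the disclosure speaks of updating a generative model, which is not reproduced here).

```python
from collections import defaultdict

# Hypothetical suggestion database: dirt type -> candidate cleaning methods.
CLEANING_DB = {
    "coffee_stain": [
        "wipe with water, then use a specialized cleaner",
        "wipe with a dry cloth",
    ],
    "dust": ["vacuum", "wipe with a damp cloth"],
}


class SuggestionLearner:
    """Keeps a running average rating per (dirt type, method) pair."""

    def __init__(self, database):
        self.database = database
        self.rating_sum = defaultdict(float)
        self.rating_count = defaultdict(int)

    def average_rating(self, dirt_type, method):
        """Return the mean rating so far, or 0.0 if the pair is unrated."""
        count = self.rating_count[(dirt_type, method)]
        if count == 0:
            return 0.0
        return self.rating_sum[(dirt_type, method)] / count

    def suggest(self, dirt_type):
        """Pick the best-rated method; database order breaks ties."""
        candidates = self.database[dirt_type]
        return max(candidates, key=lambda m: self.average_rating(dirt_type, m))

    def add_feedback(self, dirt_type, method, rating):
        """Record one user rating (assumed scale: 1 = poor, 5 = excellent)."""
        self.rating_sum[(dirt_type, method)] += rating
        self.rating_count[(dirt_type, method)] += 1


learner = SuggestionLearner(CLEANING_DB)
first = learner.suggest("dust")            # no feedback yet: first entry
learner.add_feedback("dust", "vacuum", 1)
learner.add_feedback("dust", "wipe with a damp cloth", 5)
second = learner.suggest("dust")           # now the better-rated method
```

In this sketch the suggestion changes only because of the accumulated ratings, which mirrors the role the feedback plays in Step 6.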

[0592] (Example 1)

[0593] Next, we will describe Example 1. In the following description, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".

[0594] Traditional cleaning systems lacked the means to accurately assess the condition of a room and suggest the most appropriate cleaning method. As a result, cleaning was inefficient, and the burden on users was significant. Furthermore, because the appropriate cleaning method for each type of dirt was unknown, there was a possibility of delayed response or the selection of the wrong method. To solve these problems, there is a need for accurate dirt recognition and the suggestion of effective cleaning methods tailored to the situation.

[0595] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 1 is realized by the following means.

[0596] In this invention, the server includes a device for acquiring video, an analysis device for analyzing the image information and identifying contamination, and a suggestion device for suggesting appropriate cleaning techniques and tools to be used. This enables accurate identification of contamination and the suggestion of appropriate cleaning methods based on that identification.

[0597] A "device for acquiring video footage" is a device installed in a room to capture the surrounding environment in detail.

[0598] An "analysis device that analyzes image information to identify contamination" is a computer system that identifies dirt and abnormalities based on received video data.

[0599] A "proposal device that suggests appropriate cleaning techniques and tools to be used" is a mechanism that determines and presents effective cleaning methods and necessary equipment based on information from an analysis device.

[0600] A "learning algorithm, including a training model," is an artificial intelligence technology that uses past data and feedback to improve accuracy.

[0601] A "training device for collecting evaluation information" is a system that collects user feedback based on cleaning results to improve identification capabilities and suggestion accuracy.

[0602] This autonomous cleaning assistant system supports effective cleaning without user intervention. The system mainly consists of a terminal that acquires video, a server that analyzes the video, and a device that suggests cleaning methods and receives feedback.

[0603] First, the camera installed in the terminal acquires high-resolution video to capture the current situation inside the room. By using, for example, a wide-angle lens, this camera can cover the entire room. The video data is transmitted to the server in real time.

[0604] Next, the server analyzes the received video data using an enhanced AI algorithm. This AI processing utilizes generative AI models and deep learning to identify dirt and obstacles within the video. During this analysis, the server uses a high-performance processor to rapidly process large amounts of data.

[0605] When dirt is detected, the server accesses a database to determine the most suitable cleaning method and tools for the identified dirt. This process utilizes machine learning algorithms based on historical data and user feedback.

[0606] Finally, the terminal presents specific cleaning instructions to the user by voice or text, for example, "For coffee stains on the floor, wipe them with a damp cloth, then use a specialized cleaner." The user can then follow the instructions and clean efficiently.

[0607] An example of a prompt might be: "Analyze the video footage of the room captured by the camera and identify the dirt. Based on the identified dirt, suggest the most suitable cleaning method and tools." A prompt of this kind directs the system through both recognition and suggestion, so that the user receives a cleaning solution suited to the identified dirt.

[0608] The flow of the specific processing in Example 1 will be explained using Figure 11.

[0609] Step 1:

[0610] The device uses a camera installed in the room to acquire high-resolution video. This video shows the floor, furniture, and other environmental elements, providing a detailed capture of the overall room situation. The input is video data showing the current state of the room. Specific operations include controlling the camera's on / off state and adjusting the video's frame rate.

[0611] Step 2:

[0612] The terminal transmits the acquired video data to the server in real time via the network. During this transmission process, the data is compressed to ensure efficient delivery to the server. The compressed video data is then provided to the server as output.

[0613] Step 3:

[0614] The server analyzes the received video data. Here, generative AI models and deep learning techniques are used to identify dirt and anomalies within the video. The input is the compressed video data. Specifically, image processing algorithms extract features from the data to detect dirt, and the detection result is the output.

[0615] Step 4:

[0616] Based on the identified dirt, the server determines the optimal cleaning method and the tools to use from the database. At this stage, a learning algorithm based on historical data and user feedback is referenced. The input is the type and location of the dirt, and the output is specific cleaning instructions.

[0617] Step 5:

[0618] The terminal communicates the proposed cleaning method to the user. This communication uses speech synthesis and text display technologies to explain the cleaning method step by step, so that the user understands which tools to use and how to use them. The output is the instructions presented to the user in voice or text form.
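
Step 2 above states that the video data is compressed before transmission but does not name a method. As one possible illustration, the sketch below applies lossless compression from Python's standard `zlib` module to raw frame bytes. The choice of `zlib`, the function names, and the synthetic frame are all assumptions; a real system would more likely rely on a video codec.

```python
import zlib


def compress_frame(frame_bytes: bytes, level: int = 6) -> bytes:
    """Losslessly compress one raw frame before it is sent to the server."""
    return zlib.compress(frame_bytes, level)


def decompress_frame(payload: bytes) -> bytes:
    """Restore the raw frame on the server side."""
    return zlib.decompress(payload)


# Synthetic 64x64 single-channel frame of uniform brightness (illustrative only).
frame = bytes([128]) * (64 * 64)
payload = compress_frame(frame)      # what the terminal would transmit
restored = decompress_frame(payload)  # what the server would analyze
```

Because the compression is lossless, the server analyzes exactly the pixels the terminal captured; a lossy codec would trade some fidelity for a smaller payload.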

[0619] (Application Example 1)

[0620] Next, we will explain Application Example 1. In the following explanation, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".

[0621] In modern living environments, quickly and efficiently recognizing dirt and suggesting the most suitable cleaning method to the user is a crucial challenge. However, conventional methods often fail to accurately identify the type and location of dirt, resulting in the selection of inappropriate cleaning methods. A system is needed to solve this problem, reduce cleaning effort, and maintain cleanliness in the living environment.

[0622] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 1 is realized by the following means.

[0623] In this invention, the server includes a video acquisition means, a feature recognition means, a suggestion means, and a communication means. This makes it possible to accurately recognize dirt and suggest the optimal cleaning method to the user in real time.

[0624] "Video acquisition means" refers to devices or systems that use cameras or sensors to acquire video of the surrounding environment.

[0625] "Feature recognition means" refers to algorithms or devices that analyze acquired video information to identify specific objects or states.

[0626] "Suggestion means" refers to a function or system that presents the user with the optimal action or option based on the recognized features.

[0627] "Communication means" refers to networks and devices used to transmit data to other devices or terminals.

[0628] A "generative model" is an algorithm or machine learning model that learns from data and generates new information or results.

[0629] A "learning algorithm" is a series of procedures and calculation methods used to learn patterns from data and make judgments or predictions.

[0630] A "learning mechanism" is a system that uses collected data and feedback to improve the performance and accuracy of the system.

[0631] In embodiments of the present invention, the terminal is equipped with a camera as a video acquisition means to acquire video of the environment in real time. The server receives this video information and uses a feature recognition means to analyze the type and location of the dirt. A generative AI model and a learning algorithm are used for feature recognition, and these determine the characteristics of the dirt.

[0632] Based on the analysis results, the server selects the optimal cleaning method and tools to use via a suggestion mechanism. The suggested content is transmitted to the terminal via a communication mechanism, and the user is notified via voice or text message. During this process, user feedback is collected, and the learning mechanism improves the accuracy of suggestions for subsequent uses.

[0633] For example, if a pet leaves hair on the carpet, the system quickly recognizes the mess with its camera and sends a suggestion to the user such as, "There is pet hair scattered on the living room carpet. Please use a vacuum cleaner to thoroughly clean it up." In this scenario, the user solves the problem by following the instructions received.

[0634] An example of a prompt would be, "Please tell me how to identify pet hair scattered on the living room carpet, and also provide suggestions for cleaning it."

[0635] This allows users to clean efficiently and easily maintain a clean living environment.

[0636] The flow of a specific process in Application Example 1 will be explained using Figure 12.

[0637] Step 1:

[0638] The device uses a camera to capture the room environment in real time and acquire video data. This video data represents the overall view of the room or the state of a specific area. The acquired video data is then sent to the server.

[0639] Step 2:

[0640] The server analyzes the received video data and uses feature recognition to identify dirt in the video. Data processing here includes processes to remove noise and enhance relevant visual information. Specifically, image processing algorithms are used to determine the location and type of dirt and generate recognition results.

[0641] Step 3:

[0642] Based on the recognized dirt information, the server selects the optimal cleaning method using the suggestion means. This process references a pre-configured database that contains cleaning methods and the tools to use, categorized by type of dirt. The server generates suggestions based on this information.

[0643] Step 4:

[0644] The server sends the generated suggestions to the terminal via the communication means. The terminal notifies the user of the suggestions by voice or text message. For example, it might present specific instructions such as, "There is pet hair scattered on the living room carpet. Please use a vacuum cleaner to thoroughly clean it up."

[0645] Step 5:

[0646] The user receives the notification from the terminal and cleans the dirt using the suggested method. After cleaning, the user sends the results and feedback to the server via the terminal. The server uses this feedback to improve the accuracy of its suggestions.

[0647] Throughout the process, the server utilizes generative AI models and learning algorithms to continuously improve the accuracy and usability of the system's suggestions based on feedback.
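
Steps 3 and 4 above (database lookup by dirt type, then notification) can be sketched as follows. This is a minimal illustration, not the disclosed implementation: the `Recognition` structure, the database entries, and the message wording are hypothetical and were chosen to reproduce the pet-hair example in the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Recognition:
    """Result of the feature recognition step (hypothetical structure)."""
    dirt_type: str
    location: str


# Hypothetical pre-configured database keyed by dirt type.
CLEANING_DATABASE = {
    "pet hair": {
        "tools": ["vacuum cleaner"],
        "instruction": "use a vacuum cleaner to thoroughly clean it up",
    },
    "coffee stain": {
        "tools": ["damp cloth", "specialized cleaner"],
        "instruction": "wipe it with a damp cloth, then use a specialized cleaner",
    },
}


def build_suggestion(recognition: Recognition) -> dict:
    """Look up the dirt type and format the message sent to the terminal."""
    entry = CLEANING_DATABASE.get(recognition.dirt_type)
    if entry is None:
        # No entry for this dirt type: report it without a recommendation.
        return {
            "tools": [],
            "message": f"There is unrecognized dirt on the {recognition.location}.",
        }
    return {
        "tools": list(entry["tools"]),
        "message": (
            f"There is {recognition.dirt_type} on the {recognition.location}. "
            f"Please {entry['instruction']}."
        ),
    }


suggestion = build_suggestion(Recognition("pet hair", "living room carpet"))
```

Keeping the tools and the instruction in one record per dirt type means a new dirt category can be supported by adding a database entry, without changing the lookup code.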

[0648] Furthermore, an emotion engine that estimates the user's emotions may be incorporated. That is, the specific processing unit 290 may use the emotion identification model 59 to estimate the user's emotions and perform the specific processing using the estimated emotions.

[0649] The system of the present invention begins by acquiring video footage of a room through a video acquisition means and transmitting it to a server. The server is equipped with a dirt recognition means that analyzes the video data to identify the type and location of dirt in the room. Furthermore, the suggestion means selects the optimal cleaning method and tools for the identified dirt, and the selection is transmitted to the terminal and communicated to the user via an output means.

[0650] The present invention further incorporates an emotion engine. The terminal sends the user's voice and facial expression data to the emotion engine, which recognizes emotions in real time. Based on this emotion information, the server adjusts the cleaning advice generated by the suggestion means. For example, if the server detects that the user is tired, it recommends a simplified cleaning method or a relaxing task. The terminal guides the user through voice output in a gentle tone, saying something like, "Let's try an easy cleaning method today. First, let's just use the vacuum cleaner to suck up the dust."

[0651] Furthermore, users can provide feedback to the server via the terminal after performing the suggested cleaning method. This feedback is analyzed by the server's learning mechanisms and reflected in future suggestions. Combined with learning by the emotion engine, this improves the accuracy of the suggestions. The system provides a cleaning experience customized to the user's current mood and emotional state, achieving both efficiency and comfort in the cleaning work.

[0652] The following describes the processing flow.

[0653] Step 1:

[0654] The device uses cameras installed in the room to acquire video in real time and transmits that video data to the server.

[0655] Step 2:

[0656] The server analyzes the received video data and uses a dirt recognition system to identify the location and type of dirt.

[0657] Step 3:

[0658] Based on the results of the dirt identification, the server uses the suggestion means to determine the optimal cleaning method and the tools to be used.

[0659] Step 4:

[0660] The device acquires the user's voice and facial expressions through voice and camera input and sends this data to the emotion engine. The emotion engine analyzes the user's emotions and identifies, for example, levels of fatigue or stress.

[0661] Step 5:

[0662] The server considers the results of the emotion engine and adjusts the decision on suggested actions. For example, if the server detects that the user is tired, it will simplify cleaning methods and suggest less burdensome activities.

[0663] Step 6:

[0664] The device communicates tailored suggestions to the user via voice output. For example, it might say, "Let's keep it simple today. Vacuum up the dust, and if you have time, give it a light wipe."

[0665] Step 7:

[0666] Users carry out suggested cleaning activities and provide feedback on the results via their devices.

[0667] Step 8:

[0668] The terminal sends the user feedback to the server. The server incorporates the feedback and the emotion analysis results into its learning mechanism to improve the accuracy of its suggestions and its customization capabilities.
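
The adjustment in Step 5 above (simplifying the suggestion when the user is tired) can be illustrated with a small rule. The sketch below is an assumption-laden simplification: the emotion labels, the `adjust_suggestion` function, and the rule of keeping only the first step are hypothetical, and the disclosure does not specify how the adjustment is computed.

```python
# Hypothetical emotion labels for which the suggestion is simplified.
LOW_ENERGY_EMOTIONS = {"tired", "stressed"}


def adjust_suggestion(steps, emotion, max_steps_when_low_energy=1):
    """Return (steps, tone) adjusted to the estimated emotion.

    steps: ordered cleaning steps produced by the suggestion means.
    emotion: label estimated by the emotion engine (hypothetical values).
    """
    if emotion in LOW_ENERGY_EMOTIONS:
        # Keep only the first steps and ask for a gentle voice tone.
        return list(steps[:max_steps_when_low_energy]), "gentle"
    return list(steps), "neutral"


full_plan = ["vacuum up the dust", "wipe the floor", "clean the windows"]
tired_plan, tired_tone = adjust_suggestion(full_plan, "tired")
relaxed_plan, relaxed_tone = adjust_suggestion(full_plan, "relaxed")
```

Returning the tone alongside the steps lets the terminal's voice output follow the same emotion estimate, as in the gentle-tone example given in the text.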

[0669] (Example 2)

[0670] Next, we will describe Example 2. In the following description, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".

[0671] In modern home and office environments, efficient and effective cleaning is crucial, but many people struggle with selecting the right cleaning methods and tools. Furthermore, providing uniform advice without considering the user's emotional state can compromise user satisfaction and effective cleaning results. Additionally, there's a problem with standardized cleaning methods lacking flexibility to adapt to individual situations.

[0672] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 2 is realized by the following means.

[0673] In this invention, the server includes an image acquisition means, a recognition means for recognizing contamination, and an adjustment means for adjusting suggestions. This allows for flexible provision of optimal cleaning suggestions while considering the user's feelings, and enables efficient cleaning tailored to individual environments.

[0674] "Image acquisition means" refers to a function or device that captures image data in the target environment and transmits it to a server as input information for analysis.

[0675] "Recognition means" refers to an analytical function that detects contamination within acquired image data and identifies its type and location.

[0676] The "proposal means" is a function that selects the optimal cleaning method and tools to use based on the contamination identified by the recognition means and proposes them to the user.

[0677] "Output means" refers to a function that transmits information about the cleaning methods and tools generated by the proposal means to the user in voice or other forms.

[0678] "Emotion recognition means" refers to a function that analyzes the user's emotional state in real time from their voice and facial expressions.

[0679] The "adjustment means" is a function that appropriately adjusts the advice generated by the suggestion means based on the user's emotional information obtained from the emotion recognition means, and provides cleaning suggestions that are appropriate to the user's condition.

[0680] The embodiments for carrying out the present invention are shown below.

[0681] This system is built on a client-server architecture. The terminal is, for example, a portable information terminal with a camera, which acquires images of the room. A standard digital camera can be used. The terminal with the image acquisition means transmits the captured image data to the server via a wireless network.

[0682] The server analyzes the received image data. Image processing libraries such as "OpenCV" are used for image analysis. Specifically, features are extracted from the image to identify the type and location of contamination. Based on the analysis results, the server uses machine learning libraries such as "Scikit-learn" and trained models to select the optimal cleaning method and tools. The cleaning method is optimized based on past data.

[0683] The selected proposals are sent to the terminal and communicated to the user using an audio output device. Text-to-speech software such as "Amazon Polly" is used for the audio output, providing clear and easy-to-understand guidance to the user.

[0684] Furthermore, the device uses a camera and microphone to recognize the user's voice and facial expressions. Emotion recognition incorporates emotion analysis services such as the "Microsoft Azure Emotion API." This allows the server to generate cleaning suggestions that take the user's emotional state into account. For example, if the system detects that the user is tired, adjustments such as simplifying the cleaning method will be made.

[0685] After cleaning is complete, the user sends feedback on the suggested cleaning method to the server via their device. This feedback information is analyzed using the server's learning mechanisms and used to improve the accuracy of future suggestions.

[0686] A specific example of a prompt: "Analyze the video footage of the room captured by the camera, identify the level of dirt, and suggest the optimal cleaning method based on the user's current mood."

[0687] This format allows users to receive flexible cleaning suggestions tailored to their individual environment and emotional state, thereby improving the efficiency of cleaning work and user satisfaction.

[0688] The flow of the specific processing in Example 2 will be explained using Figure 13.

[0689] Step 1:

[0690] The device acquires images of the room using a camera and transmits that image data to a server via a wireless network. The input is the image data captured by the camera, and the output is the transmission of that image data to the server via the network. Specifically, the device is configured to capture images of the room at regular intervals.

[0691] Step 2:

[0692] The server analyzes the received image data. It receives image data as input, uses "OpenCV" to detect contamination within the image, and identifies its type and location. The output is information about the type and location of the contamination. Specifically, it converts the image to grayscale and performs edge detection and thresholding to highlight the contaminated areas.

[0693] Step 3:

[0694] The server generates suggestions based on the analysis results. The input is the analyzed contamination information, and the server uses a trained model built with a machine learning library such as "Scikit-learn" to select the optimal cleaning method and tools. The output is suggestion information regarding cleaning methods and tools. Specifically, the server determines the optimal cleaning method for the given conditions while referencing the historical data reflected in the model.

[0695] Step 4:

[0696] The server sends the selected suggestions to the terminal. The input is suggested information about cleaning methods and tools, and the output is this information being sent to the terminal. Specifically, the data is pushed to the terminal via a message queue or API.

[0697] Step 5:

[0698] The terminal communicates suggestions to the user using an audio output device. The input is suggestion information received from the server, and the output is a notification to the user as an audio message. Specifically, it uses speech synthesis software such as "Amazon Polly" to convert the suggestion content into speech and play it back.

[0699] Step 6:

[0700] The device inputs the user's voice and facial expressions into the emotion recognition system. The input consists of user voice and facial expression data acquired by the camera and microphone, and the output is sending that data to an emotion analysis tool. Specifically, it uses the "Microsoft Azure Emotion API" to analyze the user's current emotions.

[0701] Step 7:

[0702] The server adjusts its suggestions based on the emotion analysis results. The input is the user's emotion information, and the output is an adjusted cleaning suggestion that takes that information into account. Specifically, the server dynamically adjusts the number of steps and the workload of the suggestion based on the emotion information.

[0703] Step 8:

[0704] The user performs the suggested cleaning and provides feedback to the server via a terminal. The input is the user's feedback on the results of the cleaning, and the output is the transmission of that information to the server. For example, the user fills out a feedback form displayed on the terminal and presses the submit button.

[0705] Step 9:

[0706] The server analyzes user feedback information and uses it to improve the accuracy of its suggestions. The input is the feedback information, and the output is the improved suggestion model. The specific operation here is to add the feedback to the dataset and retrain the machine learning model to improve the suggestion accuracy.
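
Step 2 of this flow describes grayscale conversion, thresholding, and edge detection. The sketch below shows the idea on a tiny synthetic image in pure Python so that it runs without extra packages. It is not OpenCV code, and the assumptions are significant: dirt is taken to be darker than the floor, the cutoff value is arbitrary, and a simple boundary test on the thresholded mask stands in for a real edge detector.

```python
def to_grayscale(rgb_image):
    """Convert rows of (R, G, B) tuples into rows of luminance values."""
    return [
        [int(0.299 * r + 0.587 * g + 0.114 * b) for (r, g, b) in row]
        for row in rgb_image
    ]


def threshold(gray_image, cutoff):
    """Mark pixels darker than the cutoff as candidate contamination (1)."""
    return [[1 if value < cutoff else 0 for value in row] for row in gray_image]


def boundary_pixels(mask):
    """Return marked pixels that touch an unmarked pixel or the image border."""
    height, width = len(mask), len(mask[0])
    result = set()
    for y in range(height):
        for x in range(width):
            if not mask[y][x]:
                continue
            for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                outside = not (0 <= ny < height and 0 <= nx < width)
                if outside or not mask[ny][nx]:
                    result.add((y, x))
                    break
    return result


def bounding_box(mask):
    """Return (top, left, bottom, right) of the marked pixels, or None."""
    points = [(y, x) for y, row in enumerate(mask) for x, v in enumerate(row) if v]
    if not points:
        return None
    ys = [y for y, _ in points]
    xs = [x for _, x in points]
    return (min(ys), min(xs), max(ys), max(xs))


# 4x4 synthetic image: light floor with a dark 2x2 stain in the middle.
LIGHT, DARK = (255, 255, 255), (40, 30, 20)
image = [
    [DARK if 1 <= y <= 2 and 1 <= x <= 2 else LIGHT for x in range(4)]
    for y in range(4)
]
mask = threshold(to_grayscale(image), cutoff=128)
stain_box = bounding_box(mask)
stain_edges = boundary_pixels(mask)
```

The bounding box gives the location half of the "type and location" output; determining the type would need a classifier, which this sketch does not attempt.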

[0707] (Application Example 2)

[0708] Next, we will explain application example 2. In the following explanation, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".

[0709] In daily life, cleaning tasks require considerable time and effort, yet flexible solutions tailored to individual circumstances and emotions are difficult to provide. In particular, there is a lack of systems that effectively address room dirt in a way that suits the user's needs. Furthermore, there is a need for technology that provides automated cleaning suggestions that take the user's emotional state into consideration.

[0710] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 2 is realized by the following means.

[0711] In this invention, the server includes a device equipped with the function of acquiring video, means for analyzing the video information received from the device and identifying the degree of soiling, and a function for suggesting an appropriate cleaning method and cleaning tools to be used based on the identified degree of soiling. This makes it possible to clean the room efficiently and effectively while providing flexible cleaning suggestions tailored to the user's emotional state.

[0712] A "device equipped with the function of acquiring images" is a device that uses cameras and sensors to collect photographic and video data of a room or space.

[0713] "Methods for identifying the state of dirt" refers to technologies that analyze acquired video footage to determine the type and location of dirt within a room or space.

[0714] The "function that suggests appropriate cleaning methods and cleaning tools" refers to a technology that selects the optimal cleaning procedure and suitable cleaning tools for a given type of soiling and proposes them to the user.

[0715] A "mechanism for communicating proposed cleaning methods via voice output" refers to a device or technology that provides the proposed cleaning procedure to the user as a voice message, and provides instructions and guidance.

[0716] An "engine for recognizing user emotional information" is a technology that analyzes a user's voice tone and visual changes in their facial expressions to grasp their emotional state in real time.

[0717] The "feature that adjusts cleaning suggestions based on emotional information" is a technology that modifies or optimizes suggested cleaning methods and procedures by taking into account the user's current emotional state.

[0718] This system aims to streamline household cleaning and provide a customized experience tailored to the user's emotions. Implementation requires a device to acquire video, typically a consumer robot equipped with a camera. The robot captures real-time images of the room, collecting video data. The server then receives this video data via wireless communication and analyzes it.

[0719] The analysis utilizes the OpenCV video processing library and a learning algorithm that leverages a generative AI model. This allows the server to accurately identify the location and type of dirt from the video. Next, the AI model's suggestion function works to determine the most effective cleaning procedure and the cleaning tools to use. The determined suggestions are communicated to the user via voice through digital voice assistant technology.

[0720] Furthermore, the device incorporates an emotion engine that analyzes the user's voice tone and facial expressions to recognize their emotional state in real time. This emotional information is sent to a server, which adjusts cleaning suggestions to suit the user's situation. For example, if the server determines that the user is tired, it simplifies the suggestions to reduce the user's burden.

[0721] Users can provide feedback after cleaning, and this information is sent to the server. This feedback is analyzed by an algorithm and used to improve future cleaning advice. In this way, the system adapts to the user's current mood and state, pursuing not only cleaning efficiency but also comfort.

[0722] Specific example:

[0723] For example, a consumer robot installed in a living room uses its camera to photograph the floor, and dust accumulated under the sofa is detected. The server then has the robot tell the user by voice, "Use the vacuum cleaner to suck up the dust under the sofa." If the user is relaxed, the robot might suggest after cleaning, "Shall we do a quick clean around the windows next?"

[0724] Examples of prompts for a generative AI model:

[0725] "The smart cleaning system acquires and analyzes video data of the room and suggests cleaning methods tailored to the user's situation. In particular, please provide specific methods for incorporating emotional information into the suggested cleaning methods."

[0726] The flow of a specific process in Application Example 2 will be explained using Figure 14.

[0727] Step 1:

[0728] The device uses a camera to capture video footage of the room. It receives real-time video data as input. This video data is transmitted to the server via Wi-Fi. The output is the video data stored on the server.

[0729] Step 2:

[0730] The server extracts specific frames from the received video data and uses video processing libraries such as OpenCV to identify the location and type of dirt for each frame. The input is the transmitted video data, and the output is data on the location and type of the identified dirt.

[0731] Step 3:

[0732] The server uses a generative AI model to calculate the optimal cleaning procedure based on identified dirt and suggests which cleaning tools are most suitable. The input is data on the location and type of dirt, and the output is a suggestion for the cleaning method and tools.

[0733] Step 4:

[0734] The device receives suggestions sent from the server and communicates them to the user as voice output using its digital voice assistant function. The input is the suggestion content, and the output is the voice message heard by the user.

[0735] Step 5:

[0736] The device uses a microphone and camera to collect the user's voice and facial expressions, and sends this data to an emotion engine to analyze the user's emotional state. The input is real-time audio and video data, and the output is data on the identified emotional state.

[0737] Step 6:

[0738] The server receives data on the user's emotional state and adjusts the suggested cleaning methods accordingly. If it determines that the user is tired, it will offer suggestions tailored to the user's condition, such as simplifying the cleaning process. The input is the identified emotional state, and the output is the adjusted suggestions.
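The adjustment in Step 6 can be illustrated with a simple rule: when the user is judged to be tired, keep only the shortest tasks that fit a small time budget. The `Suggestion` record, the emotion label `"tired"`, and the 10-minute default budget are assumptions made for this sketch. The disclosure does not specify how the simplification is computed.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Suggestion:
    """One suggested cleaning task (hypothetical record layout)."""
    task: str
    tool: str
    minutes: int


def adjust_for_emotion(
    suggestions: List[Suggestion],
    emotion: str,
    tired_budget_minutes: int = 10,
) -> List[Suggestion]:
    """Simplify the plan for a tired user; otherwise return it unchanged."""
    if emotion != "tired":
        return list(suggestions)
    by_duration = sorted(suggestions, key=lambda s: s.minutes)
    kept: List[Suggestion] = []
    used = 0
    for suggestion in by_duration:
        if used + suggestion.minutes <= tired_budget_minutes:
            kept.append(suggestion)
            used += suggestion.minutes
    # If nothing fits the budget, still offer the single shortest task.
    return kept or by_duration[:1]
```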

[0739] Step 7:

[0740] After cleaning, the user enters feedback via a terminal, which is then sent to the server. The server analyzes the user's feedback using a learning algorithm and uses it as data to improve the accuracy of future suggestions. The input is the user's feedback information, and the output is the improved suggestion algorithm.
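The feedback loop in Step 7 does not name a particular learning algorithm. As one simple possibility, user ratings can be averaged per pair of dirt type and cleaning method, and the best-rated method preferred in later suggestions. The class `FeedbackLearner`, the 1 to 5 rating scale, and the neutral prior of 3.0 are all assumptions for this sketch.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


class FeedbackLearner:
    """Keeps a smoothed average rating for each (dirt type, method) pair."""

    def __init__(self, prior: float = 3.0) -> None:
        self._prior = prior  # acts as one neutral pseudo-rating
        self._total: Dict[Tuple[str, str], float] = defaultdict(float)
        self._count: Dict[Tuple[str, str], int] = defaultdict(int)

    def record(self, dirt_type: str, method: str, rating: float) -> None:
        """Store one piece of user feedback (assumed scale: 1 to 5)."""
        key = (dirt_type, method)
        self._total[key] += rating
        self._count[key] += 1

    def score(self, dirt_type: str, method: str) -> float:
        """Average rating, pulled toward the prior when data is scarce."""
        key = (dirt_type, method)
        total = self._total.get(key, 0.0)
        count = self._count.get(key, 0)
        return (self._prior + total) / (1 + count)

    def best_method(self, dirt_type: str, candidates: Iterable[str]) -> str:
        """Return the candidate method with the highest score."""
        return max(candidates, key=lambda method: self.score(dirt_type, method))
```

The prior keeps a method with a single poor rating from being ranked below every untried method by a wide margin.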

[0741] The specific processing unit 290 transmits the result of the specific processing to the robot 414. In the robot 414, the control unit 46A causes the speaker 240 and the controlled object 443 to output the result of the specific processing. The microphone 238 acquires audio representing the user's input in response to the result of the specific processing. The control unit 46A transmits the audio data acquired by the microphone 238 to the data processing device 12, where the specific processing unit 290 acquires it.

[0742] The data generation model 58 is a type of so-called generative AI (Artificial Intelligence). Examples of the data generation model 58 include ChatGPT (Internet search <URL: https://openai.com/blog/chatgpt>) and Gemini (Internet search <URL: https://gemini.google.com/?hl=ja>). The data generation model 58 is obtained by performing deep learning on a neural network. The data generation model 58 receives prompts containing instructions, together with inference data such as audio data representing speech, text data representing text, and image data representing images. The data generation model 58 performs inference on the input inference data according to the instructions indicated by the prompts, and outputs the inference results in data formats such as audio data and text data. Here, inference refers to, for example, analysis, classification, prediction, and/or summarization.

[0743] In the above embodiment, an example was given in which the specific processing is performed by the data processing device 12, but the technology of this disclosure is not limited thereto, and the specific processing may also be performed by the robot 414.

[0744] Furthermore, the emotion identification model 59, acting as an emotion engine, may determine the user's emotion according to a specific mapping, namely an emotion map (see Figure 9). Similarly, the emotion identification model 59 may also determine the robot's emotion, and the specific processing unit 290 may perform the specific processing using the robot's emotion.

[0745] Figure 9 shows an emotion map 400 in which multiple emotions are mapped. In the emotion map 400, emotions are arranged in concentric circles radiating from the center. The closer to the center of the concentric circles, the more primitive the emotions are located. Further out of the concentric circles, emotions representing states and actions arising from mental states are located. Emotion is a concept that includes feelings and mental states. On the left side of the concentric circles, emotions that are generally generated from reactions occurring in the brain are located. On the right side of the concentric circles, emotions that are generally induced by situational judgment are located. Above and below the concentric circles, emotions that are generally generated from reactions occurring in the brain and induced by situational judgment are located. In addition, the emotion of "pleasure" is located on the upper side of the concentric circles, and the emotion of "displeasure" is located on the lower side. Thus, in the emotion map 400, multiple emotions are mapped based on the structure in which emotions arise, and emotions that are likely to occur simultaneously are mapped close together.

[0746] These emotions are distributed at the 3 o'clock position on the emotion map 400 and usually fluctuate between security and anxiety. In the right half of the emotion map 400, situational awareness takes precedence over internal sensation, giving a calm impression.

[0747] The inside of the emotion map 400 represents inner thoughts, while the outside represents actions. Therefore, the further toward the outside of the emotion map 400 an emotion lies, the more visible it becomes (that is, the more it is expressed in actions).
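The layout of the emotion map described above can be summarized as a small geometric rule. In the sketch below, the Cartesian coordinates, the `inner_radius` threshold, and the function name `describe_position` are assumptions chosen for illustration. Only the left/right, upper/lower, and inner/outer meanings come from the description.

```python
import math
from typing import Dict


def describe_position(x: float, y: float, inner_radius: float = 0.5) -> Dict[str, str]:
    """Classify a point on the map, with the center of the circles at (0, 0)."""
    if x < 0:
        origin = "reaction"    # left side: arises from reactions in the brain
    elif x > 0:
        origin = "situation"   # right side: induced by situational judgment
    else:
        origin = "both"        # directly above or below the center
    if y > 0:
        valence = "pleasure"
    elif y < 0:
        valence = "displeasure"
    else:
        valence = "neutral"
    radius = math.hypot(x, y)
    layer = "inner (mental state)" if radius < inner_radius else "outer (expressed in action)"
    return {"origin": origin, "valence": valence, "layer": layer}
```

For instance, a point at the 3 o'clock position on the outer ring, `describe_position(1.0, 0.0)`, falls on the situation side with neutral valence.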

[0748] Here, human emotions are based on various balances, such as posture and blood sugar levels. When these balances deviate from the ideal, the result is discomfort, and when they approach the ideal, the result is pleasure. Similarly, in robots, cars, motorcycles, and the like, emotions can be created based on various balances, such as posture and battery level. When these balances deviate from the ideal, the result is discomfort, and when they approach the ideal, the result is pleasure. The emotion map can be generated, for example, based on Dr. Mitsuyoshi's emotion map (Research on a system for analyzing brain physiological signals of speech emotion recognition and emotion, Tokushima University, doctoral dissertation: https://ci.nii.ac.jp/naid/500000375379). The left half of the emotion map contains emotions belonging to a region called "reaction," where sensation is dominant. The right half of the emotion map contains emotions belonging to a region called "situation," where situational awareness is dominant.

[0749] The emotion map defines two emotions that promote learning. One is the emotion around the middle of the negative "repentance" and "reflection" on the situation side. In other words, it is when the robot experiences negative emotions such as "I never want to feel this way again" or "I don't want to be scolded again." The other is the emotion around the positive "desire" on the reaction side. In other words, it is when the robot has positive feelings such as "I want more" or "I want to know more."

[0750] The emotion identification model 59 inputs user input into a pre-trained neural network, obtains emotion values representing each emotion shown in the emotion map 400, and determines the user's emotion. This neural network is pre-trained based on multiple training data sets, which are combinations of user input and emotion values representing each emotion shown in the emotion map 400. Furthermore, this neural network is trained so that emotions located close together have similar values, as shown in the emotion map 900 in Figure 10. Figure 10 shows an example where multiple emotions such as "reassured," "calm," and "confident" have similar emotion values.
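Once the neural network has produced an emotion value for each emotion on the map, the final determination can be as simple as picking the largest value. The sketch below covers only that last step. The function names, the example values, and the `tolerance` parameter are assumptions, and the trained network itself is not shown.

```python
from typing import Dict, List


def determine_emotion(emotion_values: Dict[str, float]) -> str:
    """Return the emotion with the highest emotion value."""
    if not emotion_values:
        raise ValueError("emotion_values must not be empty")
    return max(emotion_values, key=lambda name: emotion_values[name])


def similar_emotions(emotion_values: Dict[str, float], tolerance: float = 0.15) -> List[str]:
    """List emotions whose value is within `tolerance` of the highest one."""
    top = emotion_values[determine_emotion(emotion_values)]
    close = [name for name, value in emotion_values.items() if top - value <= tolerance]
    return sorted(close, key=lambda name: emotion_values[name], reverse=True)
```

Because the network is trained so that neighboring emotions receive similar values, `similar_emotions` would tend to return a cluster of adjacent emotions such as "reassured," "calm," and "confident."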

[0751] The above description primarily focuses on the functions of the data processing device 12 in relation to this disclosure. However, the system related to this disclosure is not necessarily implemented on a server. The system related to this disclosure may be implemented as a general information processing system. This disclosure may be implemented, for example, as a software program that runs on a personal computer or as an application that runs on a smartphone. The method related to this disclosure may be provided to users in SaaS (Software as a Service) format.

[0752] In the above embodiment, an example was given in which the specific processing is performed by a single computer 22. However, the technology of this disclosure is not limited thereto, and the specific processing may be distributed across multiple computers, including the computer 22. For example, the data generation model 58 may be provided in a device external to the data processing device 12, and the external device may generate data according to the input data.

[0753] In the above embodiment, an example was given in which the specific processing program 56 is stored in the storage 32, but the technology of this disclosure is not limited thereto. For example, the specific processing program 56 may be stored in a portable, computer-readable, non-transitory storage medium such as a USB (Universal Serial Bus) memory. The specific processing program 56 stored in the non-transitory storage medium is installed in the computer 22 of the data processing device 12. The processor 28 executes the specific processing according to the specific processing program 56.

[0754] Alternatively, the specific processing program 56 may be stored in a storage device such as a server connected to the data processing device 12 via the network 54, and the specific processing program 56 may be downloaded and installed on the computer 22 in response to a request from the data processing device 12.

[0755] Furthermore, it is not necessary to store the entirety of the specific processing program 56 in a storage device such as a server connected to the data processing device 12 via the network 54, or to store the entirety of the specific processing program 56 in the storage 32; it is acceptable to store only a portion of the specific processing program 56.

[0756] The following types of processors can be used as hardware resources to perform specific processing. Examples of processors include a CPU, a general-purpose processor that functions as a hardware resource to perform specific processing by executing software, i.e., a program. Other examples of processors include dedicated electrical circuits, such as FPGAs (Field-Programmable Gate Arrays), PLDs (Programmable Logic Devices), or ASICs (Application Specific Integrated Circuits), which have circuit configurations specifically designed to perform specific processing. All of these processors have built-in or connected memory, and all of them perform specific processing by using memory.

[0757] The hardware resource that performs a specific process may consist of one of these various processors, or it may consist of a combination of two or more processors of the same or different types (for example, a combination of multiple FPGAs, or a combination of a CPU and an FPGA). Alternatively, the hardware resource that performs a specific process may consist of a single processor.

[0758] Examples of configurations using a single processor include, firstly, a configuration in which one or more CPUs and software are combined to form a single processor, and this processor functions as a hardware resource that performs a specific process. Secondly, there is a configuration using a processor that realizes the functions of the entire system, including multiple hardware resources that perform a specific process, on a single IC chip, as exemplified by SoCs (System-on-a-chip). In this way, a specific process is realized using one or more of the above types of processors as hardware resources.

[0759] Furthermore, the hardware structure of these various processors can more specifically utilize electrical circuits that combine circuit elements such as semiconductor devices. Also, the specific processing described above is merely an example. Therefore, it goes without saying that unnecessary steps can be deleted, new steps added, or the processing order rearranged, as long as it does not deviate from the main purpose.

[0760] The descriptions and illustrations presented above are detailed explanations of the technical aspects of this disclosure and are merely examples of the technical aspects. For example, the above descriptions of the structure, function, operation, and effect are examples of the structure, function, operation, and effect of the technical aspects of this disclosure. Therefore, it goes without saying that you may delete unnecessary parts, add new elements, or replace elements in the descriptions and illustrations presented above, as long as you do not deviate from the essence of the technical aspects of this disclosure. Furthermore, in order to avoid confusion and facilitate understanding of the technical aspects of this disclosure, explanations of common technical knowledge and the like that do not require special explanation to enable the implementation of the technical aspects of this disclosure have been omitted from the descriptions and illustrations presented above.

[0761] All documents, patent applications, and technical standards described herein are incorporated by reference to the same extent as if each individual document, patent application, and technical standard were specifically and individually noted to be incorporated by reference.

[0762] The following is further disclosed regarding the embodiments described above.

[0763] (Claim 1)

[0764] Means of acquiring video,

[0765] A dirt recognition means analyzes the video information received from the aforementioned video acquisition means and recognizes dirt,

[0766] Based on the dirt recognized by the dirt recognition means, a suggestion means proposes the optimal cleaning method and cleaning tools to be used.

[0767] An output means for outputting the suggestion generated by the aforementioned suggestion means as audio,

[0768] A system that includes this.

[0769] (Claim 2)

[0770] The system according to claim 1, characterized in that the dirt recognition means identifies dirt using a learning algorithm that includes a generative model.

[0771] (Claim 3)

[0772] The system according to claim 1, characterized in that the suggestion means further includes a learning means for collecting user feedback information and improving the accuracy of the suggestion based on the feedback information.

[0773] "Example 1"

[0774] (Claim 1)

[0775] A device for acquiring video,

[0776] An analysis device that analyzes image information received from the aforementioned device and identifies contamination,

[0777] Based on the contamination identified by the aforementioned analysis device, a suggestion device presents appropriate cleaning techniques and tools to be used.

[0778] An output device that outputs the suggestion generated by the aforementioned suggestion device in both audio and text format,

[0779] A system that includes this.

[0780] (Claim 2)

[0781] The system according to claim 1, characterized in that the analysis device identifies contamination using a learning algorithm that includes a training model.

[0782] (Claim 3)

[0783] The system according to claim 1, characterized in that the suggestion device further includes a training device that collects evaluation information from users and improves the accuracy of the suggestion based on the evaluation information.

[0784] "Application Example 1"

[0785] (Claim 1)

[0786] Means of acquiring video,

[0787] A feature recognition means analyzes the video information received from the aforementioned video acquisition means and recognizes dirt,

[0788] Based on the stains recognized by the feature recognition means, a suggestion means proposes the optimal treatment method and tools to be used.

[0789] An output means for outputting the suggestion generated by the aforementioned suggestion means as audio,

[0790] A communication means that transmits information about the stains recognized by the feature recognition means to a data processing terminal and thereby notifies the user,

[0791] A system that includes this.

[0792] (Claim 2)

[0793] The system according to claim 1, characterized in that the feature recognition means identifies dirt using a learning algorithm that includes a generative model.

[0794] (Claim 3)

[0795] The system according to claim 1, characterized in that the suggestion means further includes a learning mechanism that collects feedback information from users and improves the accuracy of the suggestion based on the feedback information.

[0796] "Example 2 of combining an emotion engine"

[0797] (Claim 1)

[0798] An image acquisition means,

[0799] A recognition means analyzes the image information received from the image acquisition means and recognizes contamination,

[0800] Based on the contamination recognized by the recognition means, a suggestion means proposes the optimal cleaning method and tools to be used.

[0801] An output means for outputting the suggestion generated by the aforementioned suggestion means as audio,

[0802] An emotion recognition means that analyzes the user's voice and facial expressions in real time to recognize emotions,

[0803] An adjustment means for adjusting the proposal based on the analysis results by the emotion recognition means,

[0804] A system that includes this.

[0805] (Claim 2)

[0806] The system according to claim 1, characterized in that the recognition means determines contamination using a learning algorithm that includes a generative model.

[0807] (Claim 3)

[0808] The system according to claim 1, characterized in that the suggestion means further includes a learning means for collecting user feedback information and improving the accuracy of the suggestion based on the feedback information.

[0809] "Application example 2 when combining with an emotional engine"

[0810] (Claim 1)

[0811] A device equipped with the function of acquiring video,

[0812] A means for analyzing video information received from the aforementioned device and identifying the degree of contamination,

[0813] Based on the identified soiling conditions, the function proposes an appropriate cleaning method and cleaning tools to be used.

[0814] A mechanism that transmits the proposed cleaning method via voice output,

[0815] An engine for recognizing user emotional information,

[0816] A function to adjust cleaning suggestions based on the aforementioned emotional information,

[0817] A system that includes this.

[0818] (Claim 2)

[0819] The system according to claim 1, characterized in that the means for identifying the state of soiling uses a learning algorithm including a generative model to determine the soiling.

[0820] (Claim 3)

[0821] The system according to claim 1, further characterized in that the aforementioned suggestion function includes a learning mechanism that collects user feedback data and improves the accuracy of the suggestions based on that feedback data.

[Explanation of symbols]

[0822]
10, 210, 310, 410 Data processing system
12 Data processing device
14 Smart device
214 Smart glasses
314 Headset-type terminal
414 Robot

Claims

1. A system comprising: a video acquisition means; a dirt recognition means that analyzes the video information received from the video acquisition means and recognizes dirt; a suggestion means that proposes, based on the dirt recognized by the dirt recognition means, the optimal cleaning method and the cleaning tools to be used; and an output means that outputs the suggestion generated by the suggestion means as audio.

2. The system according to claim 1, characterized in that the dirt recognition means identifies dirt using a learning algorithm that includes a generative model.

3. The system according to claim 1, characterized in that the suggestion means further includes a learning means for collecting user feedback information and improving the accuracy of the suggestion based on the feedback information.