system

JP2026105399A · Pending · Publication Date: 2026-06-26 · SOFTBANK GROUP CORP


Patent Information

Authority / Receiving Office: JP · JP
Patent Type: Applications
Current Assignee / Owner: SOFTBANK GROUP CORP
Filing Date: 2024-12-16
Publication Date: 2026-06-26

Application Information

Patent Timeline

16 Dec 2024: Application
26 Jun 2026: Publication (JP2026105399A)

IPC: G06Q50/10

AI Tagging

Technology Topics

Speech Processor, Engineering




AI Technical Summary

Technical Problem

Selecting appropriate cleaning methods and tools based on the type and degree of dirt is difficult, complicating cleaning work and reducing quality of life, especially for families with limited time and users without professional knowledge.

Method used

A system utilizing a generative artificial intelligence model to analyze images of dirty areas, determine the appropriate cleaning methods and tools, and provide guidance through a speech synthesizer, with an interface for user interaction and progress reporting.

Benefits of technology

Enables efficient and user-friendly cleaning by identifying dirt types and degrees, suggesting optimal methods and tools, and allowing real-time progress reporting, thereby simplifying cleaning operations and improving user satisfaction.

✦ Generated by Eureka AI based on patent content.


Abstract

To provide a system. [Solution] A system including: a device equipped with imaging means and having means for recording images of a dirty area; means for processing the image data and using a machine learning model to identify the type and degree of dirt; means for determining a method for suggesting the optimal cleaning means and tools based on the identified dirt; means comprising a voice processor that converts the suggestion into audio and provides guidance to the user; means, mounted on cleaning equipment, having a patrol function that detects dirt by traveling to different locations in the home; and communication means that sends a notification to a smartphone based on the information detected by the patrol function.


Description

Technical Field

[0001] The technology of the present disclosure relates to a system.

Background Art

[0002] Patent Document 1 discloses a persona chatbot control method performed by at least one processor, including steps of receiving a user utterance, adding the user utterance to a prompt including an instruction sentence related to an explanation of a chatbot character, encoding the prompt, and inputting the encoded prompt into a language model to generate a chatbot utterance in response to the user utterance.

Prior Art Documents

Patent Documents

[0003]

Patent Document 1

Summary of the Invention

Problems to be Solved by the Invention

[0004] Regarding house cleaning, there is a problem that it is difficult to select an appropriate cleaning method and tool according to the type and degree of dirt. This problem complicates the cleaning work and reduces the quality of life, especially for families with limited time and users without professional knowledge. In addition, due to the lack of specific instructions for improving the cleaning efficiency, the cleaning work often becomes more laborious.

Means for Solving the Problems

[0005] This invention provides a system that uses a generative artificial intelligence model to identify the type and degree of dirt by analyzing images obtained from a device equipped with image acquisition means to capture images of a dirty area. This allows for the determination of a method to suggest appropriate cleaning methods and tools based on the identified dirt. Furthermore, by providing a speech synthesizer that converts the suggestions into speech and provides guidance to the user, user-friendly interaction is achieved. In addition, by providing an interface to accept voice input from the user and receive confirmation of the completion of the cleaning status, the user can easily report the progress of the work. This enables the presentation of appropriate and efficient cleaning solutions.

[0006] "Image acquisition means" refers to devices or technologies that capture images of areas containing dirt or stains.

[0007] "Video data" refers to digital image information of a dirty area.

[0008] A "generative artificial intelligence model" is an AI technology that analyzes video data to identify the type and degree of dirt.

[0009] A "cleaning method" refers to the procedures and techniques for performing the most appropriate treatment on identified stains.

[0010] "Cleaning tools" refer to various tools and equipment used to carry out cleaning work.

[0011] A "speech synthesizer" is a device or technology used to convert text information into speech format.

[0012] An "interface" is a means of receiving input from a user and facilitating communication between the system and the user.

Brief Description of the Drawings

[0013]
[Figure 1] A conceptual diagram showing an example of the configuration of a data processing system according to the first embodiment.
[Figure 2] A conceptual diagram showing an example of the main functions of a data processing device and a smart device according to the first embodiment.
[Figure 3] A conceptual diagram showing an example of the configuration of a data processing system according to the second embodiment.
[Figure 4] A conceptual diagram showing an example of the main functions of a data processing device and smart glasses according to the second embodiment.
[Figure 5] A conceptual diagram showing an example of the configuration of a data processing system according to the third embodiment.
[Figure 6] A conceptual diagram showing an example of the main functions of a data processing device and a headset-type terminal according to the third embodiment.
[Figure 7] A conceptual diagram showing an example of the configuration of a data processing system according to the fourth embodiment.
[Figure 8] A conceptual diagram showing an example of the main functions of a data processing device and a robot according to the fourth embodiment.
[Figure 9] An emotion map on which multiple emotions are mapped.
[Figure 10] An emotion map on which multiple emotions are mapped.
[Figure 11] A sequence diagram showing the processing flow of the data processing system in Example 1.
[Figure 12] A sequence diagram showing the processing flow of the data processing system in Application Example 1.
[Figure 13] A sequence diagram showing the processing flow of the data processing system in Example 2, which incorporates an emotion engine.
[Figure 14] A sequence diagram showing the processing flow of the data processing system in Application Example 2, which incorporates an emotion engine.

Modes for Carrying Out the Invention

[0014] Hereinafter, an example of an embodiment of the system relating to the technology of this disclosure will be described with reference to the attached drawings.

[0015] First, the terms used in the following description will be explained.

[0016] In the following embodiments, a processor denoted by a reference numeral (hereinafter simply referred to as "processor") may be a single arithmetic unit or a combination of multiple arithmetic units. The processor may also be a single type of arithmetic unit or a combination of multiple types of arithmetic units. Examples of arithmetic units include a CPU (Central Processing Unit), a GPU (Graphics Processing Unit), a GPGPU (General-Purpose computing on Graphics Processing Units), and an APU (Accelerated Processing Unit).

[0017] In the following embodiments, a RAM (Random Access Memory) denoted by a reference numeral is a memory in which information is temporarily stored and which the processor uses as work memory.

[0018] In the following embodiments, a storage denoted by a reference numeral is one or more non-volatile storage devices that store various programs, parameters, and the like. Examples of non-volatile storage devices include flash memory (e.g., an SSD (Solid State Drive)), magnetic disks (e.g., hard disks), and magnetic tapes.

[0019] In the following embodiments, a communication I/F (Interface) denoted by a reference numeral is an interface that includes a communication processor, an antenna, and the like. The communication I/F controls communication between multiple computers. Examples of communication standards applied to the communication I/F include wireless communication standards such as 5G (5th Generation Mobile Communication System), Wi-Fi (registered trademark), and Bluetooth (registered trademark).

[0020] In the following embodiments, "A and/or B" is synonymous with "at least one of A and B." That is, "A and/or B" means A alone, B alone, or a combination of A and B. In this specification, the same interpretation applies when three or more items are linked by "and/or."

[0021] [First Embodiment]

[0022] Figure 1 shows an example of the configuration of the data processing system 10 according to the first embodiment.

[0023] As shown in Figure 1, the data processing system 10 includes a data processing device 12 and a smart device 14. An example of the data processing device 12 is a server.

[0024] The data processing device 12 comprises a computer 22, a database 24, and a communication interface 26. The computer 22 is an example of a "computer" related to the technology of this disclosure. The computer 22 comprises a processor 28, RAM 30, and storage 32. The processor 28, RAM 30, and storage 32 are connected to a bus 34. The database 24 and the communication interface 26 are also connected to the bus 34. The communication interface 26 is connected to a network 54. An example of the network 54 is a WAN (Wide Area Network) and/or a LAN (Local Area Network).

[0025] The smart device 14 comprises a computer 36, a reception device 38, an output device 40, a camera 42, and a communication interface 44. The computer 36 comprises a processor 46, RAM 48, and storage 50. The processor 46, RAM 48, and storage 50 are connected to a bus 52. The reception device 38, output device 40, and camera 42 are also connected to the bus 52.

[0026] The reception device 38 is equipped with a touch panel 38A and a microphone 38B, etc., and receives user input. The touch panel 38A receives user input by detecting contact with an object (e.g., a pen or finger). The microphone 38B receives user input by detecting the user's voice. The control unit 46A transmits data indicating the user input received by the touch panel 38A and microphone 38B to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the data indicating the user input.

[0027] The output device 40 includes a display 40A and a speaker 40B, and presents data to the user 20 by outputting the data in a form perceptible to the user 20 (e.g., audio and/or text). The display 40A displays visible information such as text and images according to instructions from the processor 46. The speaker 40B outputs audio according to instructions from the processor 46. The camera 42 is a small digital camera equipped with an optical system such as a lens, aperture, and shutter, and an image sensor such as a CMOS (Complementary Metal-Oxide-Semiconductor) image sensor or a CCD (Charge Coupled Device) image sensor.

[0028] Communication interface 44 is connected to network 54. Communication interfaces 44 and 26 are responsible for the exchange of various types of information between processor 46 and processor 28 via network 54.

[0029] Figure 2 shows an example of the main functions of the data processing device 12 and the smart device 14.

[0030] As shown in Figure 2, in the data processing device 12, a specific processing is performed by the processor 28. A specific processing program 56 is stored in the storage 32. The specific processing program 56 is an example of a "program" related to the technology of this disclosure. The processor 28 reads the specific processing program 56 from the storage 32 and executes the read specific processing program 56 on the RAM 30. The specific processing is realized by the processor 28 operating as a specific processing unit 290 according to the specific processing program 56 executed on the RAM 30.

[0031] The storage 32 stores the data generation model 58 and the emotion identification model 59. The data generation model 58 and the emotion identification model 59 are used by the specific processing unit 290.

[0032] In the smart device 14, the processor 46 performs the reception output processing. The storage 50 stores the reception output program 60. The reception output program 60 is used in conjunction with a specific processing program 56 by the data processing system 10. The processor 46 reads the reception output program 60 from the storage 50 and executes the read reception output program 60 on the RAM 48. The reception output processing is realized by the processor 46 operating as a control unit 46A according to the reception output program 60 executed on the RAM 48.

[0033] Next, the specific processing performed by the specific processing unit 290 of the data processing device 12 will be described. In the following description, the data processing device 12 will be referred to as the "server" and the smart device 14 as the "terminal".

[0034] This invention provides an autonomous system that assists users in performing cleaning tasks efficiently and effectively in the cleaning of a house. The system is configured as follows:

[0035] First, the terminal is equipped with a camera that can capture video of a dirty room in real time. This image acquisition means obtains video data used to determine the areas that need cleaning. Next, the terminal sends the captured video data to the server. The server then uses a generative artificial intelligence model to analyze the video data. This analysis identifies the type and extent of the dirt and determines the appropriate cleaning method and cleaning tools to use. This information is then presented to the user as recommended cleaning methods and tools.

[0036] The server transmits the determined cleaning method to the terminal, which then uses a speech synthesizer to play the received information as audio, providing guidance to the user. This allows the user to efficiently perform the cleaning according to the suggested procedure.

[0037] For example, when the terminal captures video of the kitchen, the server analyzes the video to identify grease stains and, based on that result, recommends "using dish soap and a sponge, and rinsing with warm water." The terminal communicates this by voice, and the user can follow the instructions and begin cleaning.
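The mapping from an identified dirt type to a recommended method and tools can be illustrated with a small lookup. This is a minimal sketch, not the disclosed implementation: the `Recommendation` class, the `RECOMMENDATIONS` table, and the `recommend` function are hypothetical names, and only the grease entry reflects the example in the text.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Recommendation:
    """A cleaning method together with the tools it needs."""
    method: str
    tools: Tuple[str, ...]


# Hypothetical table; only the grease entry is taken from the example above.
RECOMMENDATIONS = {
    "grease": Recommendation(
        method="Use dish soap and a sponge, and rinse with warm water.",
        tools=("dish soap", "sponge", "warm water"),
    ),
}


def recommend(dirt_type: str) -> Optional[Recommendation]:
    """Return the stored recommendation for a dirt type, or None if unknown."""
    return RECOMMENDATIONS.get(dirt_type.lower())


kitchen = recommend("Grease")
unknown = recommend("rust")
```

Returning `None` for an unknown dirt type leaves it to the caller to decide how to respond when no method is stored.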

[0038] Furthermore, this system includes an interface that accepts voice input from the user, allowing for reports on cleaning progress and completion status. This makes it easy for users to express their opinions regarding cleaning and receive instructions at the appropriate time. This interactive function enables more efficient cleaning work and simplifies operation.

[0039] The following describes the processing flow.

[0040] Step 1:

[0041] The terminal activates its camera and captures video of the designated room or area in real time. The acquired video data is stored in temporary storage.

[0042] Step 2:

[0043] The terminal compresses and converts the format of the video data stored in temporary storage in order to send it to the server, and prepares it for transmission over the network.

[0044] Step 3:

[0045] The server runs a generative artificial intelligence model to analyze the received video data. This model identifies the type of dirt and determines its degree and extent.

[0046] Step 4:

[0047] Based on the dirt analysis results, the server selects the optimal cleaning method and tools from its internal database. It then organizes this selection information and prepares it for transmission to the terminal.

[0048] Step 5:

[0049] The terminal uses a speech synthesizer to convert the text of the suggestions received from the server into audio data.

[0050] Step 6:

[0051] The terminal's voice assistant plays the converted audio data, guiding the user through specific cleaning procedures and the tools they should use.

[0052] Step 7:

[0053] The user follows the instructions of the voice assistant and actually performs the cleaning using the appropriate cleaning method. They report their progress and completion to the assistant via voice input.

[0054] Step 8:

[0055] The terminal recognizes voice input from the user, sends the content to the server to record the completion of the cleaning task, and prepares to provide the next instructions.
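Steps 2 to 4 above (compress on the terminal, analyze on the server, select a method from the database) can be sketched as follows. This is an illustration under stated assumptions: `zlib` stands in for whatever compression the terminal actually uses, `fake_analyze` is a stub in place of the generative AI model, and all function names and the `DATABASE` contents are hypothetical.

```python
import zlib
from typing import Callable, Dict


def prepare_for_upload(raw_video: bytes) -> bytes:
    """Terminal side (Step 2): compress the captured bytes before sending."""
    return zlib.compress(raw_video)


def handle_upload(payload: bytes,
                  analyze: Callable[[bytes], Dict[str, str]],
                  database: Dict[str, str]) -> str:
    """Server side (Steps 3 and 4): analyze the video and select a method."""
    video = zlib.decompress(payload)
    finding = analyze(video)  # expected shape: {"type": ..., "degree": ...}
    method = database.get(finding["type"], "No stored method for this dirt type.")
    return f"Detected {finding['degree']} {finding['type']}. {method}"


def fake_analyze(video: bytes) -> Dict[str, str]:
    """Stub for the generative AI model; always reports heavy grease."""
    return {"type": "grease", "degree": "heavy"}


# Hypothetical database content, based on the kitchen example in the text.
DATABASE = {"grease": "Use dish soap and a sponge, and rinse with warm water."}

guidance = handle_upload(prepare_for_upload(b"frame-bytes"), fake_analyze, DATABASE)
```

The returned string is the text that the terminal would pass to its speech synthesizer in Steps 5 and 6.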

[0056] (Example 1)

[0057] Next, we will describe Example 1. In the following description, the data processing device 12 will be referred to as the "server," and the smart device 14 will be referred to as the "terminal."

[0058] Cleaning at home can be inefficient because it's difficult to determine which areas to clean and how. Furthermore, selecting the appropriate cleaning method and tools based on the type of dirt requires experience and knowledge, which can be a burden on the user.

[0059] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 1 is realized by the following means.

[0060] In this invention, the server includes means for acquiring video footage of a contaminated area using a device equipped with an image capture device, means for analyzing the video information using a generative artificial intelligence model to identify the type and degree of contamination, and means for receiving progress reports from the user via a voice input interface. This enables the user to efficiently identify areas to be cleaned and clean them in the most optimal way.

[0061] An "image capture device" is a part of the equipment used to acquire images of a contaminated area and has the function of capturing visual information in real time.

[0062] A "generative artificial intelligence model" is an AI technology used to analyze video information and identify the type and degree of dirt, and includes algorithms that apply deep learning and machine learning.

[0063] A "voice converter" is a device or function that converts proposed cleaning methods into voice and provides guidance to the user.

[0064] A "voice input interface" is a means for users to communicate progress and completion reports to the system by voice, and includes voice acquisition devices such as microphones.

[0065] "Cleaning means" refers to methods or techniques for performing appropriate cleaning actions on identified dirt.

[0066] This invention is a system designed to assist with household cleaning tasks. The system's implementation is achieved through three main components: a terminal, a server, and a user.

[0067] First, the terminal uses its camera-equipped image capture device to acquire video footage of a specific area within the home. This footage is captured in real time and targets areas suspected of contamination. The terminal combines a high-resolution camera with data compression technology to quickly process the video and transfer it to the server.

[0068] Next, the server analyzes the video data transmitted from the terminal using a generative artificial intelligence model. The AI model used here is based on deep learning and can identify various types of dirt (e.g., oil stains, dust, water stains, etc.) that appear in the video and evaluate their degree. After identification, the server determines the optimal cleaning method and the equipment required for cleaning, and then formulates the cleaning procedure.

[0069] The cleaning methods and procedures determined by the server are sent back to the terminal. The terminal uses a voice converter to convert this information into speech and provides the user with voice guidance on how to proceed with the cleaning. For example, for grease stains in the kitchen, instructions such as "It is recommended to use dish soap and a sponge and rinse with warm water" are provided.

[0070] Furthermore, users can report the progress and completion status of their cleaning to the server through a voice input interface. This interaction allows the server to provide the next necessary instructions at the appropriate time. In this way, users can carry out their cleaning tasks more efficiently, achieving improved overall system efficiency.

[0071] An example of a prompt is a question such as "Please suggest the best cleaning method for the grease stains on the kitchen floor," which is submitted to the generative AI model. This allows the user to receive more specific and appropriate cleaning instructions.
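A prompt of this form can be assembled from the identified dirt and its location. The sketch below is illustrative only; the function name `build_prompt` and its parameters are assumptions, and the wording follows the example sentence in the text.

```python
def build_prompt(dirt: str, location: str) -> str:
    """Build a question for the generative AI model (hypothetical helper)."""
    return f"Please suggest the best cleaning method for the {dirt} on the {location}."


prompt = build_prompt("grease stains", "kitchen floor")
```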

[0072] The flow of the specific processing in Example 1 will be explained using Figure 11.

[0073] Step 1:

[0074] The terminal uses a camera to capture video of the entire room. In this step, video data is generated in real time. All visual information within the camera's field of view is used as input, and high-resolution video data is generated as output. This video data provides wide-ranging visual information, including areas that need cleaning.

[0075] Step 2:

[0076] The terminal compresses the acquired video data and sends it to the server via the internet. This step improves the transmission speed by reducing the size of the video data. Raw video data is required as input, and compressed video data is the output.

[0077] Step 3:

[0078] The server decompresses the received compressed video data and inputs it into a generative artificial intelligence model. In this step, the server analyzes the video data to identify the type and location of dirt in the room. Based on the input data, the AI model analyzes the data using a deep learning algorithm and outputs the type and extent of the dirt, as well as the appropriate cleaning method.

[0079] Step 4:

[0080] Based on the analysis results, the server determines specific cleaning methods and tools and generates this information in the form of prompt statements. In this step, the server scrutinizes the information based on the identified dirt and constructs a user-friendly suggestion statement. The input is output information from the AI model, and the output is a prompt statement explaining the recommended cleaning method.

[0081] Step 5:

[0082] The server sends the generated prompt message to the terminal. The input is a prompt message describing the recommended cleaning method, and the output is the data transmitted to the terminal.

[0083] Step 6:

[0084] The terminal converts the received recommendation information into speech and provides guidance to the user. In this step, speech synthesis technology is used to convert text data into speech data. The input is a prompt, and the output is a voice guide for the user.

[0085] Step 7:

[0086] The user reports the progress and results of cleaning using a voice input interface. In this step, the voice input from the user is converted into text data and sent to the server. The input is the user's voice, and the output is the basis for determining the next instruction.

[0087] Step 8:

[0088] Based on the user's progress report, the server updates the next cleaning procedure, generates new instructions as needed, and sends them to the terminal. This step involves analyzing the user's progress information and generating a new prompt message. The input is progress information, and the output is a prompt message containing the next recommended procedure.
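Steps 7 and 8 describe a loop in which a transcribed progress report determines the next instruction. The following is a minimal sketch, assuming the speech has already been converted to text. The `CleaningSession` class and the `DONE_PHRASES` keyword list are hypothetical, and simple keyword matching stands in for whatever analysis the server actually performs.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical phrases treated as a completion report. Plain substring
# matching is a simplification: "not done" would also match.
DONE_PHRASES = ("done", "finished", "completed")


@dataclass
class CleaningSession:
    steps: List[str]
    index: int = 0
    log: List[str] = field(default_factory=list)

    def current_instruction(self) -> str:
        if self.index >= len(self.steps):
            return "All cleaning steps are complete."
        return self.steps[self.index]

    def report(self, transcript: str) -> str:
        """Record a transcribed report and return the next instruction."""
        self.log.append(transcript)
        if any(phrase in transcript.lower() for phrase in DONE_PHRASES):
            self.index = min(self.index + 1, len(self.steps))
        return self.current_instruction()


session = CleaningSession(
    steps=["Apply dish soap with a sponge.", "Rinse with warm water."]
)
first = session.current_instruction()
second = session.report("I'm done with the soap.")
repeat = session.report("Still rinsing.")
final = session.report("Finished rinsing.")
```

A report without a completion phrase leaves the session on the same step, so the same instruction is returned again.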

[0089] (Application Example 1)

[0090] Next, we will explain Application Example 1. In the following explanation, the data processing device 12 will be referred to as the "server," and the smart device 14 will be referred to as the "terminal."

[0091] In many modern homes, routine cleaning is a significant burden. Manual cleaning, in particular, is time-consuming and labor-intensive, and correctly identifying the dirt and selecting the appropriate cleaning method can be difficult. Furthermore, instantly determining the optimal cleaning method for different types of dirt is challenging. There is also a need to monitor the progress and completion of the cleaning process in real time. To address these issues, there is a demand for more efficient and effective autonomous cleaning systems.

[0092] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 1 is realized by the following means.

[0093] In this invention, the server includes: means for recording images of a dirty area using a device equipped with imaging means; means for processing the image data and using a machine learning model to identify the type and degree of dirt; means for determining a method for suggesting the optimal cleaning means and tools based on the identified dirt; means comprising a voice processor that converts the suggestion into voice and provides guidance to the user; means, mounted on a cleaning device, equipped with a patrol function that travels to different locations in the home to detect dirt; and communication means that sends a notification to a smartphone based on the information detected by the patrol function. This enables autonomous and efficient cleaning of the home and allows for real-time cleaning guidance and status reports to the user.

[0094] "Imaging means" refers to a device or method for recording images or videos of an object.

[0095] A "device" is an electronic or mechanical device designed to perform a specific function.

[0096] "Processing data" means analyzing, transforming, or interpreting collected information to put it into a useful format.

[0097] A "machine learning model" is a system based on an algorithm that learns from data and recognizes patterns.

[0098] "Cleaning means" refers to a method or technique used to effectively remove a specific type of dirt.

[0099] "Tools" refer to the tools and equipment used to perform cleaning tasks.

[0100] A "speech processor" is a device or program for generating, converting, or outputting speech data.

[0101] "Patrol function" refers to the ability of cleaning equipment to perform specific tasks while moving along a set route or area.

[0102] "Communication means" refers to the channels and protocols used to send and receive information.

[0103] A "notification" is a message or alert used to convey specific information to a user.

[0104] To implement this invention, the following system is constructed. A server receives video data transmitted from a terminal equipped with a high-precision camera and analyzes it using a generative AI model. Through analysis on the server, the type and extent of dirt in the video are identified, and the optimal cleaning method and tools are determined. This information is converted into audio data by an audio processor and communicated to the user via a smart speaker or smartphone.

[0105] The device, acting as a household robot vacuum cleaner or similar device, patrols the room while capturing video with its camera. The collected video is transmitted to a server in real time for rapid analysis by an AI model. Cleaning instructions are sent to the user's smartphone as needed via notifications, and are also provided directly to the user as voice guidance.

[0106] For example, if the device detects a juice stain under the sofa while patrolling the living room, the server can analyze the information and provide voice guidance such as, "Please wipe the stain using juice-specific detergent and a cloth." In this way, the user can efficiently clean the area by following the suggested procedure.
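The notification sent to the smartphone for a detection like this could be carried as a small structured payload. The sketch below is an assumption for illustration: the `Detection` fields, the JSON layout, and `build_notification` are hypothetical and are not specified in the text.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class Detection:
    """One finding from a patrol (hypothetical structure)."""
    location: str
    dirt_type: str
    guidance: str


def build_notification(detection: Detection) -> str:
    """Serialize a detection into a JSON notification body."""
    payload = {
        "title": f"Dirt detected: {detection.dirt_type}",
        "body": f"{detection.location}: {detection.guidance}",
        "detail": asdict(detection),
    }
    return json.dumps(payload, ensure_ascii=False)


detection = Detection(
    location="under the living room sofa",
    dirt_type="juice stain",
    guidance="Please wipe the stain using juice-specific detergent and a cloth.",
)
message = json.loads(build_notification(detection))
```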

[0107] A concrete example of a prompt message is, "Analyze the dirt in the captured video and suggest the best cleaning method." This prompt directs the AI model to identify the dirt and suggest an appropriate cleaning method.

[0108] The flow of a specific process in Application Example 1 will be explained using Figure 12.

[0109] Step 1:

[0110] The device patrols the room and acquires video in real time using its camera. The input is video of the actual indoor environment, and the output is that video data. The camera inside the device captures this and processes it immediately as digital data.

[0111] Step 2:

[0112] The terminal transmits the acquired video data to the server. The input is the video data processed within the terminal, and the output is the video data received by the server. The data is transferred to the server quickly and securely through the terminal's communication module.

[0113] Step 3:

[0114] The server analyzes the received video data using a generative AI model. The input is the video data received by the server, and the output is the analysis results regarding the type and degree of dirt. The AI model within the server scans the video and identifies specific dirt features based on its internal algorithm.

[0115] Step 4:

[0116] The server determines cleaning methods and tools based on the analysis results. The input is data on the type and degree of dirt, and the output is information on recommended cleaning methods and tools. The decision algorithm within the server calculates the most effective cleaning method and selects the appropriate tools.

[0117] Step 5:

[0118] The server converts the determined cleaning method into audio data and sends it as a notification to the terminal or smartphone. The input is information about the cleaning method and tools, and the output is a cleaning guide in audio format. Speech synthesis technology converts the information into spoken language, providing direct guidance to the user.

[0119] Step 6:

[0120] The user performs the necessary cleaning tasks according to the provided audio guide. The input is the received audio guide, and the output is the completed cleaning task. The user performs specific cleaning tasks using the tools specified by the audio guide.
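The patrol flow in Steps 1 to 5 can be sketched as a loop over locations that produces guidance only where dirt is found. This is a hedged illustration: `patrol`, `fake_capture`, `fake_analyze`, and the `METHODS` table are hypothetical stand-ins for the camera, the generative AI model, and the server's decision algorithm.

```python
from typing import Callable, Dict, Iterable, List, Optional


def patrol(route: Iterable[str],
           capture: Callable[[str], bytes],
           analyze: Callable[[bytes], Optional[Dict[str, str]]],
           methods: Dict[str, str]) -> List[str]:
    """Visit each location, analyze its video, and collect guidance text."""
    guides = []
    for location in route:
        finding = analyze(capture(location))
        if finding is None:  # nothing detected at this location
            continue
        method = methods.get(finding["type"], "No stored method for this dirt type.")
        guides.append(f"{location}: {finding['degree']} {finding['type']}. {method}")
    return guides


def fake_capture(location: str) -> bytes:
    """Stub camera: encodes the location name instead of real video."""
    return location.encode("utf-8")


def fake_analyze(video: bytes) -> Optional[Dict[str, str]]:
    """Stub model: reports a stain only in the living room."""
    if video == b"living room":
        return {"type": "juice stain", "degree": "light"}
    return None


METHODS = {"juice stain": "Wipe the stain with detergent and a cloth."}
guides = patrol(["kitchen", "living room"], fake_capture, fake_analyze, METHODS)
```

Each entry in `guides` is text that could then be converted to audio and sent as a notification.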

[0121] Furthermore, an emotion engine that estimates the user's emotions may be incorporated. That is, the specific processing unit 290 may use the emotion identification model 59 to estimate the user's emotions and perform the specific processing using the user's emotions.

[0122] This invention provides a system that incorporates an emotion engine that recognizes the user's emotions, in addition to a device equipped with image acquisition means, in order to perform cleaning work efficiently and in a user-friendly manner. Here, the emotion engine analyzes the user's emotions from voice and facial expression data and dynamically adjusts the content and presentation method of the cleaning guide.

[0123] The device first captures the dirty area with its camera and sends the video data to a server. The server uses a generative artificial intelligence model to analyze the image, identify the type and extent of the dirt, and suggest appropriate cleaning methods and tools. In addition, an emotion engine analyzes the user's voice and facial expressions in real time to determine the user's stress level and emotional state. This information is used to adjust the cleaning guide.

[0124] For example, when the device captures the living room and sends the video data to the server, the server determines that there is a lot of pet hair on the floor and suggests "using a vacuum cleaner with high suction power." At the same time, if the emotion engine built into the device detects that the user is feeling stressed, it adjusts the tone of the cleaning guide to be gentler and explains the steps in more detail to provide a sense of security.

[0125] Furthermore, the system includes an interface that accepts voice input from the user, forming a feedback loop by allowing them to input cleaning progress and their impressions. This loop leverages past emotional data to personalize future suggestions and guidance. Through these features, the present invention can not only streamline cleaning operations but also improve user psychological satisfaction.

[0126] The following describes the processing flow.

[0127] Step 1:

[0128] The device activates its camera and captures the designated cleaning area. This video data is temporarily stored and prepared to be sent to the server.

[0129] Step 2:

[0130] The terminal sends video data to the server. Data compression and conversion are performed before transmission, enabling efficient communication over the network.

[0131] Step 3:

[0132] The server runs a generative artificial intelligence model to analyze the received video data and identify the type and extent of dirt. Based on these results, it determines the optimal cleaning method and tools.

[0133] Step 4:

[0134] The server sends the determined cleaning method and tools to the terminal. Furthermore, it prepares to receive voice and facial expression data from the terminal in order to analyze the user's emotional data.

[0135] Step 5:

[0136] The device uses an emotion recognition engine to analyze the user's voice and facial expressions in real time. Based on the analysis results, it determines the user's emotional state and stress level.

[0137] Step 6:

[0138] The device's emotion engine adjusts the content and tone of the cleaning guide according to the user's emotional state. For example, if the user is feeling stressed, it will provide a gentler tone and more detailed explanations.

[0139] Step 7:

[0140] The terminal uses a speech synthesizer to play the adjusted instructions to the user as audio. This allows the user to begin cleaning tasks based on the instructions.

[0141] Step 8:

[0142] Users provide feedback to the system via voice input, reporting progress and comments. This information is recorded on the device and sent to the server.

[0143] Step 9:

[0144] The server analyzes the feedback and stores it as data for future cleaning assistance. This makes it possible to provide more personalized assistance based on emotions.
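As a non-limiting illustration of Step 6 above, the adjustment of the cleaning guide according to the user's stress level could be sketched as follows. This is only a sketch under assumptions: the 0 to 1 stress scale, the threshold value, the convention of separating sub-actions with semicolons, and the names `Guide`, `STRESS_THRESHOLD`, `adjust_guide`, and `render` are hypothetical and do not appear in this disclosure.

```python
from dataclasses import dataclass


@dataclass
class Guide:
    steps: list[str]
    tone: str = "neutral"


# Hypothetical cut-off on an assumed 0.0 to 1.0 stress scale.
STRESS_THRESHOLD = 0.6


def adjust_guide(guide: Guide, stress_level: float) -> Guide:
    """Return a copy of the guide adapted to the user's stress level."""
    if stress_level < STRESS_THRESHOLD:
        return Guide(list(guide.steps), "neutral")
    # Stressed user: split compound steps into smaller ones and
    # switch to a gentler tone.
    detailed = [
        part.strip()
        for step in guide.steps
        for part in step.split(";")
        if part.strip()
    ]
    return Guide(detailed, "gentle")


def render(guide: Guide) -> str:
    """Build the text that would be passed to the speech synthesizer."""
    if guide.tone == "gentle":
        opener = "No rush, let's take this one step at a time."
    else:
        opener = "Cleaning guide:"
    lines = [opener]
    lines += [f"{i}. {s}" for i, s in enumerate(guide.steps, 1)]
    return "\n".join(lines)
```

For example, `render(adjust_guide(Guide(["Vacuum the floor; empty the dust cup"]), 0.9))` yields a gentle opener followed by two separately numbered steps.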

[0145] (Example 2)

[0146] Next, we will describe Example 2. In the following description, the data processing device 12 will be referred to as the "server" and the smart device 14 as the "terminal".

[0147] This invention aims not only to perform cleaning tasks efficiently but also to improve user psychological satisfaction. Conventional cleaning systems merely detect dirt and suggest cleaning methods, lacking the ability to adjust guidance that takes into account the user's emotions and stress levels. As a result, while the cleaning work itself may become more efficient, there is a problem in that the user experience does not improve.

[0148] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 2 is realized by the following means.

[0149] In this invention, the server includes means for using an artificial intelligence algorithm to receive and analyze environmental information and identify abnormal conditions, means for determining a method for presenting countermeasures based on the identified abnormal conditions, and means for evaluating the user's psychological state using an emotion processing device. This makes it possible to dynamically adjust and provide a personalized cleaning guide tailored to the user.

[0150] "Environmental information" refers to data that the device acquires through its imaging equipment, indicating the surrounding environment and conditions.

[0151] "Imaging equipment" refers to devices used to acquire still images and videos, such as cameras and sensors.

[0152] An "artificial intelligence algorithm" refers to a calculation method used to analyze acquired data and derive specific results or judgments.

[0153] An "abnormal condition" refers to a state or situation that deviates from the normal range, and includes dirt and disorder that require cleaning.

[0154] "Solution method" refers to a method that provides the optimal solution or countermeasure for an identified abnormal condition.

[0155] "Equipment used" refers to the tools and equipment necessary to implement the countermeasures.

[0156] An "emotion processing device" is a device or program that analyzes a user's voice and facial expression data to determine their psychological state.

[0157] "Information transmission medium" refers to a means of conveying messages, such as audio and video, to users.

[0158] "Synthesized speech output" refers to a technology that converts text information into speech and provides it to the user as an audio guide.

[0159] The "personalized processing function" is a function that adjusts the content according to the user's status and preferences, providing optimized information.

[0160] Embodiments of the present invention will be described in detail below.

[0161] The system related to this invention is designed to make cleaning work efficient and user-friendly. Specifically, it employs a system in which a terminal acquires physical environmental information, and a server is responsible for processing the data based on that information. The entire system is composed of an image acquisition means, an emotion engine, and a generative AI model as its core components.

[0162] The image acquisition device used is a high-resolution camera mounted on the terminal. This camera captures an overall image of the room and objects, including any dirt, and transmits the details as image data to the server. Wireless communication technology is used for data transmission, ensuring secure data transfer.

[0163] The server analyzes the received image data using a generative AI model. Specifically, the AI algorithm identifies the location, type, and extent of dirt from the video. Based on this analysis, the server suggests the optimal cleaning method and cleaning tools. For example, if the processed data determines that there is a lot of pet hair on the floor, it will suggest a specific solution such as "use a vacuum cleaner with high suction power."

[0164] The device is equipped with an emotion engine to capture the user's voice and facial expressions. This emotion engine aims to understand the user's emotional state and stress level, thereby adjusting the system's guidance content and presentation methods in real time. For example, if the user is feeling stressed, the guidance tone will be set to be gentler, and step-by-step explanations will be more detailed.

[0165] For example, imagine a user attempting to clean their living room. The device captures an image of the entire room, and the server detects the presence of pet hair. The server then recommends the use of a vacuum cleaner with high suction power. Simultaneously, the device's emotion engine detects the user's stress level and adjusts the tone of the guidance accordingly.

[0166] Examples of prompts that can be input to the generative AI model include "Analyze the cleanliness of the room and suggest cleaning methods" and "Recognize the user's emotions and adjust the guidance accordingly."

[0167] A key feature of this system is that the user's emotional state is incorporated into the feedback loop, allowing subsequent guidance to be more personalized. Thus, this invention not only improves cleaning efficiency but also enhances psychological satisfaction.
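As a non-limiting illustration of how past emotional data might personalize subsequent guidance, the sketch below keeps a short window of recent stress scores and derives a preferred level of detail from their average. The window size, the 0.5 threshold, the 0 to 1 stress scale, and the name `EmotionHistory` are assumptions made for illustration only.

```python
from collections import deque
from statistics import mean


class EmotionHistory:
    """Keeps the most recent stress scores reported for one user."""

    def __init__(self, window: int = 5) -> None:
        # Older scores are discarded automatically once the window is full.
        self._scores: deque[float] = deque(maxlen=window)

    def record(self, stress: float) -> None:
        self._scores.append(stress)

    def preferred_detail(self) -> str:
        """Suggest a level of detail for the next cleaning guide."""
        if not self._scores:
            return "standard"
        return "detailed" if mean(self._scores) >= 0.5 else "standard"
```

A user whose recent sessions were stressful would thus receive more detailed guidance by default, and the setting relaxes again once calmer sessions fill the window.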

[0168] The flow of the specific processing in Example 2 will be explained using Figure 13.

[0169] Step 1:

[0170] The terminal captures video footage of the room via a high-resolution camera. The captured video data is transmitted to the server via wireless communication. The input is video data of the room, and the output is a digital image file sent to the server. In this step, the terminal acts as a video capture device, acquiring detailed data on the area to be cleaned.

[0171] Step 2:

[0172] The server analyzes the received video data using a generative AI model. The input is video data received from the terminal, and the output is the analysis results regarding the location, type, and extent of the dirt. In this step, the server processes the data and applies AI algorithms to identify specific types of dirt. For example, it can identify pet hair on the floor or dirt on windows.

[0173] Step 3:

[0174] The server determines the appropriate cleaning method and cleaning tools to use based on the analysis results. The input is the dirt analysis results, and the output is suggested information for the user. The server performs data calculations and generates specific advice, such as "You should use a vacuum cleaner with high suction power."

[0175] Step 4:

[0176] The device inputs the user's voice and facial expressions into an emotion engine in real time to analyze their emotional state. The input is the user's voice and facial expression data, and the output is the analysis results regarding their psychological state. The device determines the user's stress level and uses this information to adjust the guidance for the next step.

[0177] Step 5:

[0178] The server dynamically adjusts the content and tone of the cleaning guide provided to the user based on the results of an analysis of their emotional state. The input is the results of the emotional state analysis and cleaning suggestions, and the output is the adjusted guide information. In this way, the server customizes the guide to match the user's psychological state.

[0179] Step 6:

[0180] Users input cleaning progress and feedback through the terminal's voice interface. Input is voice data, and output is feedback information within the system. User feedback is used to improve future suggestions and guides.

[0181] These steps enable the system to support efficient cleaning while also improving user satisfaction by taking the user's feelings into account.
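As a non-limiting illustration of how the inputs and outputs of Steps 2 to 5 above chain together, the data flow could be sketched as follows. The types `DirtAnalysis` and `EmotionResult`, the function `run_example2`, and the idea of passing each step in as a callable are assumptions made for illustration; the actual analysis would be performed by the generative AI model and the emotion engine.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class DirtAnalysis:
    location: str
    dirt_type: str
    extent: str


@dataclass
class EmotionResult:
    stress_level: float


def run_example2(
    video: bytes,
    voice_and_face: bytes,
    analyze_dirt: Callable[[bytes], DirtAnalysis],      # Step 2
    suggest: Callable[[DirtAnalysis], str],             # Step 3
    analyze_emotion: Callable[[bytes], EmotionResult],  # Step 4
    adjust: Callable[[str, EmotionResult], str],        # Step 5
) -> str:
    """Chain the steps; the output of each step feeds the next one."""
    analysis = analyze_dirt(video)
    suggestion = suggest(analysis)
    emotion = analyze_emotion(voice_and_face)
    return adjust(suggestion, emotion)
```

Passing the steps in as callables keeps the sketch independent of any particular model and allows each step to be replaced by a stub during testing.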

[0182] (Application Example 2)

[0183] Next, we will explain application example 2. In the following explanation, the data processing device 12 will be referred to as a "server" and the smart device 14 as a "terminal".

[0184] Conventional automatic cleaning systems have difficulty detecting dirt and suggesting appropriate cleaning methods, and furthermore, they cannot take into account the user's feelings, resulting in a lack of improvement in user satisfaction.

[0185] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 2 is realized by the following means.

[0186] In this invention, the server includes means for using an artificial intelligence model that analyzes image data and identifies specific situations, means incorporating an emotion analysis engine that analyzes the user's voice and facial expressions and evaluates their mental state, and means for a synthesizer that adjusts the guide based on the evaluation and provides it in an individualized format. This makes it possible to suggest the optimal cleaning method according to the situation and to improve the user experience by taking into consideration the user's mental state.

[0187] "Image acquisition means" refers to a device or function that captures image data of a physical surface.

[0188] An "artificial intelligence model" is an algorithm or data-driven computer program used to analyze image data and identify specific situations or depths.

[0189] An "emotion analysis engine" is a function or software that analyzes the user's voice and facial expressions to evaluate their mental state.

[0190] A "speech synthesizer" is a device or function that converts text data into speech to provide guidance and information to the user.

[0191] The "profile adjustment function" is an adjustment function that personalizes suggested methods and content based on the user's emotional data and provides them in a suitable format.

[0192] A "user interface" is a means of communication within a system that receives voice input from the user and accepts information about the progress of a task.

[0193] In this embodiment, the system mainly consists of a server, a terminal, and a user.

[0194] The server receives image data of the physical surface transmitted through the image acquisition means. It analyzes the received data and uses a generative AI model to identify specific situations and depths. This makes it possible to suggest the optimal work method and equipment for each situation. The server also uses an emotion analysis engine based on the user's voice and facial expression data to evaluate the user's mental state. Based on this evaluation, the guide is adjusted through a speech synthesizer.

[0195] The device receives voice input via the user interface and receives progress information on the task. It collects user feedback and uses a profile adjustment function to personalize suggestions based on the user's sentiment data.

[0196] As a concrete example, when a user cleans their living room, the device sends image data of the floor to the server. In this case, the server uses a generative AI model to analyze the distribution of pet hair and suggests a "high-suction vacuum cleaner" as the appropriate cleaning method. The server also uses an emotion analysis engine to sense the user's stress level and uses a speech synthesizer to provide a guide message in a warm tone, such as "Let's make today a fun day."

[0197] An example of a prompt for the generative AI model is: "Based on 'image data: living room floor' and 'user voice: Which vacuum cleaner is best?', suggest an appropriate cleaning method."
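As a non-limiting illustration, a prompt of this kind could be assembled from a description of the image data and the transcribed user utterance as sketched below. The function name `build_prompt` and the exact wording and layout of the prompt are hypothetical.

```python
def build_prompt(image_label: str, user_utterance: str) -> str:
    """Combine the image description and the user's question into one prompt."""
    return (
        "Based on the following inputs, suggest an appropriate cleaning method.\n"
        f"image data: {image_label}\n"
        f'user voice: "{user_utterance}"'
    )


prompt = build_prompt("living room floor", "Which vacuum cleaner is best?")
```

The resulting string would be supplied to the generative AI model together with the image data itself.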

[0198] The flow of a specific process in Application Example 2 will be explained using Figure 14.

[0199] Step 1:

[0200] The device captures image data of a physical surface using a camera. The input is raw image data acquired by the camera sensor. This is converted into digital data and sent to the server.

[0201] Step 2:

[0202] The server analyzes the received image data and uses a generative AI model to identify specific stains or conditions. The input is image data sent from the terminal. The generative AI model identifies the type and depth of the stain from this data and extracts the information necessary to suggest cleaning methods. The output is the type of stain as identified and a suggested cleaning method.

[0203] Step 3:

[0204] The server inputs the user's voice and facial expression data into an emotion analysis engine to evaluate the user's mental state. The input consists of the user's voice and facial expression data obtained from the terminal. The emotion analysis engine analyzes this data and outputs numerical values for emotional states such as stress levels. The output is numerical or categorical data related to the user's mental state.

[0205] Step 4:

[0206] The server uses a speech synthesizer to create personalized guides based on suggested cleaning methods and user mental state data, and outputs them as audio data. The inputs are the dirt analysis results and emotion analysis results from a generative AI model. The speech synthesizer generates an audio guide with a tone and content that takes these inputs into account, and sends it to the terminal. The output is the audio guide data.

[0207] Step 5:

[0208] The terminal plays audio guide data transmitted from the server to the user and receives feedback from the user through the user interface. The input is audio data from the server, and the output is text or audio data representing the user's responses and feedback.

[0209] Step 6:

[0210] Users provide feedback via voice or text on their device. This feedback is sent back to the server, where the profile adjustment function updates the behavioral history to help personalize future suggestions. The input is user feedback data, and the output is the updated user profile data.
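As a non-limiting illustration of the numerical and categorical outputs mentioned in Step 3 above, separate stress scores estimated from voice and from facial expression could be fused into one value and then mapped to a category. The weighted average, the 0 to 1 scale, the thresholds, and the names `fuse_stress` and `categorize` are assumptions for illustration; the disclosure does not specify how the emotion analysis engine computes its output.

```python
def fuse_stress(
    voice_score: float, face_score: float, voice_weight: float = 0.5
) -> float:
    """Weighted average of two stress scores, clamped to [0.0, 1.0]."""
    if not 0.0 <= voice_weight <= 1.0:
        raise ValueError("voice_weight must be between 0 and 1")
    score = voice_weight * voice_score + (1.0 - voice_weight) * face_score
    return min(1.0, max(0.0, score))


def categorize(score: float) -> str:
    """Map a numeric stress score to a category (hypothetical thresholds)."""
    if score >= 0.7:
        return "high"
    if score >= 0.4:
        return "moderate"
    return "low"
```

The categorical value could then be passed to the speech synthesizer in Step 4 to select the tone of the audio guide.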

[0211] The specific processing unit 290 transmits the result of the specific processing to the smart device 14. In the smart device 14, the control unit 46A causes the output device 40 to output the result of the specific processing. The microphone 38B acquires audio indicating user input for the result of the specific processing. The control unit 46A transmits the audio data indicating user input acquired by the microphone 38B to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the audio data.

[0212] Data generation model 58 is a type of so-called generative AI (Artificial Intelligence). Examples of data generation model 58 include generative AI such as ChatGPT (registered trademark) (Internet search <URL: https://openai.com/blog/chatgpt>) and Gemini (registered trademark) (Internet search <URL: https://gemini.google.com/?hl=ja>). The data generation model 58 is obtained by performing deep learning on a neural network. The data generation model 58 receives prompts containing instructions, together with inference data such as audio data representing speech, text data representing text, and image data representing images. The data generation model 58 performs inference on the input inference data according to the instructions indicated by the prompts, and outputs the inference results in data formats such as audio data and text data. Here, inference refers to, for example, analysis, classification, prediction, and/or summarization.
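As a non-limiting illustration of the input and output of the data generation model 58, the interface could be sketched as follows. The names `InferenceInput`, `DataGenerationModel`, and `EchoModel` are hypothetical; `EchoModel` is a stand-in that only reports which kinds of inference data it received and does not represent any real generative AI service or its API.

```python
from dataclasses import dataclass
from typing import Optional, Protocol


@dataclass
class InferenceInput:
    prompt: str                    # instruction for the model
    text: Optional[str] = None     # text data
    audio: Optional[bytes] = None  # audio data representing speech
    image: Optional[bytes] = None  # image data


class DataGenerationModel(Protocol):
    def infer(self, request: InferenceInput) -> str:
        """Return the inference result as text."""
        ...


class EchoModel:
    """Stand-in model used only to exercise the interface."""

    def infer(self, request: InferenceInput) -> str:
        present = [
            name
            for name in ("text", "audio", "image")
            if getattr(request, name) is not None
        ]
        return f"{request.prompt} [{', '.join(present) or 'no data'}]"
```

Any object with a matching `infer` method satisfies the protocol, so a real model client could be substituted for `EchoModel` without changing the calling code.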

[0213] In the above embodiment, an example was given in which specific processing is performed by the data processing device 12, but the technology of this disclosure is not limited thereto, and the specific processing may also be performed by the smart device 14.

[0214] [Second Embodiment]

[0215] Figure 3 shows an example of the configuration of the data processing system 210 according to the second embodiment.

[0216] As shown in Figure 3, the data processing system 210 includes a data processing device 12 and smart glasses 214. An example of the data processing device 12 is a server.

[0217] The data processing device 12 comprises a computer 22, a database 24, and a communication interface 26. The computer 22 is an example of a "computer" related to the technology of this disclosure. The computer 22 comprises a processor 28, RAM 30, and storage 32. The processor 28, RAM 30, and storage 32 are connected to a bus 34. The database 24 and the communication interface 26 are also connected to the bus 34. The communication interface 26 is connected to a network 54. An example of the network 54 is a WAN (Wide Area Network) and / or a LAN (Local Area Network).

[0218] The smart glasses 214 include a computer 36, a microphone 238, a speaker 240, a camera 42, and a communication interface 44. The computer 36 includes a processor 46, RAM 48, and storage 50. The processor 46, RAM 48, and storage 50 are connected to a bus 52. The microphone 238, speaker 240, and camera 42 are also connected to the bus 52.

[0219] The microphone 238 receives voice signals from the user 20 and receives instructions from the user 20. The microphone 238 captures the voice signals from the user 20, converts the captured voice into audio data, and outputs it to the processor 46. The speaker 240 outputs audio according to the instructions from the processor 46.

[0220] Camera 42 is a small digital camera equipped with an optical system including a lens, aperture, and shutter, and an image sensor such as a CMOS (Complementary Metal-Oxide-Semiconductor) image sensor or a CCD (Charge Coupled Device) image sensor, and captures images of the area around the user 20 (for example, an imaging range defined by a field of view equivalent to the width of a typical healthy person's field of vision).

[0221] Communication interface 44 is connected to network 54. Communication interfaces 44 and 26 are responsible for the exchange of various information between processor 46 and processor 28 via network 54. The exchange of various information between processor 46 and processor 28 using communication interfaces 44 and 26 is performed in a secure manner.

[0222] Figure 4 shows an example of the main functions of the data processing device 12 and the smart glasses 214. As shown in Figure 4, the data processing device 12 performs specific processing using the processor 28. The storage 32 stores the specific processing program 56.

[0223] The specific processing program 56 is an example of a "program" relating to the technology of this disclosure. The processor 28 reads the specific processing program 56 from the storage 32 and executes the read specific processing program 56 on the RAM 30. The specific processing is realized by the processor 28 operating as a specific processing unit 290 in accordance with the specific processing program 56 executed on the RAM 30.

[0224] The storage 32 stores the data generation model 58 and the emotion identification model 59. The data generation model 58 and the emotion identification model 59 are used by the specific processing unit 290.

[0225] In the smart glasses 214, the processor 46 performs the reception output processing. The storage 50 stores the reception output program 60. The processor 46 reads the reception output program 60 from the storage 50 and executes the read reception output program 60 on the RAM 48. The reception output processing is realized by the processor 46 operating as a control unit 46A according to the reception output program 60 executed on the RAM 48.

[0226] Next, the specific processing performed by the specific processing unit 290 of the data processing device 12 will be described. In the following description, the data processing device 12 will be referred to as the "server" and the smart glasses 214 will be referred to as the "terminal".

[0227] This invention provides an autonomous system that assists users in performing cleaning tasks efficiently and effectively in the cleaning of a house. The system is configured as follows:

[0228] First, the device is equipped with a camera that can capture video of a dirty room in real time. This image acquisition means obtains video data to determine the areas that need cleaning. Next, the device sends the captured video data to a server. The server then uses a generative artificial intelligence model to analyze the video data. This analysis identifies the type and extent of the dirt and determines the appropriate cleaning method and cleaning tools to use. This information is then presented to the user as recommended cleaning methods and tools.

[0229] The server transmits the determined cleaning method to the terminal, which then uses a speech synthesizer to play the received information as audio, providing guidance to the user. This allows the user to efficiently perform the cleaning according to the suggested procedure.

[0230] As a concrete example, when the device captures video of the kitchen, the server analyzes the information to identify grease stains and, based on that information, recommends "using dish soap and a sponge, and rinsing with warm water." The device communicates this verbally, and the user can follow the instructions and begin cleaning.

[0231] Furthermore, this system includes an interface that accepts voice input from the user, allowing for reports on cleaning progress and completion status. This makes it easy for users to express their opinions regarding cleaning and receive instructions at the appropriate time. This interactive function enables more efficient cleaning work and simplifies operation.

[0232] The following describes the processing flow.

[0233] Step 1:

[0234] The device activates its camera and captures video of the designated room or area in real time. The acquired video data is stored in temporary storage.

[0235] Step 2:

[0236] The terminal compresses and converts the format of the video data stored in temporary storage in order to send it to the server, preparing it for transmission over the network.

[0237] Step 3:

[0238] The server runs a generative artificial intelligence model to analyze the received video data. This model identifies the type of dirt and determines its degree and extent.

[0239] Step 4:

[0240] Based on the dirt analysis results, the server selects the optimal cleaning method and tools from its internal database. It then organizes this selection information and prepares it for transmission to the terminal.

[0241] Step 5:

[0242] The terminal uses a speech synthesizer to convert the text of the suggestions received from the server into audio data.

[0243] Step 6:

[0244] The device's voice assistant plays the converted audio data, guiding the user through specific cleaning procedures and the tools they should use.

[0245] Step 7:

[0246] The user follows the instructions of the voice assistant and actually performs the cleaning using the appropriate cleaning method. They report their progress and completion to the assistant via voice input.

[0247] Step 8:

[0248] The terminal recognizes voice input from the user, sends the content to the server to record the completion of the cleaning task, and prepares to provide the next instructions.
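As a non-limiting illustration of the compression in Step 2 above and the corresponding decompression on the server, a lossless round trip could be sketched with Python's standard `zlib` module. This is an assumption for illustration only: the disclosure does not name a compression method, an actual system would more likely use a video codec, and the function names `pack_for_upload` and `unpack_on_server` are hypothetical.

```python
import zlib


def pack_for_upload(raw_frames: bytes, level: int = 6) -> bytes:
    """Terminal side: compress the buffered video bytes before sending."""
    return zlib.compress(raw_frames, level)


def unpack_on_server(payload: bytes) -> bytes:
    """Server side: restore the original bytes before analysis."""
    return zlib.decompress(payload)


# Highly repetitive data (here, 1000 zero bytes) compresses well.
raw = bytes(1000)
packed = pack_for_upload(raw)
restored = unpack_on_server(packed)
```

Because the compression is lossless, `restored` is identical to `raw`, while `packed` is smaller than the original buffer.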

[0249] (Example 1)

[0250] Next, we will describe Example 1. In the following description, the data processing device 12 will be referred to as the "server," and the smart glasses 214 will be referred to as the "terminal."

[0251] Cleaning at home can be inefficient because it's difficult to determine which areas to clean and how. Furthermore, selecting the appropriate cleaning method and tools based on the type of dirt requires experience and knowledge, which can be a burden on the user.

[0252] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 1 is realized by the following means.

[0253] In this invention, the server includes means for acquiring video footage of a contaminated area using a device equipped with an image capture device, means for analyzing the video information using a generative artificial intelligence model to identify the type and degree of contamination, and means for receiving progress reports from the user via a voice input interface. This enables the user to efficiently identify areas to be cleaned and clean them in the most optimal way.

[0254] An "image capture device" is a part of the equipment used to acquire images of a contaminated area and has the function of capturing visual information in real time.

[0255] A "generative artificial intelligence model" is an AI technology used to analyze video information and identify the type and degree of dirt, and includes algorithms that apply deep learning and machine learning.

[0256] A "voice converter" is a device or function that converts proposed cleaning methods into voice and provides guidance to the user.

[0257] A "voice input interface" is a means for users to communicate progress and completion reports to the system by voice, and includes voice acquisition devices such as microphones.

[0258] "Cleaning means" refers to methods or techniques for performing appropriate cleaning actions on identified dirt.

[0259] This invention is a system designed to assist with household cleaning tasks. The system's implementation is achieved through three main components: a terminal, a server, and a user.

[0260] First, the device uses a camera-equipped image capture device to acquire video footage of a specific area within the home. This footage is captured in real time and targets areas suspected of contamination. The device combines a high-resolution camera with data compression technology to quickly process the video and transfer it to a server.

[0261] Next, the server analyzes the video data transmitted from the terminal using a generative artificial intelligence model. The AI model used here is based on deep learning and has the ability to identify various types of dirt (e.g., oil stains, dust, water stains, etc.) that appear in the video and to evaluate their degree. After identification, the server determines the optimal cleaning method and the equipment required for cleaning, and then formulates the cleaning procedure.

[0262] The cleaning methods and procedures determined by the server are sent back to the terminal. The terminal uses a voice converter to convert this information into speech and provides the user with voice guidance on how to proceed with the cleaning. For example, for grease stains in the kitchen, instructions such as "It is recommended to use dish soap and a sponge and rinse with warm water" are provided.

[0263] Furthermore, users can report the progress and completion status of their cleaning to the server through a voice input interface. This interaction allows the server to provide the next necessary instructions at the appropriate time. In this way, users can carry out their cleaning tasks more efficiently, achieving improved overall system efficiency.

[0264] An example of a prompt is a question such as "Please suggest the best cleaning method for the grease stains on the kitchen floor," which is submitted to the generative AI model. This allows the user to receive more specific and appropriate cleaning instructions.

[0265] The flow of the specific processing in Example 1 will be explained using Figure 11.

[0266] Step 1:

[0267] The device uses a camera to capture video of the entire room. In this step, video data is generated in real time. All visual information within the camera's field of view is used as input, and high-resolution video data is generated as output. This video data provides wide-ranging visual information, including areas that need cleaning.

[0268] Step 2:

[0269] The terminal compresses the acquired video data and sends it to the server via the internet. This step improves the transmission speed by reducing the size of the video data. The input is the raw video data, and the output is the compressed video data.

[0270] Step 3:

[0271] The server decompresses the received compressed video data and inputs it into a generative artificial intelligence model. In this step, the server analyzes the video data to identify the type and location of dirt in the room. Based on the input data, the AI model analyzes the data using a deep learning algorithm and outputs the type and extent of the dirt, as well as the appropriate cleaning method.

[0272] Step 4:

[0273] Based on the analysis results, the server determines specific cleaning methods and tools and generates this information in the form of prompt statements. In this step, the server scrutinizes the information based on the identified dirt and constructs a user-friendly suggestion statement. The input is output information from the AI model, and the output is a prompt statement explaining the recommended cleaning method.

[0274] Step 5:

[0275] The server sends the generated prompt message to the terminal. The input is a prompt message describing the recommended cleaning method, and the output is the data transmitted to the terminal.

[0276] Step 6:

[0277] The device converts the received recommendation information into speech and provides guidance to the user. In this step, speech synthesis technology is used to convert text data into speech data. The input is a prompt, and the output is a voice guide for the user.

[0278] Step 7:

[0279] The user reports the progress and results of cleaning using a voice input interface. In this step, the voice input from the user is converted into text data and sent to the server. The input is the user's voice, and the output is the basis for determining the next instruction.

[0280] Step 8:

[0281] Based on the progress report from the user, the server updates the next cleaning procedure, generates new instructions if necessary, and sends them to the terminal. In this step, a process of analyzing the user's progress information and generating a new prompt sentence is performed. The input is the progress information, and the output is a prompt sentence including the following recommended procedures.
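The data flow of Steps 2, 3, 4, and 8 above can be sketched as follows. This is a simplified illustration under stated assumptions: `zlib` stands in for whatever video compression the terminal actually uses, `analyze_video` is a placeholder that returns a fixed result instead of calling a generative AI model, and all function and class names are hypothetical.

```python
import zlib
from dataclasses import dataclass


@dataclass
class DirtAnalysis:
    dirt_type: str
    location: str
    method: str


def compress_video(raw: bytes) -> bytes:
    # Step 2: reduce the payload size before transmission (zlib is an assumption).
    return zlib.compress(raw)


def decompress_video(payload: bytes) -> bytes:
    # Step 3: restore the original bytes on the server side.
    return zlib.decompress(payload)


def analyze_video(video: bytes) -> DirtAnalysis:
    # Placeholder for the generative AI model; returns a fixed, assumed result.
    return DirtAnalysis("grease stains", "kitchen floor", "dish soap and a sponge")


def build_guidance(analysis: DirtAnalysis) -> str:
    # Step 4: turn the analysis result into a user-facing suggestion.
    return (
        f"For the {analysis.dirt_type} on the {analysis.location}, "
        f"use {analysis.method}."
    )


def next_instruction(progress_report: str) -> str:
    # Step 8: naive keyword check on the transcribed progress report.
    report = progress_report.lower()
    if "finished" in report or "done" in report:
        return "Cleaning is complete."
    return "Please continue with the current step and report again."


raw_video = bytes(10_000)  # dummy frame data
payload = compress_video(raw_video)
analysis = analyze_video(decompress_video(payload))
print(build_guidance(analysis))
print(next_instruction("I have finished wiping the floor"))
```

The keyword check in `next_instruction` is deliberately crude; in the described system the progress report would itself be interpreted by the server before a new prompt is generated.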

[0282] (Application Example 1)

[0283] Next, Application Example 1 will be described. In the following description, the data processing device 12 is referred to as the "server", and the smart glasses 214 are referred to as the "terminal".

[0284] In many modern households, daily cleaning work has become a major burden. In particular, manual cleaning requires time and effort, and it may be difficult to identify dirt and select an appropriate cleaning method. Also, it is difficult to instantly determine the optimal cleaning method for different types of dirt. Furthermore, there is a demand to be able to confirm the progress and completion of the cleaning process in real time. There is a need to provide a more efficient and effective autonomous cleaning system to solve these problems.

[0285] The specific processing by the specific processing unit 290 of the data processing device 12 in Application Example 1 is realized by the following means.

[0286] In this invention, the server is a device equipped with imaging means, means for recording images of dirty areas, means for processing the image data and using a machine learning model to identify the type and degree of dirt, means for determining a method for presenting the optimal cleaning means and tools based on the identified dirt, means for converting the presentation into voice and providing guidance to the user with a voice processor, means installed on a cleaning device and having a patrol function to detect dirt by patrolling different locations within the home, and communication means for sending notifications to a smartphone based on the information detected by the patrol function. Thereby, cleaning within the home can be executed autonomously and efficiently, and real-time cleaning guidance and status reports can be provided to the user.

[0287] "Imaging means" refers to a device or method for recording images or pictures of an object.

[0288] "Equipment" refers to an electronic or mechanical device designed to perform a specific function.

[0289] "Processing data" means analyzing, converting, or interpreting the collected information into a useful format.

[0290] "Machine learning model" refers to a system based on an algorithm that learns from data and recognizes patterns.

[0291] "Cleaning means" refers to a method or technology used to effectively remove specific dirt.

[0292] "Tool" refers to a tool or equipment used for cleaning work.

[0293] "Voice processor" refers to a device or program for generating, converting, or outputting voice data.

[0294] "Patrol function" refers to the ability of a cleaning device to perform specific operations while moving along a set route or area.

[0295] "Communication means" refers to a channel or protocol for transmitting and receiving information.

[0296] "Notification" refers to a message or alert for transmitting specific information to a user.

[0297] To implement this invention, the following system is constructed. A server receives video data transmitted from a terminal equipped with a high-precision camera and analyzes it using a generative AI model. Through analysis on the server, the type and extent of dirt in the video are identified, and the optimal cleaning method and tools are determined. This information is converted into audio data by an audio processor and communicated to the user via a smart speaker or smartphone.

[0298] The device, acting as a household robot vacuum cleaner or similar device, patrols the room while capturing video with its camera. The collected video is transmitted to a server in real time for rapid analysis by an AI model. Cleaning instructions are sent to the user's smartphone as needed via notifications, and are also provided directly to the user as voice guidance.

[0299] For example, if the device detects a juice stain under the sofa while patrolling the living room, the server can analyze the information and provide voice guidance such as, "Please wipe the stain using juice-specific detergent and a cloth." In this way, the user can efficiently clean the area by following the suggested procedure.

[0300] A concrete example of a prompt message is, "Analyze the dirt in the captured video and suggest the best cleaning method." This prompt instructs the AI model to identify the dirt and suggest an appropriate cleaning method.

[0301] The flow of a specific process in Application Example 1 will be explained using Figure 12.

[0302] Step 1:

[0303] The device patrols the room and acquires video in real time using its camera. The input is video of the actual indoor environment, and the output is that video data. The camera inside the device captures this and processes it immediately as digital data.

[0304] Step 2:

[0305] The terminal sends the acquired video data to the server. The input is the video data processed within the terminal, and the output is the video data received within the server. The data is transferred to the server quickly and securely through the communication module of the terminal.

[0306] Step 3:

[0307] The server analyzes the received video data using a generative AI model. The input is the video data received by the server, and the output is the analysis result regarding the type and degree of dirt. The AI model within the server scans the video and identifies the characteristics of specific dirt based on internal algorithms.

[0308] Step 4:

[0309] The server determines the cleaning means and tools based on the analysis result. The input is the data regarding the type and degree of dirt, and the output is the information on the recommended cleaning method and tools. The decision-making algorithm within the server calculates the most effective cleaning method and selects appropriate tools.

[0310] Step 5:

[0311] The server converts the determined cleaning means into voice data and sends it as a notification to the terminal or smartphone. The input is the information on the cleaning method and tools, and the output is the cleaning guide in voice format. Voice synthesis technology converts the information into human language to provide direct guidance to the user.

[0312] Step 6:

[0313] The user performs the necessary cleaning operations according to the provided voice guide. The input is the received voice guide, and the output is the completed cleaning operations. Specific cleaning is carried out using the tools specified according to the voice.
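The patrol function and the notification step described above can be pictured with the following sketch. It assumes that the patrol route is a list of named locations and that dirt detection is available as a callable; `patrol`, `Notification`, and `fake_detector` are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Notification:
    location: str
    message: str


def patrol(
    route: List[str],
    detect: Callable[[str], Optional[str]],
) -> List[Notification]:
    """Visit each location in order and collect a notification per detected stain."""
    notifications = []
    for location in route:
        dirt = detect(location)  # returns a dirt type, or None if nothing is found
        if dirt is not None:
            notifications.append(
                Notification(location, f"{dirt} detected at location: {location}")
            )
    return notifications


def fake_detector(location: str) -> Optional[str]:
    # Stand-in for server-side video analysis; the result is hard-coded.
    return {"under the sofa": "juice stain"}.get(location)


for note in patrol(["kitchen", "under the sofa", "hallway"], fake_detector):
    print(note.message)
```

Passing the detector in as a callable keeps the patrol loop independent of how the analysis is actually performed.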

[0314] Furthermore, an emotion engine that estimates the user's emotions may be incorporated. That is, the specific processing unit 290 may use the emotion identification model 59 to estimate the user's emotions and perform the specific processing using the estimated emotions.

[0315] This invention provides a system that incorporates an emotion engine that recognizes the user's emotions, in addition to a device equipped with image acquisition means, in order to perform cleaning work efficiently and in a user-friendly manner. Here, the emotion engine analyzes the user's emotions from voice and facial expression data and dynamically adjusts the content and presentation method of the cleaning guide.

[0316] The device first captures the dirty area with its camera and sends the video data to a server. The server uses a generative artificial intelligence model to analyze the image, identify the type and extent of the dirt, and suggest appropriate cleaning methods and tools. In addition, an emotion engine analyzes the user's voice and facial expressions in real time to determine the user's stress level and emotional state. This information is used to adjust the cleaning guide.

[0317] For example, when the device captures the living room and sends the video data to the server, the server analyzes that there is a lot of pet hair on the floor and suggests "using a vacuum cleaner with high suction power." At the same time, if the emotion engine built into the device detects that the user is feeling stressed, it adjusts the tone of the cleaning guide to be gentler and explains the steps in more detail to provide a sense of security.

[0318] Furthermore, the system includes an interface that accepts voice input from the user, forming a feedback loop by allowing them to input cleaning progress and their impressions. This loop leverages past emotional data to personalize future suggestions and guidance. Through these features, the present invention can not only streamline cleaning operations but also improve user psychological satisfaction.

[0319] The following describes the processing flow.

[0320] Step 1:

[0321] The device activates its camera and captures the designated cleaning area. This video data is temporarily stored and prepared to be sent to the server.

[0322] Step 2:

[0323] The terminal sends video data to the server. Data compression and conversion are performed before transmission, enabling efficient communication over the network.

[0324] Step 3:

[0325] The server runs a generative artificial intelligence model to analyze the received video data and identify the type and extent of dirt. Based on these results, it determines the optimal cleaning method and tools.

[0326] Step 4:

[0327] The server sends the determined cleaning method and tools to the terminal. Furthermore, it prepares to receive voice and facial expression data from the terminal in order to analyze the user's emotional data.

[0328] Step 5:

[0329] The device uses an emotion recognition engine to analyze the user's voice and facial expressions in real time. Based on the analysis results, it determines the user's emotional state and stress level.

[0330] Step 6:

[0331] The device's emotion engine adjusts the content and tone of the cleaning guide according to the user's emotional state. For example, if the user is feeling stressed, it will provide a gentler tone and more detailed explanations.

[0332] Step 7:

[0333] The terminal uses a speech synthesizer to play the adjusted instructions to the user as voice output. This allows the user to begin cleaning based on the instructions.

[0334] Step 8:

[0335] Users provide feedback to the system via voice input, reporting progress and comments. This information is recorded on the device and sent to the server.

[0336] Step 9:

[0337] The server analyzes the feedback and stores it as data for future cleaning assistance. This makes it possible to provide more personalized assistance based on emotions.
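The adjustment of the cleaning guide in Step 6 above can be sketched as follows. This is an assumed, minimal rendering of "a gentler tone and more detailed explanations"; the specification does not define the wording, and `adjust_guide` is a hypothetical name.

```python
from typing import List


def adjust_guide(steps: List[str], stressed: bool) -> str:
    """Render cleaning steps, using a gentler and more detailed form when stressed."""
    if not stressed:
        # Default form: the steps are stated briefly.
        return " ".join(f"{step}." for step in steps)

    # Stressed form: reassuring opening, numbered steps, reassuring close.
    lines = ["There is no need to hurry; let's take this one step at a time."]
    for number, step in enumerate(steps, start=1):
        lines.append(f"Step {number}: {step}.")
    lines.append("You are doing fine.")
    return " ".join(lines)


steps = ["Use a vacuum cleaner with high suction power", "Empty the dust container"]
print(adjust_guide(steps, stressed=False))
print(adjust_guide(steps, stressed=True))
```

The resulting text would then be passed to the speech synthesizer in Step 7.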

[0338] (Example 2)

[0339] Next, we will describe Example 2. In the following description, the data processing device 12 will be referred to as the "server" and the smart glasses 214 will be referred to as the "terminal".

[0340] This invention aims not only to perform cleaning tasks efficiently but also to improve user psychological satisfaction. Conventional cleaning systems merely detect dirt and suggest cleaning methods, lacking the ability to adjust guidance that takes into account the user's emotions and stress levels. As a result, while the cleaning work itself may become more efficient, there is a problem in that the user experience does not improve.

[0341] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 2 is realized by the following means.

[0342] In this invention, the server includes means for using an artificial intelligence algorithm to receive and analyze environmental information and identify abnormal conditions, means for determining a method for presenting countermeasures based on the identified abnormal conditions, and means for evaluating the user's psychological state using an emotion processing device. This makes it possible to dynamically adjust and provide a personalized cleaning guide tailored to the user.

[0343] "Environmental information" refers to data that the device acquires through its imaging equipment, indicating the surrounding environment and conditions.

[0344] "Photography equipment" refers to devices used to acquire still images and videos, such as cameras and sensors.

[0345] An "artificial intelligence algorithm" refers to a calculation method used to analyze acquired data and derive specific results or judgments.

[0346] An "abnormal condition" refers to a state or situation that deviates from the normal range, and includes dirt and disorder that require cleaning.

[0347] "Solution method" refers to a method that provides the optimal solution or countermeasure for an identified abnormal condition.

[0348] "Equipment used" refers to the tools and equipment necessary to implement the countermeasures.

[0349] An "emotion processing device" is a device or program that analyzes a user's voice and facial expression data to determine their psychological state.

[0350] "Information transmission medium" refers to a means of conveying messages, such as audio and video, to users.

[0351] "Synthesized speech output" refers to a technology that converts text information into speech and provides it to the user as an audio guide.

[0352] The "personalized processing function" is a function that adjusts the content according to the user's status and preferences, providing optimized information.

[0353] Embodiments of the present invention will be described in detail below.

[0354] The system related to this invention is designed to make cleaning work efficient and user-friendly. Specifically, it employs a system in which a terminal acquires physical environmental information, and a server is responsible for processing the data based on that information. The entire system is composed of an image acquisition means, an emotion engine, and a generative AI model as its core components.

[0355] The image acquisition device used is a high-resolution camera mounted on the terminal. This camera captures an overall image of the room and objects, including any dirt, and transmits the details as image data to the server. Wireless communication technology is used for data transmission, ensuring secure data transfer.

[0356] The server analyzes the received image data using a generative AI model. Specifically, the AI algorithm identifies the location, type, and extent of dirt from the video. Based on this analysis, the server suggests the optimal cleaning method and cleaning tools. For example, if the processed data determines that there is a lot of pet hair on the floor, it will suggest a specific solution such as "use a vacuum cleaner with high suction power."

[0357] The device is equipped with an emotion engine to capture the user's voice and facial expressions. This emotion engine aims to understand the user's emotional state and stress level, thereby adjusting the system's guidance content and presentation methods in real time. For example, if the user is feeling stressed, the guidance tone will be set to be gentler, and step-by-step explanations will be more detailed.

[0358] For example, imagine a user attempting to clean their living room. The device can capture an image of the entire room, and the server can detect the presence of pet hair. This would then recommend the use of a vacuum cleaner with high suction power. Simultaneously, the device's emotion engine would detect the user's stress level and adjust the tone of the guidance accordingly.

[0359] Examples of prompts that can be input to the generative AI model include "Analyze the cleanliness of the room and suggest cleaning methods" and "Recognize the user's emotions and adjust the guidance accordingly."

[0360] A key feature of this system is that the user's emotional state is incorporated into the feedback loop, allowing subsequent guidance to be more personalized. Thus, this invention not only improves cleaning efficiency but also enhances psychological satisfaction.

[0361] The flow of the specific processing in Example 2 will be explained using Figure 13.

[0362] Step 1:

[0363] The terminal captures video footage of the room via a high-resolution camera. The captured video data is transmitted to the server via wireless communication. The input is video data of the room, and the output is a digital image file sent to the server. In this step, the terminal acts as a video capture device, acquiring detailed data on the area to be cleaned.

[0364] Step 2:

[0365] The server analyzes the received video data using a generative AI model. The input is video data received from the terminal, and the output is the analysis results regarding the location, type, and extent of the dirt. In this step, the server processes the data and applies AI algorithms to identify specific types of dirt. For example, it can identify pet hair on the floor or dirt on windows.

[0366] Step 3:

[0367] The server determines the appropriate cleaning method and cleaning tools to use based on the analysis results. The input is the dirt analysis results, and the output is suggested information for the user. The server performs data calculations and generates specific advice, such as "You should use a vacuum cleaner with high suction power."

[0368] Step 4:

[0369] The device inputs the user's voice and facial expressions into an emotion engine in real time to analyze their emotional state. The input is the user's voice and facial expression data, and the output is the analysis results regarding their psychological state. The device determines the user's stress level and uses this information to adjust the guidance for the next step.

[0370] Step 5:

[0371] The server dynamically adjusts the content and tone of the cleaning guide provided to the user based on the results of an analysis of their emotional state. The input is the results of the emotional state analysis and cleaning suggestions, and the output is the adjusted guide information. In this way, the server customizes the guide to match the user's psychological state.

[0372] Step 6:

[0373] Users input cleaning progress and feedback through the terminal's voice interface. Input is voice data, and output is feedback information within the system. User feedback is used to improve future suggestions and guides.

[0374] These steps enable the system to clean efficiently while also improving user satisfaction by considering their feelings.

[0375] (Application Example 2)

[0376] Next, we will explain application example 2. In the following explanation, the data processing device 12 will be referred to as the "server," and the smart glasses 214 will be referred to as the "terminal."

[0377] Conventional automatic cleaning systems have difficulty detecting dirt and suggesting appropriate cleaning methods, and furthermore, they cannot take into account the user's feelings, resulting in a lack of improvement in user satisfaction.

[0378] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 2 is realized by the following means.

[0379] In this invention, the server includes means for using an artificial intelligence model that analyzes image data and identifies specific situations, means incorporating an emotion analysis engine that analyzes the user's voice and facial expressions and evaluates their mental state, and means for a synthesizer that adjusts the guide based on the evaluation and provides it in an individualized format. This makes it possible to suggest the optimal cleaning method according to the situation and to improve the user experience by taking into consideration the user's mental state.

[0380] "Image acquisition means" refers to a device or function that captures image data of a physical surface.

[0381] An "artificial intelligence model" is an algorithm or data-driven computer program used to analyze image data and identify specific situations or depths.

[0382] An "emotion analysis engine" is a function or software that analyzes the user's voice and facial expressions to evaluate their mental state.

[0383] A "speech synthesizer" is a device or function that converts text data into speech to provide guidance and information to the user.

[0384] The "profile adjustment function" is an adjustment function that personalizes suggested methods and content based on the user's emotional data and provides them in a suitable format.

[0385] A "user interface" is a means of communication within a system that receives voice input from the user and accepts information about the progress of a task.

[0386] In this embodiment, the system mainly consists of a server, a terminal, and a user.

[0387] The server receives image data of the physical surface transmitted through the image acquisition means. It analyzes the received data and uses a generative AI model to identify specific situations and depths. This makes it possible to suggest the optimal work method and equipment for each situation. The server also uses an emotion analysis engine based on the user's voice and facial expression data to evaluate the user's mental state. Based on this evaluation, the guide is adjusted through a speech synthesizer.

[0388] The device receives voice input via the user interface and receives progress information on the task. It collects user feedback and uses a profile adjustment function to personalize suggestions based on the user's sentiment data.

[0389] As a concrete example, when a user cleans their living room, the device sends image data of the floor to the server. In this case, the server uses a generative AI model to analyze the distribution of pet hair and suggests a "high-suction vacuum cleaner" as the appropriate cleaning method. The server also uses an emotion analysis engine to sense the user's stress level and uses a speech synthesizer to provide a guide message in a warm tone, such as "Let's make today a fun day."

[0390] An example of a prompt for the generative AI model is: "Based on 'image data: living room floor' and 'user voice: Which vacuum cleaner is best?', suggest an appropriate cleaning method."

[0391] The flow of a specific process in Application Example 2 will be explained using Figure 14.

[0392] Step 1:

[0393] The device captures image data of a physical surface using a camera. The input is raw image data acquired by the camera sensor. This is converted into digital data and sent to the server.

[0394] Step 2:

[0395] The server analyzes the received image data and uses a generative AI model to identify specific stains or conditions. The input is image data sent from the terminal. The generative AI model identifies the type and depth of the stain from this data and extracts the information necessary to suggest cleaning methods. The output is the type of stain as identified and a suggested cleaning method.

[0396] Step 3:

[0397] The server inputs the user's voice and facial expression data into an emotion analysis engine to evaluate the user's mental state. The input consists of the user's voice and facial expression data obtained from the terminal. The emotion analysis engine analyzes this data and outputs numerical values for emotional states such as stress levels. The output is numerical or categorical data related to the user's mental state.

[0398] Step 4:

[0399] The server uses a speech synthesizer to create personalized guides based on suggested cleaning methods and user mental state data, and outputs them as audio data. The inputs are the dirt analysis results and emotion analysis results from a generative AI model. The speech synthesizer generates an audio guide with a tone and content that takes these inputs into account, and sends it to the terminal. The output is the audio guide data.

[0400] Step 5:

[0401] The terminal plays audio guide data transmitted from the server to the user and receives feedback from the user through the user interface. The input is audio data from the server, and the output is text or audio data representing the user's responses and feedback.

[0402] Step 6:

[0403] Users provide feedback via voice or text on their device. This feedback is sent back to the server, where the profile adjustment function updates the behavioral history to help personalize future suggestions. The input is user feedback data, and the output is the updated user profile data.
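The numerical or categorical emotion output of Step 3 and the profile update of Step 6 can be sketched together as follows. The score range of 0 to 1, the thresholds 0.4 and 0.7, and the names `stress_category` and `UserProfile` are all assumptions made for illustration; the specification does not give concrete values.

```python
from dataclasses import dataclass, field
from typing import List


def stress_category(score: float) -> str:
    """Map a stress score in [0, 1] to a category (thresholds are assumed)."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be between 0 and 1")
    if score >= 0.7:
        return "high"
    if score >= 0.4:
        return "moderate"
    return "low"


@dataclass
class UserProfile:
    feedback_history: List[str] = field(default_factory=list)
    stress_history: List[float] = field(default_factory=list)

    def update(self, feedback: str, stress_score: float) -> None:
        # Step 6: record the latest feedback together with the observed stress score.
        self.feedback_history.append(feedback)
        self.stress_history.append(stress_score)

    def preferred_tone(self) -> str:
        # Use a warm tone when the average recorded stress is not low.
        if not self.stress_history:
            return "neutral"
        mean = sum(self.stress_history) / len(self.stress_history)
        return "warm" if stress_category(mean) != "low" else "neutral"


profile = UserProfile()
profile.update("There is still some pet hair near the sofa.", 0.8)
profile.update("The floor looks clean now.", 0.6)
print(profile.preferred_tone())
```

Averaging over the whole history is only one possible policy; a deployed system might weight recent sessions more heavily.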

[0404] The specific processing unit 290 transmits the result of the specific processing to the smart glasses 214. In the smart glasses 214, the control unit 46A causes the speaker 240 to output the result of the specific processing. The microphone 238 acquires audio indicating user input for the result of the specific processing. The control unit 46A transmits the audio data indicating user input acquired by the microphone 238 to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the audio data.

[0405] The data generation model 58 is a so-called generative AI (Artificial Intelligence). Examples of the data generation model 58 include generative AI such as ChatGPT (Internet search <URL: https://openai.com/blog/chatgpt>) and Gemini (Internet search <URL: https://gemini.google.com/?hl=ja>). The data generation model 58 is obtained by performing deep learning on a neural network. The data generation model 58 receives as input a prompt containing instructions, together with inference data such as audio data representing speech, text data representing text, and image data representing images. The data generation model 58 performs inference on the input inference data according to the instructions indicated by the prompt, and outputs the inference results in data formats such as audio data and text data. Here, inference refers to, for example, analysis, classification, prediction, and/or summarization.
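The pairing of a prompt with inference data described above can be represented, for illustration, by a simple container type. `InferenceRequest` and its fields are hypothetical; they do not correspond to the interface of ChatGPT, Gemini, or any other actual service.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class InferenceRequest:
    prompt: str                    # instructions for the model
    text: Optional[str] = None     # text data representing text
    audio: Optional[bytes] = None  # audio data representing speech
    image: Optional[bytes] = None  # image data representing images

    def modalities(self) -> List[str]:
        """List which kinds of inference data are attached."""
        present = []
        for name in ("text", "audio", "image"):
            if getattr(self, name) is not None:
                present.append(name)
        return present

    def validate(self) -> None:
        # A request needs instructions and at least one piece of inference data.
        if not self.prompt.strip():
            raise ValueError("prompt must not be empty")
        if not self.modalities():
            raise ValueError("at least one kind of inference data is required")


request = InferenceRequest(
    prompt="Analyze the dirt in the captured video and suggest the best cleaning method.",
    image=b"\x00\x01",  # dummy bytes standing in for a captured frame
)
request.validate()
print(request.modalities())
```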

[0406] In the above embodiment, an example was given in which specific processing is performed by the data processing device 12, but the technology of this disclosure is not limited thereto, and the specific processing may also be performed by the smart glasses 214.

[0407] [Third Embodiment]

[0408] Figure 5 shows an example of the configuration of the data processing system 310 according to the third embodiment.

[0409] As shown in Figure 5, the data processing system 310 includes a data processing device 12 and a headset terminal 314. An example of the data processing device 12 is a server.

[0410] The data processing device 12 comprises a computer 22, a database 24, and a communication interface 26. The computer 22 is an example of a "computer" related to the technology of this disclosure. The computer 22 comprises a processor 28, RAM 30, and storage 32. The processor 28, RAM 30, and storage 32 are connected to a bus 34. The database 24 and the communication interface 26 are also connected to the bus 34. The communication interface 26 is connected to a network 54. An example of the network 54 is a WAN (Wide Area Network) and/or a LAN (Local Area Network).

[0411] The headset terminal 314 includes a computer 36, a microphone 238, a speaker 240, a camera 42, a communication interface 44, and a display 343. The computer 36 includes a processor 46, RAM 48, and storage 50. The processor 46, RAM 48, and storage 50 are connected to a bus 52. The microphone 238, speaker 240, camera 42, and display 343 are also connected to the bus 52.

[0412] The microphone 238 receives voice signals from the user 20 and receives instructions from the user 20. The microphone 238 captures the voice signals from the user 20, converts the captured voice into audio data, and outputs it to the processor 46. The speaker 240 outputs audio according to the instructions from the processor 46.

[0413] Camera 42 is a small digital camera equipped with an optical system including a lens, aperture, and shutter, and an image sensor such as a CMOS (Complementary Metal-Oxide-Semiconductor) image sensor or a CCD (Charge Coupled Device) image sensor, and captures images of the area around the user 20 (for example, an imaging range defined by a field of view equivalent to the width of a typical healthy person's field of vision).

[0414] Communication interface 44 is connected to network 54. Communication interfaces 44 and 26 are responsible for the exchange of various information between processor 46 and processor 28 via network 54. The exchange of various information between processor 46 and processor 28 using communication interfaces 44 and 26 is performed in a secure manner.

[0415] Figure 6 shows an example of the main functions of the data processing device 12 and the headset terminal 314. As shown in Figure 6, the data processing device 12 performs specific processing using the processor 28. The storage 32 stores the specific processing program 56.

[0416] The specific processing program 56 is an example of a "program" relating to the technology of this disclosure. The processor 28 reads the specific processing program 56 from the storage 32 and executes the read specific processing program 56 on the RAM 30. The specific processing is realized by the processor 28 operating as a specific processing unit 290 in accordance with the specific processing program 56 executed on the RAM 30.

[0417] The storage 32 stores the data generation model 58 and the emotion identification model 59. The data generation model 58 and the emotion identification model 59 are used by the specific processing unit 290.

[0418] In the headset terminal 314, the processor 46 performs the reception output processing. The storage 50 stores the reception output program 60. The processor 46 reads the reception output program 60 from the storage 50 and executes the read reception output program 60 on the RAM 48. The reception output processing is realized by the processor 46 operating as a control unit 46A according to the reception output program 60 executed on the RAM 48.

[0419] Next, the specific processing performed by the specific processing unit 290 of the data processing device 12 will be described. In the following description, the data processing device 12 will be referred to as the "server" and the headset terminal 314 will be referred to as the "terminal".

[0420] This invention provides an autonomous system that assists users in performing cleaning tasks efficiently and effectively in the cleaning of a house. The system is configured as follows:

[0421] First, the device is equipped with a camera that can capture video of a dirty room in real time. This image acquisition means obtains video data to determine the areas that need cleaning. Next, the device sends the captured video data to a server. The server then uses a generative artificial intelligence model to analyze the video data. This analysis identifies the type and extent of the dirt and determines the appropriate cleaning method and cleaning tools to use. This information is then presented to the user as recommended cleaning methods and tools.

[0422] The server transmits the determined cleaning method to the terminal, which then uses a speech synthesizer to play the received information as audio, providing guidance to the user. This allows the user to efficiently perform the cleaning according to the suggested procedure.

[0423] As a concrete example, when the device captures video of the kitchen, the server analyzes the information to identify grease stains and, based on that information, recommends "using dish soap and a sponge, and rinsing with warm water." The device communicates this verbally, and the user can follow the instructions and begin cleaning.

[0424] Furthermore, this system includes an interface that accepts voice input from the user, allowing for reports on cleaning progress and completion status. This makes it easy for users to express their opinions regarding cleaning and receive instructions at the appropriate time. This interactive function enables more efficient cleaning work and simplifies operation.

[0425] The following describes the processing flow.

[0426] Step 1:

[0427] The device activates its camera and captures video of the designated room or area in real time. The acquired video data is stored in temporary storage.

[0428] Step 2:

[0429] The terminal compresses and converts the format of the video data stored in temporary storage in order to send it to the server, preparing it for transmission over the network.

[0430] Step 3:

[0431] The server runs a generative artificial intelligence model to analyze the received video data. This model identifies the type of dirt and determines its degree and extent.

[0432] Step 4:

[0433] Based on the dirt analysis results, the server selects the optimal cleaning method and tools from its internal database. It then organizes this selection information and prepares it for transmission to the terminal.

[0434] Step 5:

[0435] The terminal uses a speech synthesizer to convert the text of the suggestions received from the server into audio data.

[0436] Step 6:

[0437] The device's voice assistant plays the converted audio data, guiding the user through specific cleaning procedures and the tools they should use.

[0438] Step 7:

[0439] The user follows the instructions of the voice assistant and actually performs the cleaning using the appropriate cleaning method. They report their progress and completion to the assistant via voice input.

[0440] Step 8:

[0441] The terminal recognizes voice input from the user, sends the content to the server to record the completion of the cleaning task, and prepares to provide the next instructions.
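Steps 7 and 8 above can be illustrated with a minimal sketch of the progress-tracking logic, assuming the user's voice report has already been converted to text. The class name `CleaningSession`, its fields, and the completion keywords are hypothetical and are not taken from the specification.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class CleaningSession:
    """Hypothetical server-side record of where the user is in a procedure."""
    steps: List[str]
    index: int = 0
    log: List[str] = field(default_factory=list)

    def current_instruction(self) -> Optional[str]:
        # None means every step has been reported as complete.
        return self.steps[self.index] if self.index < len(self.steps) else None

    def report(self, recognized_text: str) -> Optional[str]:
        """Record a recognized voice report and return the instruction to give next."""
        self.log.append(recognized_text)
        # Assumption: a report containing one of these words marks the step complete.
        keywords = ("done", "finished", "completed")
        if any(word in recognized_text.lower() for word in keywords):
            self.index += 1
        return self.current_instruction()


session = CleaningSession(["Apply dish soap with a sponge.", "Rinse with warm water."])
still_working = session.report("I'm still scrubbing")
after_first = session.report("Done with the soap")
after_second = session.report("Finished rinsing")
```

Keyword matching is only a placeholder: a report such as "not done yet" would be misread as completion, so a practical system would need a more robust interpretation of the report.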

[0442] (Example 1)

[0443] Next, we will describe Example 1. In the following description, the data processing device 12 will be referred to as the "server," and the headset-type terminal 314 will be referred to as the "terminal."

[0444] Cleaning at home can be inefficient because it's difficult to determine which areas to clean and how. Furthermore, selecting the appropriate cleaning method and tools based on the type of dirt requires experience and knowledge, which can be a burden on the user.

[0445] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 1 is realized by the following means.

[0446] In this invention, the server includes means for acquiring video footage of a contaminated area using a device equipped with an image capture device, means for analyzing the video information using a generative artificial intelligence model to identify the type and degree of contamination, and means for receiving progress reports from the user via a voice input interface. This enables the user to efficiently identify areas to be cleaned and clean them in the most optimal way.

[0447] An "image capture device" is a part of the equipment used to acquire images of a contaminated area and has the function of capturing visual information in real time.

[0448] A "generative artificial intelligence model" is an AI technology used to analyze video information and identify the type and degree of dirt, and includes algorithms that apply deep learning and machine learning.

[0449] A "voice converter" is a device or function that converts proposed cleaning methods into voice and provides guidance to the user.

[0450] A "voice input interface" is a means for users to communicate progress and completion reports to the system by voice, and includes voice acquisition devices such as microphones.

[0451] "Cleaning means" refers to methods or techniques for performing appropriate cleaning actions on identified dirt.

[0452] This invention is a system designed to assist with household cleaning tasks. The system's implementation is achieved through three main components: a terminal, a server, and a user.

[0453] First, the device uses a camera-equipped image capture device to acquire video footage of a specific area within the home. This footage is captured in real time and targets areas suspected of contamination. The device combines a high-resolution camera with data compression technology to quickly process the video and transfer it to a server.

[0454] Next, the server analyzes the video data transmitted from the terminal using a generative artificial intelligence model. The AI model used here is based on deep learning and can identify the various types of dirt that appear in the video (e.g., oil stains, dust, and water stains) and evaluate their degree. After identification, the server determines the optimal cleaning method and the equipment required for cleaning, and then formulates the procedure.

[0455] The cleaning methods and procedures determined by the server are sent back to the terminal. The terminal uses a voice converter to convert this information into speech and provides the user with voice guidance on how to proceed with the cleaning. For example, for grease stains in the kitchen, instructions such as "It is recommended to use dish soap and a sponge and rinse with warm water" are provided.

[0456] Furthermore, users can report the progress and completion status of their cleaning to the server through a voice input interface. This interaction allows the server to provide the next necessary instructions at the appropriate time. In this way, users can carry out their cleaning tasks more efficiently, achieving improved overall system efficiency.

[0457] An example of a prompt is a request such as "Please suggest the best cleaning method for the grease stains on the kitchen floor," which is input to the generative AI model. This allows the user to receive more specific and appropriate cleaning instructions.
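As an illustration only, a prompt of this form could be assembled from the identified dirt type and location. The function name `build_cleaning_prompt` and its parameters are hypothetical; the specification gives only the example sentence itself.

```python
def build_cleaning_prompt(dirt_type: str, location: str) -> str:
    """Fill the example prompt sentence with an identified dirt type and location."""
    if not dirt_type.strip() or not location.strip():
        raise ValueError("dirt_type and location must be non-empty")
    return (
        "Please suggest the best cleaning method for the "
        f"{dirt_type.strip()} on the {location.strip()}."
    )


prompt = build_cleaning_prompt("grease stains", "kitchen floor")
```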

[0458] The flow of the specific processing in Example 1 will be explained using Figure 11.

[0459] Step 1:

[0460] The device uses a camera to capture video of the entire room. In this step, video data is generated in real time. All visual information within the camera's field of view is used as input, and high-resolution video data is generated as output. This video data provides wide-ranging visual information, including areas that need cleaning.

[0461] Step 2:

[0462] The terminal compresses the acquired video data and sends it to the server via the internet. This step improves the transmission speed by reducing the size of the video data. Raw video data is required as input, and compressed video data is the output.

[0463] Step 3:

[0464] The server decompresses the received compressed video data and inputs it into a generative artificial intelligence model. In this step, the server analyzes the video data to identify the type and location of dirt in the room. Based on the input data, the AI model analyzes the data using a deep learning algorithm and outputs the type and extent of the dirt, as well as the appropriate cleaning method.

[0465] Step 4:

[0466] Based on the analysis results, the server determines specific cleaning methods and tools and generates this information in the form of prompt statements. In this step, the server scrutinizes the information based on the identified dirt and constructs a user-friendly suggestion statement. The input is output information from the AI model, and the output is a prompt statement explaining the recommended cleaning method.

[0467] Step 5:

[0468] The server sends the generated prompt message to the terminal. The input is a prompt message describing the recommended cleaning method, and the output is the data transmitted to the terminal.

[0469] Step 6:

[0470] The device converts the received recommendation information into speech and provides guidance to the user. In this step, speech synthesis technology is used to convert text data into speech data. The input is a prompt, and the output is a voice guide for the user.

[0471] Step 7:

[0472] The user reports the progress and results of cleaning using a voice input interface. In this step, the voice input from the user is converted into text data and sent to the server. The input is the user's voice, and the output is the basis for determining the next instruction.

[0473] Step 8:

[0474] Based on the user's progress report, the server updates the next cleaning procedure, generates new instructions as needed, and sends them to the terminal. This step involves analyzing the user's progress information and generating a new prompt message. The input is progress information, and the output is a prompt message containing the next recommended procedure.
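The compression in Step 2 and the decompression in Step 3 can be sketched as a lossless round trip. The specification does not name a compression method; `zlib` from the Python standard library is used here purely as a stand-in (a real system would more likely use a video codec), and the function names and the blank test frame are assumptions.

```python
import zlib


def compress_for_upload(raw_video: bytes, level: int = 6) -> bytes:
    """Terminal side: shrink the raw data before sending it over the network."""
    return zlib.compress(raw_video, level)


def decompress_on_server(payload: bytes) -> bytes:
    """Server side: restore the original bytes before analysis."""
    return zlib.decompress(payload)


# Assumption: one blank 640x480 RGB frame stands in for captured video data.
raw_frame = bytes(640 * 480 * 3)
payload = compress_for_upload(raw_frame)
restored = decompress_on_server(payload)
```

Because the round trip is lossless, the server analyzes exactly the bytes the terminal captured; only the transmitted size changes.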

[0475] (Application Example 1)

[0476] Next, we will explain Application Example 1. In the following explanation, the data processing device 12 will be referred to as the "server," and the headset-type terminal 314 will be referred to as the "terminal."

[0477] In many modern homes, routine cleaning is a significant burden. Manual cleaning, in particular, is time-consuming and labor-intensive, and correctly identifying the dirt and selecting the appropriate cleaning method can be difficult. Furthermore, instantly determining the optimal cleaning method for different types of dirt is challenging. There is also a need to monitor the progress and completion of the cleaning process in real time. To address these issues, there is a demand for more efficient and effective autonomous cleaning systems.

[0478] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 1 is realized by the following means.

[0479] In this invention, the server includes: means for recording images of a dirty area with a device equipped with imaging means; means for processing the image data and identifying the type and degree of dirt using a machine learning model; means for determining the optimal cleaning means and tools to suggest based on the identified dirt; means for a voice processor that converts the suggestion into voice and provides guidance to the user; means mounted on a cleaning device and equipped with a patrol function that travels to different locations in the home to detect dirt; and communication means that sends a notification to a smartphone based on the information detected by the patrol function. This enables autonomous and efficient cleaning of the home and allows for real-time cleaning guidance and status reports to the user.

[0480] "Imaging means" refers to a device or method for recording images or videos of an object.

[0481] A "device" is an electronic or mechanical device designed to perform a specific function.

[0482] "Processing data" means analyzing, transforming, or interpreting collected information to put it into a useful format.

[0483] A "machine learning model" is a system based on an algorithm that learns from data and recognizes patterns.

[0484] "Cleaning means" refers to a method or technique used to effectively remove a specific type of dirt.

[0485] "Tools" refer to the tools and equipment used to perform cleaning tasks.

[0486] A "speech processor" is a device or program for generating, converting, or outputting speech data.

[0487] "Patrol function" refers to the ability of cleaning equipment to perform specific tasks while moving along a set route or area.

[0488] "Communication methods" refer to channels and protocols used to send and receive information.

[0489] A "notification" is a message or alert used to convey specific information to a user.

[0490] To implement this invention, the following system is constructed. A server receives video data transmitted from a terminal equipped with a high-precision camera and analyzes it using a generative AI model. Through analysis on the server, the type and extent of dirt in the video are identified, and the optimal cleaning method and tools are determined. This information is converted into audio data by an audio processor and communicated to the user via a smart speaker or smartphone.

[0491] The device, acting as a household robot vacuum cleaner or similar device, patrols the room while capturing video with its camera. The collected video is transmitted to a server in real time for rapid analysis by an AI model. Cleaning instructions are sent to the user's smartphone as needed via notifications, and are also provided directly to the user as voice guidance.

[0492] For example, if the device detects a juice stain under the sofa while patrolling the living room, the server can analyze the information and provide voice guidance such as, "Please wipe the stain using juice-specific detergent and a cloth." In this way, the user can efficiently clean the area by following the suggested procedure.

[0493] A concrete example of a prompt is, "Analyze the dirt in the captured video and suggest the best cleaning method." This prompt instructs the AI model to identify the dirt and suggest a suitable cleaning method.

[0494] The flow of a specific process in Application Example 1 will be explained using Figure 12.

[0495] Step 1:

[0496] The device patrols the room and acquires video in real time using its camera. The input is video of the actual indoor environment, and the output is that video data. The camera inside the device captures this and processes it immediately as digital data.

[0497] Step 2:

[0498] The terminal transmits the acquired video data to the server. The input is the video data processed within the terminal, and the output is the video data received by the server. The data is transferred to the server quickly and securely through the terminal's communication module.

[0499] Step 3:

[0500] The server analyzes the received video data using a generative AI model. The input is the video data received by the server, and the output is the analysis results regarding the type and degree of dirt. The AI model within the server scans the video and identifies specific dirt features based on its internal algorithm.

[0501] Step 4:

[0502] The server determines cleaning methods and tools based on the analysis results. The input is data on the type and degree of dirt, and the output is information on recommended cleaning methods and tools. The decision algorithm within the server calculates the most effective cleaning method and selects the appropriate tools.

[0503] Step 5:

[0504] The server converts the determined cleaning method into audio data and sends it as a notification to the terminal or smartphone. The input is information about the cleaning method and tools, and the output is a cleaning guide in audio format. Speech synthesis technology converts the information into human language, providing direct guidance to the user.

[0505] Step 6:

[0506] The user performs the necessary cleaning tasks according to the provided audio guide. The input is the received audio guide, and the output is the completed cleaning task. The user performs specific cleaning tasks using the tools specified by the audio guide.
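Steps 4 and 5 above can be sketched as a lookup followed by the construction of a notification. The table contents echo examples given in this description, but the table itself, the severity scale, the JSON payload layout, and all names (`DirtFinding`, `CLEANING_TABLE`, `build_notification`) are hypothetical.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class DirtFinding:
    dirt_type: str
    location: str
    severity: float  # assumed scale: 0.0 (barely visible) to 1.0 (heavy)


# Hypothetical lookup table: dirt type -> (method, tools).
CLEANING_TABLE = {
    "juice stain": ("Wipe the stain", ["juice-specific detergent", "cloth"]),
    "grease stain": ("Wash and rinse with warm water", ["dish soap", "sponge"]),
    "pet hair": ("Vacuum the area", ["vacuum cleaner with high suction power"]),
}


def build_notification(finding: DirtFinding) -> str:
    """Return a JSON payload that could be sent to the terminal or smartphone."""
    # Unknown dirt types fall back to a generic instruction with no tools.
    method, tools = CLEANING_TABLE.get(
        finding.dirt_type, ("Inspect the area manually", [])
    )
    payload = {
        "title": f"{finding.dirt_type.capitalize()} detected: {finding.location}",
        "method": method,
        "tools": tools,
        "severity": finding.severity,
    }
    return json.dumps(payload)


notification = json.loads(
    build_notification(DirtFinding("juice stain", "under the sofa", 0.4))
)
```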

[0507] Furthermore, an emotion engine that estimates the user's emotions may be incorporated. That is, the specific processing unit 290 may use the emotion identification model 59 to estimate the user's emotions and perform the specific processing using the estimated emotions.

[0508] This invention provides a system that incorporates an emotion engine that recognizes the user's emotions, in addition to a device equipped with image acquisition means, in order to perform cleaning work efficiently and in a user-friendly manner. Here, the emotion engine analyzes the user's emotions from voice and facial expression data and dynamically adjusts the content and presentation method of the cleaning guide.

[0509] The device first captures the dirty area with its camera and sends the video data to a server. The server uses a generative artificial intelligence model to analyze the image, identify the type and extent of the dirt, and suggest appropriate cleaning methods and tools. In addition, an emotion engine analyzes the user's voice and facial expressions in real time to determine the user's stress level and emotional state. This information is used to adjust the cleaning guide.

[0510] For example, when the device captures the living room and sends the video data to the server, the server analyzes that there is a lot of pet hair on the floor and suggests "using a vacuum cleaner with high suction power." At the same time, if the emotion engine built into the device detects that the user is feeling stressed, it adjusts the tone of the cleaning guide to be gentler and explains the steps in more detail to provide a sense of security.

[0511] Furthermore, the system includes an interface that accepts voice input from the user, forming a feedback loop by allowing them to input cleaning progress and their impressions. This loop leverages past emotional data to personalize future suggestions and guidance. Through these features, the present invention can not only streamline cleaning operations but also improve user psychological satisfaction.

[0512] The following describes the processing flow.

[0513] Step 1:

[0514] The device activates its camera and captures the designated cleaning area. This video data is temporarily stored and prepared to be sent to the server.

[0515] Step 2:

[0516] The terminal sends video data to the server. Data compression and conversion are performed before transmission, enabling efficient communication over the network.

[0517] Step 3:

[0518] The server runs a generative artificial intelligence model to analyze the received video data and identify the type and extent of dirt. Based on these results, it determines the optimal cleaning method and tools.

[0519] Step 4:

[0520] The server sends the determined cleaning method and tools to the terminal. Furthermore, it prepares to receive voice and facial expression data from the terminal in order to analyze the user's emotional data.

[0521] Step 5:

[0522] The device uses an emotion recognition engine to analyze the user's voice and facial expressions in real time. Based on the analysis results, it determines the user's emotional state and stress level.

[0523] Step 6:

[0524] The device's emotion engine adjusts the content and tone of the cleaning guide according to the user's emotional state. For example, if the user is feeling stressed, it will provide a gentler tone and more detailed explanations.

[0525] Step 7:

[0526] The terminal uses a speech synthesizer to play the adjusted instructions to the user as audio. This allows the user to begin cleaning based on the instructions.

[0527] Step 8:

[0528] Users provide feedback to the system via voice input, reporting progress and comments. This information is recorded on the device and sent to the server.

[0529] Step 9:

[0530] The server analyzes the feedback and stores it as data for future cleaning assistance. This makes it possible to provide more personalized assistance based on emotions.
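The adjustment in Step 6 above can be sketched as a function that leaves the instruction unchanged for a calm user and adds a reassuring opening and detailed steps for a stressed user. The stress scale, the threshold value, the wording of the opening, and the name `adjust_guide` are all assumptions made for illustration.

```python
from typing import List

# Assumption: stress is reported on a 0.0-1.0 scale and 0.6 counts as "stressed".
STRESS_THRESHOLD = 0.6


def adjust_guide(instruction: str, stress_level: float, detail_steps: List[str]) -> str:
    """Return the guide text, softened and expanded when the user seems stressed."""
    if not 0.0 <= stress_level <= 1.0:
        raise ValueError("stress_level must be between 0.0 and 1.0")
    if stress_level < STRESS_THRESHOLD:
        return instruction
    lines = ["No rush, let's take this one step at a time.", instruction]
    lines += [f"Step {i}: {step}" for i, step in enumerate(detail_steps, start=1)]
    return "\n".join(lines)


details = ["Plug in the vacuum cleaner.", "Start from the corner of the room."]
calm_guide = adjust_guide("Use a vacuum cleaner with high suction power.", 0.2, details)
gentle_guide = adjust_guide("Use a vacuum cleaner with high suction power.", 0.8, details)
```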

[0531] (Example 2)

[0532] Next, we will describe Example 2. In the following description, the data processing device 12 will be referred to as the "server," and the headset-type terminal 314 will be referred to as the "terminal."

[0533] This invention aims not only to perform cleaning tasks efficiently but also to improve user psychological satisfaction. Conventional cleaning systems merely detect dirt and suggest cleaning methods, lacking the ability to adjust guidance that takes into account the user's emotions and stress levels. As a result, while the cleaning work itself may become more efficient, there is a problem in that the user experience does not improve.

[0534] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 2 is realized by the following means.

[0535] In this invention, the server includes means for using an artificial intelligence algorithm to receive and analyze environmental information and identify abnormal conditions, means for determining a method for presenting countermeasures based on the identified abnormal conditions, and means for evaluating the user's psychological state using an emotion processing device. This makes it possible to dynamically adjust and provide a personalized cleaning guide tailored to the user.

[0536] "Environmental information" refers to data that the device acquires through its imaging equipment, indicating the surrounding environment and conditions.

[0537] "Photography equipment" refers to devices used to acquire still images and videos, such as cameras and sensors.

[0538] An "artificial intelligence algorithm" refers to a calculation method used to analyze acquired data and derive specific results or judgments.

[0539] An "abnormal condition" refers to a state or situation that deviates from the normal range, and includes dirt and disorder that require cleaning.

[0540] "Solution method" refers to a method that provides the optimal solution or countermeasure for an identified abnormal condition.

[0541] "Equipment used" refers to the tools and equipment necessary to implement the countermeasures.

[0542] An "emotion processing device" is a device or program that analyzes a user's voice and facial expression data to determine their psychological state.

[0543] "Information transmission medium" refers to a means of conveying messages, such as audio and video, to users.

[0544] "Synthesized speech output" refers to a technology that converts text information into speech and provides it to the user as an audio guide.

[0545] The "personalized processing function" is a function that adjusts the content according to the user's status and preferences, providing optimized information.

[0546] Embodiments of the present invention will be described in detail below.

[0547] The system related to this invention is designed to make cleaning work efficient and user-friendly. Specifically, it employs a system in which a terminal acquires physical environmental information, and a server is responsible for processing the data based on that information. The entire system is composed of an image acquisition means, an emotion engine, and a generative AI model as its core components.

[0548] The image acquisition device used is a high-resolution camera mounted on the terminal. This camera captures an overall image of the room and objects, including any dirt, and transmits the details as image data to the server. Wireless communication technology is used for data transmission, ensuring secure data transfer.

[0549] The server analyzes the received image data using a generative AI model. Specifically, the AI algorithm identifies the location, type, and extent of dirt from the video. Based on this analysis, the server suggests the optimal cleaning method and cleaning tools. For example, if the processed data determines that there is a lot of pet hair on the floor, it will suggest a specific solution such as "use a vacuum cleaner with high suction power."

[0550] The device is equipped with an emotion engine to capture the user's voice and facial expressions. This emotion engine aims to understand the user's emotional state and stress level, thereby adjusting the system's guidance content and presentation methods in real time. For example, if the user is feeling stressed, the guidance tone will be set to be gentler, and step-by-step explanations will be more detailed.

[0551] For example, imagine a user attempting to clean their living room. The device can capture an image of the entire room, and the server can detect the presence of pet hair. This would then recommend the use of a vacuum cleaner with high suction power. Simultaneously, the device's emotion engine would detect the user's stress level and adjust the tone of the guidance accordingly.

[0552] Examples of prompts that can be input to the generative AI model include "Analyze the cleanliness of the room and suggest cleaning methods" and "Recognize the user's emotions and adjust the guidance accordingly."

[0553] A key feature of this system is that the user's emotional state is incorporated into the feedback loop, allowing subsequent guidance to be more personalized. Thus, this invention not only improves cleaning efficiency but also enhances psychological satisfaction.

[0554] The flow of the specific processing in Example 2 will be explained using Figure 13.

[0555] Step 1:

[0556] The terminal captures video footage of the room via a high-resolution camera. The captured video data is transmitted to the server via wireless communication. The input is video data of the room, and the output is a digital image file sent to the server. In this step, the terminal acts as a video capture device, acquiring detailed data on the area to be cleaned.

[0557] Step 2:

[0558] The server analyzes the received video data using a generative AI model. The input is video data received from the terminal, and the output is the analysis results regarding the location, type, and extent of the dirt. In this step, the server processes the data and applies AI algorithms to identify specific types of dirt. For example, it can identify pet hair on the floor or dirt on windows.

[0559] Step 3:

[0560] The server determines the appropriate cleaning method and cleaning tools to use based on the analysis results. The input is the dirt analysis results, and the output is suggested information for the user. The server performs data calculations and generates specific advice, such as "You should use a vacuum cleaner with high suction power."

[0561] Step 4:

[0562] The device inputs the user's voice and facial expressions into an emotion engine in real time to analyze their emotional state. The input is the user's voice and facial expression data, and the output is the analysis results regarding their psychological state. The device determines the user's stress level and uses this information to adjust the guidance for the next step.

[0563] Step 5:

[0564] The server dynamically adjusts the content and tone of the cleaning guide provided to the user based on the results of an analysis of their emotional state. The input is the results of the emotional state analysis and cleaning suggestions, and the output is the adjusted guide information. In this way, the server customizes the guide to match the user's psychological state.

[0565] Step 6:

[0566] Users input cleaning progress and feedback through the terminal's voice interface. Input is voice data, and output is feedback information within the system. User feedback is used to improve future suggestions and guides.

[0567] These steps enable the system to clean efficiently while also improving user satisfaction by considering their feelings.
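One possible way to carry emotional data from earlier sessions into later guidance is to keep a smoothed stress value per user. The specification does not state how past emotional data is combined; the exponential moving average below, the class name `EmotionHistory`, the smoothing factor, and the style labels are assumptions chosen only to make the feedback loop concrete.

```python
from typing import Optional


class EmotionHistory:
    """Keeps an exponentially smoothed stress value across sessions (assumed scheme)."""

    def __init__(self, alpha: float = 0.3) -> None:
        if not 0.0 < alpha <= 1.0:
            raise ValueError("alpha must be in (0, 1]")
        self.alpha = alpha
        self.smoothed: Optional[float] = None

    def update(self, stress_level: float) -> float:
        """Blend a new stress measurement into the running value and return it."""
        if self.smoothed is None:
            self.smoothed = stress_level
        else:
            self.smoothed = self.alpha * stress_level + (1 - self.alpha) * self.smoothed
        return self.smoothed

    def preferred_style(self, threshold: float = 0.5) -> str:
        """Pick a guidance style label from the smoothed value."""
        if self.smoothed is not None and self.smoothed >= threshold:
            return "gentle_detailed"
        return "standard"


history = EmotionHistory(alpha=0.5)
first = history.update(0.2)
second = history.update(0.8)
style = history.preferred_style()
```

Smoothing keeps a single stressful moment from permanently changing the guidance while still letting repeated high readings shift the default style.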

[0568] (Application Example 2)

[0569] Next, we will explain application example 2. In the following explanation, the data processing device 12 will be referred to as the "server," and the headset-type terminal 314 will be referred to as the "terminal."

[0570] Conventional automatic cleaning systems have difficulty detecting dirt and suggesting appropriate cleaning methods, and furthermore, they cannot take into account the user's feelings, resulting in a lack of improvement in user satisfaction.

[0571] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 2 is realized by the following means.

[0572] In this invention, the server includes means for using an artificial intelligence model that analyzes image data and identifies specific situations, means incorporating an emotion analysis engine that analyzes the user's voice and facial expressions and evaluates their mental state, and means for a synthesizer that adjusts the guide based on the evaluation and provides it in an individualized format. This makes it possible to suggest the optimal cleaning method according to the situation and to improve the user experience by taking into consideration the user's mental state.

[0573] "Image acquisition means" refers to a device or function that captures image data of a physical surface.

[0574] An "artificial intelligence model" is an algorithm or data-driven computer program used to analyze image data and identify specific situations or depths.

[0575] An "emotion analysis engine" is a function or software that analyzes the user's voice and facial expressions to evaluate their mental state.

[0576] A "speech synthesizer" is a device or function that converts text data into speech to provide guidance and information to the user.

[0577] The "profile adjustment function" is an adjustment function that personalizes suggested methods and content based on the user's emotional data and provides them in a suitable format.

[0578] A "user interface" is a means of communication within a system that receives voice input from the user and accepts information about the progress of a task.

[0579] In this embodiment, the system mainly consists of a server, a terminal, and a user.

[0580] The server receives image data of the physical surface transmitted through the image acquisition means. It analyzes the received data and uses a generative AI model to identify specific situations and depths. This makes it possible to suggest the optimal work method and equipment for each situation. The server also uses an emotion analysis engine based on the user's voice and facial expression data to evaluate the user's mental state. Based on this evaluation, the guide is adjusted through a speech synthesizer.

[0581] The device receives voice input via the user interface and receives information on the progress of the task. It collects user feedback and uses a profile adjustment function to personalize suggestions based on the user's sentiment data.

[0582] As a concrete example, when a user cleans their living room, the device sends image data of the floor to the server. In this case, the server uses a generative AI model to analyze the distribution of pet hair and suggests a "high-suction vacuum cleaner" as the appropriate cleaning method. The server also uses an emotion analysis engine to sense the user's stress level and uses a speech synthesizer to provide a guide message in a warm tone, such as "Let's make today a fun day."

[0583] An example of a prompt for the generative AI model is: "Based on 'image data: living room floor' and 'user voice: Which vacuum cleaner is best?', suggest an appropriate cleaning method."

[0584] The flow of a specific process in Application Example 2 will be explained using Figure 14.

[0585] Step 1:

[0586] The device captures image data of a physical surface using a camera. The input is raw image data acquired by the camera sensor. This is converted into digital data and sent to the server.

[0587] Step 2:

[0588] The server analyzes the received image data and uses a generative AI model to identify specific stains or conditions. The input is image data sent from the terminal. The generative AI model identifies the type and depth of the stain from this data and extracts the information necessary to suggest cleaning methods. The output is the type of stain as identified and a suggested cleaning method.

[0589] Step 3:

[0590] The server inputs the user's voice and facial expression data into an emotion analysis engine to evaluate the user's mental state. The input consists of the user's voice and facial expression data obtained from the terminal. The emotion analysis engine analyzes this data and outputs numerical values for emotional states such as stress levels. The output is numerical or categorical data related to the user's mental state.

[0591] Step 4:

[0592] The server uses a speech synthesizer to create personalized guides based on suggested cleaning methods and user mental state data, and outputs them as audio data. The inputs are the dirt analysis results and emotion analysis results from a generative AI model. The speech synthesizer generates an audio guide with a tone and content that takes these inputs into account, and sends it to the terminal. The output is the audio guide data.

[0593] Step 5:

[0594] The terminal plays audio guide data transmitted from the server to the user and receives feedback from the user through the user interface. The input is audio data from the server, and the output is text or audio data representing the user's responses and feedback.

[0595] Step 6:

[0596] Users provide feedback via voice or text on their device. This feedback is sent back to the server, where the profile adjustment function updates the behavioral history to help personalize future suggestions. The input is user feedback data, and the output is the updated user profile data.
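The profile adjustment in Step 6 could be sketched as follows. This is only an illustration: the `UserProfile` fields, the sentiment scale of -1 to 1, and the use of an exponential moving average with weight 0.3 are all assumptions, since the disclosure does not specify how the profile is updated.

```python
from dataclasses import dataclass, field


@dataclass
class UserProfile:
    # Running estimate of how the user reacts to guidance, in [-1, 1] (assumed scale).
    sentiment: float = 0.0
    # Behavioral history: raw feedback items in arrival order.
    history: list = field(default_factory=list)


def adjust_profile(profile: UserProfile, feedback_text: str,
                   sentiment_score: float, weight: float = 0.3) -> UserProfile:
    """Record the feedback and blend its sentiment into the running estimate."""
    if not -1.0 <= sentiment_score <= 1.0:
        raise ValueError("sentiment_score must be in [-1, 1]")
    profile.history.append(feedback_text)
    # Exponential moving average: recent feedback counts more than old feedback.
    profile.sentiment = (1 - weight) * profile.sentiment + weight * sentiment_score
    return profile


profile = adjust_profile(UserProfile(), "The guide was too fast.", -1.0)
print(profile.sentiment, profile.history)
```

A moving average is chosen here so that a single negative reaction shifts future suggestions only gradually; other update rules would fit the description equally well.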

[0597] The specific processing unit 290 transmits the result of the specific processing to the headset terminal 314. In the headset terminal 314, the control unit 46A causes the speaker 240 and display 343 to output the result of the specific processing. The microphone 238 acquires audio indicating user input for the result of the specific processing. The control unit 46A transmits the audio data indicating user input acquired by the microphone 238 to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the audio data.

[0598] The data generation model 58 is a so-called generative AI (Artificial Intelligence). Examples of the data generation model 58 include generative AI such as ChatGPT (Internet search <URL: https://openai.com/blog/chatgpt>) and Gemini (Internet search <URL: https://gemini.google.com/?hl=ja>). The data generation model 58 is obtained by performing deep learning on a neural network. The data generation model 58 receives a prompt containing instructions, together with inference data such as audio data representing speech, text data representing text, and image data representing images. The data generation model 58 performs inference on the input inference data according to the instructions indicated by the prompt, and outputs the inference results in data formats such as audio data and text data. Here, inference refers to, for example, analysis, classification, prediction, and/or summarization.
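The input to the data generation model 58 described above, a prompt plus inference data of several kinds, could be represented as in the following sketch. The class names `InferenceData` and `InferenceRequest` and the helper `describe_request` are hypothetical and do not correspond to the interface of any particular generative AI service.

```python
from dataclasses import dataclass
from typing import Literal, Tuple

# The three kinds of inference data named in the description.
Modality = Literal["audio", "text", "image"]


@dataclass(frozen=True)
class InferenceData:
    modality: Modality
    payload: bytes  # encoded audio, UTF-8 text, or image bytes


@dataclass(frozen=True)
class InferenceRequest:
    prompt: str                       # instructions for the model
    data: Tuple[InferenceData, ...]   # inference data the instructions apply to


def describe_request(request: InferenceRequest) -> str:
    """Summarize a request for logging; no model is called here."""
    modalities = ", ".join(item.modality for item in request.data)
    return f"prompt={request.prompt!r}; inference data: {modalities}"


request = InferenceRequest(
    prompt="Identify the type of dirt and suggest a cleaning method.",
    data=(
        InferenceData("image", b"<image bytes>"),
        InferenceData("audio", b"<audio bytes>"),
    ),
)
print(describe_request(request))
```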

[0599] In the above embodiment, an example was given in which specific processing is performed by the data processing device 12, but the technology of this disclosure is not limited thereto, and specific processing may also be performed by the headset terminal 314.

[0600] [Fourth Embodiment]

[0601] Figure 7 shows an example of the configuration of the data processing system 410 according to the fourth embodiment.

[0602] As shown in Figure 7, the data processing system 410 includes a data processing device 12 and a robot 414. An example of the data processing device 12 is a server.

[0603] The data processing device 12 comprises a computer 22, a database 24, and a communication interface 26. The computer 22 is an example of a "computer" related to the technology of this disclosure. The computer 22 comprises a processor 28, RAM 30, and storage 32. The processor 28, RAM 30, and storage 32 are connected to a bus 34. The database 24 and the communication interface 26 are also connected to the bus 34. The communication interface 26 is connected to a network 54. An example of the network 54 is a WAN (Wide Area Network) and / or a LAN (Local Area Network).

[0604] The robot 414 includes a computer 36, a microphone 238, a speaker 240, a camera 42, a communication interface 44, and a controlled object 443. The computer 36 includes a processor 46, RAM 48, and storage 50. The processor 46, RAM 48, and storage 50 are connected to a bus 52. The microphone 238, speaker 240, camera 42, and controlled object 443 are also connected to the bus 52.

[0605] The microphone 238 receives instructions from the user 20 by picking up speech uttered by the user 20. The microphone 238 captures the speech of the user 20, converts the captured speech into audio data, and outputs it to the processor 46. The speaker 240 outputs audio according to the instructions from the processor 46.

[0606] Camera 42 is a small digital camera equipped with an optical system including a lens, aperture, and shutter, and an image sensor such as a CMOS (Complementary Metal-Oxide-Semiconductor) image sensor or a CCD (Charge Coupled Device) image sensor, and captures images of the area around the user 20 (for example, an imaging range defined by a field of view equivalent to the width of a typical healthy person's field of vision).

[0607] Communication interface 44 is connected to network 54. Communication interfaces 44 and 26 are responsible for the exchange of various information between processor 46 and processor 28 via network 54. The exchange of various information between processor 46 and processor 28 using communication interfaces 44 and 26 is performed in a secure manner.

[0608] The controlled object 443 includes a display device, LEDs in the eyes, and motors that drive the arms, hands, and feet. The posture and gestures of the robot 414 are controlled by controlling the motors of the arms, hands, and feet. Some of the robot 414's emotions can be expressed by controlling these motors. Furthermore, the robot 414's facial expressions can also be expressed by controlling the illumination state of the LEDs in its eyes.
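One way to picture the control of the controlled object 443 is a lookup from an emotion label to LED and motor settings, as in the sketch below. The emotion labels, the brightness scale, and the arm angles are invented for illustration; the disclosure does not define specific values or a control interface.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Expression:
    led_brightness: float  # eye LED brightness, 0.0 (off) to 1.0 (full); assumed scale
    arm_angle_deg: int     # hypothetical target angle for the arm motors


# Hypothetical expression table: emotion label -> actuator settings.
EXPRESSIONS = {
    "joy": Expression(led_brightness=1.0, arm_angle_deg=60),
    "calm": Expression(led_brightness=0.5, arm_angle_deg=0),
    "sad": Expression(led_brightness=0.2, arm_angle_deg=-30),
}


def express(emotion: str) -> Expression:
    """Return actuator settings for an emotion, falling back to 'calm'."""
    return EXPRESSIONS.get(emotion, EXPRESSIONS["calm"])


print(express("joy"))
print(express("confused"))  # unknown label falls back to the calm expression
```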

[0609] Figure 8 shows an example of the main functions of the data processing device 12 and the robot 414. As shown in Figure 8, the data processing device 12 performs specific processing using the processor 28. The storage 32 stores the specific processing program 56.

[0610] The specific processing program 56 is an example of a "program" relating to the technology of this disclosure. The processor 28 reads the specific processing program 56 from the storage 32 and executes the read specific processing program 56 on the RAM 30. The specific processing is realized by the processor 28 operating as a specific processing unit 290 in accordance with the specific processing program 56 executed on the RAM 30.

[0611] The storage 32 stores the data generation model 58 and the emotion identification model 59. The data generation model 58 and the emotion identification model 59 are used by the specific processing unit 290.

[0612] In robot 414, the processor 46 performs the reception output processing. The storage 50 stores the reception output program 60. The processor 46 reads the reception output program 60 from the storage 50 and executes the read reception output program 60 on the RAM 48. The reception output processing is realized by the processor 46 operating as a control unit 46A according to the reception output program 60 executed on the RAM 48.

[0613] Next, the specific processing performed by the specific processing unit 290 of the data processing device 12 will be described. In the following description, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".

[0614] This invention provides an autonomous system that assists users in cleaning a house efficiently and effectively. The system is configured as follows:

[0615] First, the device is equipped with a camera that can capture video of a dirty room in real time. This image acquisition means obtains video data to determine the areas that need cleaning. Next, the device sends the captured video data to a server. The server then uses a generative artificial intelligence model to analyze the video data. This analysis identifies the type and extent of the dirt and determines the appropriate cleaning method and cleaning tools to use. This information is then presented to the user as recommended cleaning methods and tools.

[0616] The server transmits the determined cleaning method to the terminal, which then uses a speech synthesizer to play the received information as audio, providing guidance to the user. This allows the user to efficiently perform the cleaning according to the suggested procedure.

[0617] As a concrete example, when the device captures video of the kitchen, the server analyzes the information to identify grease stains and, based on that information, recommends "using dish soap and a sponge, and rinsing with warm water." The device communicates this verbally, and the user can follow the instructions and begin cleaning.

[0618] Furthermore, this system includes an interface that accepts voice input from the user, allowing for reports on cleaning progress and completion status. This makes it easy for users to express their opinions regarding cleaning and receive instructions at the appropriate time. This interactive function enables more efficient cleaning work and simplifies operation.

[0619] The following describes the processing flow.

[0620] Step 1:

[0621] The device activates its camera and captures video of the designated room or area in real time. The acquired video data is stored in temporary storage.

[0622] Step 2:

[0623] The terminal compresses and converts the format of the video data stored in temporary storage in order to send it to the server, preparing it for transmission over the network.

[0624] Step 3:

[0625] The server runs a generative artificial intelligence model to analyze the received video data. This model identifies the type of dirt and determines its degree and extent.

[0626] Step 4:

[0627] Based on the dirt analysis results, the server selects the optimal cleaning method and tools from its internal database. It then organizes this selection information and prepares it for transmission to the terminal.

[0628] Step 5:

[0629] The terminal uses a speech synthesizer to convert the text of the suggestions received from the server into audio data.

[0630] Step 6:

[0631] The device's voice assistant plays the converted audio data, guiding the user through specific cleaning procedures and the tools they should use.

[0632] Step 7:

[0633] The user follows the instructions of the voice assistant and actually performs the cleaning using the appropriate cleaning method. They report their progress and completion to the assistant via voice input.

[0634] Step 8:

[0635] The terminal recognizes voice input from the user, sends the content to the server to record the completion of the cleaning task, and prepares to provide the next instructions.
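The server-side parts of the flow above, selecting a method from an internal database (Step 4) and recording the user's report (Step 8), can be sketched as follows. The table contents other than the grease example, the keyword check for completion, and all function names are assumptions made for illustration.

```python
from typing import Optional

# Hypothetical internal database: dirt type -> tools and method.
# The grease entry follows the kitchen example in the description; the dust entry is invented.
CLEANING_DB = {
    "grease": {
        "tools": ["dish soap", "sponge"],
        "method": "Scrub with dish soap and a sponge, then rinse with warm water.",
    },
    "dust": {
        "tools": ["vacuum cleaner"],
        "method": "Vacuum the area.",
    },
}


def select_cleaning(dirt_type: str) -> Optional[dict]:
    """Step 4: look up the method and tools for an identified dirt type."""
    return CLEANING_DB.get(dirt_type)


def record_report(log: list, utterance: str) -> bool:
    """Step 8: store the recognized report; return True if it signals completion.

    A simple keyword check stands in for real speech understanding.
    """
    log.append(utterance)
    lowered = utterance.lower()
    return "done" in lowered or "finished" in lowered


log = []
print(select_cleaning("grease")["method"])
print(record_report(log, "I'm finished with the sink"))
```

Returning `None` for an unknown dirt type leaves it to the caller to decide whether to ask the generative AI model for a free-form suggestion instead.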

[0636] (Example 1)

[0637] Next, we will describe Example 1. In the following description, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".

[0638] Cleaning at home can be inefficient because it's difficult to determine which areas to clean and how. Furthermore, selecting the appropriate cleaning method and tools based on the type of dirt requires experience and knowledge, which can be a burden on the user.

[0639] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 1 is realized by the following means.

[0640] In this invention, the server includes means for acquiring video footage of a contaminated area using a device equipped with an image capture device, means for analyzing the video information using a generative artificial intelligence model to identify the type and degree of contamination, and means for receiving progress reports from the user via a voice input interface. This enables the user to efficiently identify areas to be cleaned and clean them in the most optimal way.

[0641] An "image capture device" is a part of the equipment used to acquire images of a contaminated area and has the function of capturing visual information in real time.

[0642] A "generative artificial intelligence model" is an AI technology used to analyze video information and identify the type and degree of dirt, and includes algorithms that apply deep learning and machine learning.

[0643] A "voice converter" is a device or function that converts proposed cleaning methods into voice and provides guidance to the user.

[0644] A "voice input interface" is a means for users to communicate progress and completion reports to the system by voice, and includes voice acquisition devices such as microphones.

[0645] "Cleaning means" refers to methods or techniques for performing appropriate cleaning actions on identified dirt.

[0646] This invention is a system designed to assist with household cleaning tasks. The system's implementation is achieved through three main components: a terminal, a server, and a user.

[0647] First, the device uses a camera-equipped image capture device to acquire video footage of a specific area within the home. This footage is captured in real time and targets areas suspected of contamination. The device combines a high-resolution camera with data compression technology to quickly process the video and transfer it to a server.

[0648] Next, the server analyzes the video data transmitted from the terminal using a generative artificial intelligence model. The AI model used here is based on deep learning and can identify various types of dirt (e.g., oil stains, dust, and water stains) that appear in the video and evaluate their degree. After identification, the server determines the optimal cleaning method and the equipment required for cleaning, and then formulates the procedure.

[0649] The cleaning methods and procedures determined by the server are sent back to the terminal. The terminal uses a voice converter to convert this information into speech and provides the user with voice guidance on how to proceed with the cleaning. For example, for grease stains in the kitchen, instructions such as "It is recommended to use dish soap and a sponge and rinse with warm water" are provided.

[0650] Furthermore, users can report the progress and completion status of their cleaning to the server through a voice input interface. This interaction allows the server to provide the next necessary instructions at the appropriate time. In this way, users can carry out their cleaning tasks more efficiently, achieving improved overall system efficiency.

[0651] An example of a prompt would be a request such as "Please suggest the best cleaning method for the grease stains on the kitchen floor," which is input to the generative AI model. This allows the user to receive more specific and appropriate cleaning instructions.

[0652] The flow of the specific processing in Example 1 will be explained using Figure 11.

[0653] Step 1:

[0654] The device uses a camera to capture video of the entire room. In this step, video data is generated in real time. All visual information within the camera's field of view is used as input, and high-resolution video data is generated as output. This video data provides wide-ranging visual information, including areas that need cleaning.

[0655] Step 2:

[0656] The terminal compresses the acquired video data and sends it to the server via the internet. This step improves the transmission speed by reducing the size of the video data. Raw video data is required as input, and compressed video data is the output.

[0657] Step 3:

[0658] The server decompresses the received compressed video data and inputs it into a generative artificial intelligence model. In this step, the server analyzes the video data to identify the type and location of dirt in the room. Based on the input data, the AI model analyzes the data using a deep learning algorithm and outputs the type and extent of the dirt, as well as the appropriate cleaning method.

[0659] Step 4:

[0660] Based on the analysis results, the server determines specific cleaning methods and tools and generates this information in the form of prompt statements. In this step, the server scrutinizes the information based on the identified dirt and constructs a user-friendly suggestion statement. The input is output information from the AI model, and the output is a prompt statement explaining the recommended cleaning method.

[0661] Step 5:

[0662] The server sends the generated prompt message to the terminal. The input is a prompt message describing the recommended cleaning method, and the output is the data transmitted to the terminal.

[0663] Step 6:

[0664] The device converts the received recommendation information into speech and provides guidance to the user. In this step, speech synthesis technology is used to convert text data into speech data. The input is a prompt, and the output is a voice guide for the user.

[0665] Step 7:

[0666] The user reports the progress and results of cleaning using a voice input interface. In this step, the voice input from the user is converted into text data and sent to the server. The input is the user's voice, and the output is the basis for determining the next instruction.

[0667] Step 8:

[0668] Based on the user's progress report, the server updates the next cleaning procedure, generates new instructions as needed, and sends them to the terminal. This step involves analyzing the user's progress information and generating a new prompt message. The input is progress information, and the output is a prompt message containing the next recommended procedure.
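The compression in Step 2 and the decompression in Step 3 can be illustrated with a lossless round trip. This sketch uses `zlib` from the Python standard library purely as a stand-in: the disclosure does not name a compression method, and a real system would more likely use a video codec. The placeholder byte string takes the place of raw frames.

```python
import zlib


def compress_video(raw: bytes, level: int = 6) -> bytes:
    """Terminal side (Step 2): shrink the data before transmission."""
    return zlib.compress(raw, level)


def decompress_video(packed: bytes) -> bytes:
    """Server side (Step 3): restore the data before analysis."""
    return zlib.decompress(packed)


# Placeholder for raw frames: 64,000 zero bytes, which compress very well.
raw = bytes(64) * 1000
packed = compress_video(raw)
restored = decompress_video(packed)

print(len(raw), len(packed), restored == raw)
```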

[0669] (Application Example 1)

[0670] Next, we will explain Application Example 1. In the following explanation, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".

[0671] In many modern homes, routine cleaning is a significant burden. Manual cleaning, in particular, is time-consuming and labor-intensive, and correctly identifying dirt and selecting the appropriate cleaning method can be difficult. Furthermore, instantly determining the optimal cleaning method for different types of dirt is challenging. There is also a need to monitor the progress and completion of the cleaning process in real time. To address these issues, there is a demand for more efficient and effective autonomous cleaning systems.

[0672] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 1 is realized by the following means.

[0673] In this invention, the server includes: means for recording images of a dirty area using a device equipped with imaging means; means for processing the image data and using a machine learning model to identify the type and degree of dirt; means for determining and suggesting the optimal cleaning means and tools based on the identified dirt; speech processor means for converting the suggestion into voice and providing guidance to the user; means, mounted on a cleaning device, having a patrol function that travels to different locations in the home to detect dirt; and communication means for sending a notification to a smartphone based on the information detected by the patrol function. This enables autonomous and efficient cleaning of the home and allows for real-time cleaning guidance and status reports to the user.

[0674] "Imaging means" refers to a device or method for recording images or videos of an object.

[0675] A "device" is an electronic or mechanical apparatus designed to perform a specific function.

[0676] "Processing data" means analyzing, transforming, or interpreting collected information to make it a useful format.

[0677] A "machine learning model" is a system based on an algorithm that learns from data and recognizes patterns.

[0678] "Cleaning means" refers to a method or technique used to effectively remove a specific type of dirt.

[0679] "Tools" refer to the tools and equipment used to perform cleaning tasks.

[0680] A "speech processor" is a device or program for generating, converting, or outputting speech data.

[0681] "Patrol function" refers to the ability of cleaning equipment to perform specific tasks while moving along a set route or area.

[0682] "Communication means" refers to channels and protocols used to send and receive information.

[0683] A "notification" is a message or alert used to convey specific information to a user.

[0684] To implement this invention, the following system is constructed. A server receives video data transmitted from a terminal equipped with a high-precision camera and analyzes it using a generative AI model. Through analysis on the server, the type and extent of dirt in the video are identified, and the optimal cleaning method and tools are determined. This information is converted into audio data by an audio processor and communicated to the user via a smart speaker or smartphone.

[0685] The device, acting as a household robot vacuum cleaner or similar device, patrols the room while capturing video with its camera. The collected video is transmitted to a server in real time for rapid analysis by an AI model. Cleaning instructions are sent to the user's smartphone as needed via notifications, and are also provided directly to the user as voice guidance.

[0686] For example, if the device detects a juice stain under the sofa while patrolling the living room, the server can analyze the information and provide voice guidance such as, "Please wipe the stain using juice-specific detergent and a cloth." In this way, the user can efficiently clean the area by following the suggested procedure.

[0687] A concrete example of a prompt is, "Analyze the dirt in the captured video and suggest the best cleaning method." This prompt instructs the AI model to identify the dirt and suggest an appropriate cleaning method.

[0688] The flow of a specific process in Application Example 1 will be explained using Figure 12.

[0689] Step 1:

[0690] The device patrols the room and acquires video in real time using its camera. The input is video of the actual indoor environment, and the output is that video data. The camera inside the device captures this and processes it immediately as digital data.

[0691] Step 2:

[0692] The terminal transmits the acquired video data to the server. The input is the video data processed within the terminal, and the output is the video data received by the server. The data is transferred to the server quickly and securely through the terminal's communication module.

[0693] Step 3:

[0694] The server analyzes the received video data using a generative AI model. The input is the video data received by the server, and the output is the analysis results regarding the type and degree of dirt. The AI model within the server scans the video and identifies specific dirt features based on its internal algorithm.

[0695] Step 4:

[0696] The server determines cleaning methods and tools based on the analysis results. The input is data on the type and degree of dirt, and the output is information on recommended cleaning methods and tools. The decision algorithm within the server calculates the most effective cleaning method and selects the appropriate tools.

[0697] Step 5:

[0698] The server converts the determined cleaning method into audio data and sends it as a notification to the terminal or smartphone. The input is information about the cleaning method and tools, and the output is a cleaning guide in audio format. Speech synthesis technology converts the information into human language, providing direct guidance to the user.

[0699] Step 6:

[0700] The user performs the necessary cleaning tasks according to the provided audio guide. The input is the received audio guide, and the output is the completed cleaning task. The user performs specific cleaning tasks using the tools specified by the audio guide.
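The notification part of this flow, turning what the patrol detects into messages for the user's smartphone, might look like the following sketch. The `Detection` fields, the severity scale from 0 to 1, and the threshold of 0.5 are assumptions for illustration; actual delivery to a smartphone is outside the scope of the sketch.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Detection:
    room: str
    location: str
    dirt_type: str
    severity: float  # 0.0 (negligible) to 1.0 (severe); assumed scale


def build_notifications(detections: List[Detection], threshold: float = 0.5) -> List[str]:
    """Create one message per detection whose severity reaches the threshold."""
    return [
        f"{d.dirt_type.capitalize()} detected {d.location} in the {d.room}."
        for d in detections
        if d.severity >= threshold
    ]


detections = [
    Detection("living room", "under the sofa", "juice stain", 0.8),
    Detection("hallway", "near the door", "dust", 0.2),
]
messages = build_notifications(detections)
print(messages)
```

Filtering by severity keeps minor findings from generating a notification on every patrol; whether such filtering is applied is a design choice not stated in the description.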

[0701] Furthermore, an emotion engine that estimates the user's emotions may be incorporated. That is, the specific processing unit 290 may use the emotion identification model 59 to estimate the user's emotions and perform the specific processing using the user's emotions.

[0702] This invention provides a system that incorporates an emotion engine that recognizes the user's emotions, in addition to a device equipped with image acquisition means, in order to perform cleaning work efficiently and in a user-friendly manner. Here, the emotion engine analyzes the user's emotions from voice and facial expression data and dynamically adjusts the content and presentation method of the cleaning guide.

[0703] The device first captures the dirty area with its camera and sends the video data to a server. The server uses a generative artificial intelligence model to analyze the image, identify the type and extent of the dirt, and suggest appropriate cleaning methods and tools. In addition, an emotion engine analyzes the user's voice and facial expressions in real time to determine the user's stress level and emotional state. This information is used to adjust the cleaning guide.

[0704] For example, when the device captures the living room and sends the video data to the server, the server analyzes that there is a lot of pet hair on the floor and suggests "using a vacuum cleaner with high suction power." At the same time, if the emotion engine built into the device detects that the user is feeling stressed, it adjusts the tone of the cleaning guide to be gentler and explains the steps in more detail to provide a sense of security.

[0705] Furthermore, the system includes an interface that accepts voice input from the user, forming a feedback loop by allowing them to input cleaning progress and their impressions. This loop leverages past emotional data to personalize future suggestions and guidance. Through these features, the present invention can not only streamline cleaning operations but also improve user psychological satisfaction.

[0706] The following describes the processing flow.

[0707] Step 1:

[0708] The device activates its camera and captures the designated cleaning area. This video data is temporarily stored and prepared to be sent to the server.

[0709] Step 2:

[0710] The terminal sends video data to the server. Data compression and conversion are performed before transmission, enabling efficient communication over the network.

[0711] Step 3:

[0712] The server runs a generative artificial intelligence model to analyze the received video data and identify the type and extent of dirt. Based on these results, it determines the optimal cleaning method and tools.

[0713] Step 4:

[0714] The server sends the determined cleaning method and tools to the terminal. Furthermore, it prepares to receive voice and facial expression data from the terminal in order to analyze the user's emotional data.

[0715] Step 5:

[0716] The device uses an emotion recognition engine to analyze the user's voice and facial expressions in real time. Based on the analysis results, it determines the user's emotional state and stress level.

[0717] Step 6:

[0718] The device's emotion engine adjusts the content and tone of the cleaning guide according to the user's emotional state. For example, if the user is feeling stressed, it will provide a gentler tone and more detailed explanations.

[0719] Step 7:

[0720] The terminal uses a speech synthesizer to play the adjusted instructions to the user as audio. This allows the user to begin cleaning tasks based on the instructions.

[0721] Step 8:

[0722] Users provide feedback to the system via voice input, reporting progress and comments. This information is recorded on the device and sent to the server.

[0723] Step 9:

[0724] The server analyzes the feedback and stores it as data for future cleaning assistance. This makes it possible to provide more personalized assistance based on emotions.
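The adjustment of the guide according to the user's emotional state (Steps 5 and 6) can be sketched as a simple rule on a stress score. The score range of 0 to 1, the threshold of 0.6, and the labels returned are assumptions; the disclosure states only that a stressed user receives a gentler tone and more detailed explanations.

```python
def adjust_guide(steps, stress_level, threshold=0.6):
    """Choose tone and level of detail for the cleaning guide from a stress score.

    stress_level is assumed to lie in [0, 1]; the emotion engine that
    produces it is not modeled here.
    """
    if not 0.0 <= stress_level <= 1.0:
        raise ValueError("stress_level must be in [0, 1]")
    stressed = stress_level >= threshold
    return {
        "tone": "gentle" if stressed else "neutral",
        "detail": "step-by-step" if stressed else "summary",
        "steps": list(steps),
    }


guide = adjust_guide(
    ["Vacuum the floor with a high-suction vacuum cleaner."],
    stress_level=0.8,
)
print(guide["tone"], guide["detail"])
```

The resulting tone and detail settings would then be passed to the speech synthesizer along with the steps.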

[0725] (Example 2)

[0726] Next, we will describe Example 2. In the following description, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".

[0727] This invention aims not only to perform cleaning tasks efficiently but also to improve user psychological satisfaction. Conventional cleaning systems merely detect dirt and suggest cleaning methods, lacking the ability to adjust guidance that takes into account the user's emotions and stress levels. As a result, while the cleaning work itself may become more efficient, there is a problem in that the user experience does not improve.

[0728] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 2 is realized by the following means.

[0729] In this invention, the server includes means for using an artificial intelligence algorithm to receive and analyze environmental information and identify abnormal conditions, means for determining a method for presenting countermeasures based on the identified abnormal conditions, and means for evaluating the user's psychological state using an emotion processing device. This makes it possible to dynamically adjust and provide a personalized cleaning guide tailored to the user.

[0730] "Environmental information" refers to data that the device acquires through its imaging equipment, indicating the surrounding environment and conditions.

[0731] "Photography equipment" refers to devices used to acquire still images and videos, such as cameras and sensors.

[0732] An "artificial intelligence algorithm" refers to a calculation method used to analyze acquired data and derive specific results or judgments.

[0733] An "abnormal condition" refers to a state or situation that deviates from the normal range, and includes dirt and disorder that require cleaning.

[0734] "Solution method" refers to a method that provides the optimal solution or countermeasure for an identified abnormal condition.

[0735] "Equipment used" refers to the tools and equipment necessary to implement the countermeasures.

[0736] An "emotion processing device" is a device or program that analyzes a user's voice and facial expression data to determine their psychological state.

[0737] "Information transmission medium" refers to a means of conveying messages, such as audio and video, to users.

[0738] "Synthesized speech output" refers to a technology that converts text information into speech and provides it to the user as an audio guide.

[0739] The "personalized processing function" is a function that adjusts the content according to the user's status and preferences, providing optimized information.

[0740] Embodiments of the present invention will be described in detail below.

[0741] The system related to this invention is designed to make cleaning work efficient and user-friendly. Specifically, it employs a system in which a terminal acquires physical environmental information, and a server is responsible for processing the data based on that information. The entire system is composed of an image acquisition means, an emotion engine, and a generative AI model as its core components.

[0742] The image acquisition device used is a high-resolution camera mounted on the terminal. This camera captures an overall image of the room and objects, including any dirt, and transmits the details as image data to the server. Wireless communication technology is used for data transmission, ensuring secure data transfer.

[0743] The server analyzes the received image data using a generative AI model. Specifically, the AI algorithm identifies the location, type, and extent of dirt from the video. Based on this analysis, the server suggests the optimal cleaning method and cleaning tools. For example, if the processed data determines that there is a lot of pet hair on the floor, it will suggest a specific solution such as "use a vacuum cleaner with high suction power."

[0744] The device is equipped with an emotion engine to capture the user's voice and facial expressions. This emotion engine aims to understand the user's emotional state and stress level, thereby adjusting the system's guidance content and presentation methods in real time. For example, if the user is feeling stressed, the guidance tone will be set to be gentler, and step-by-step explanations will be more detailed.

[0745] For example, imagine a user attempting to clean their living room. The device captures an image of the entire room, and the server detects the presence of pet hair. The server then recommends the use of a vacuum cleaner with high suction power. Simultaneously, the device's emotion engine detects the user's stress level and adjusts the tone of the guidance accordingly.

[0746] Examples of prompts that can be input to the generative AI model include "Analyze the cleanliness of the room and suggest cleaning methods" and "Recognize the user's emotions and adjust the guidance accordingly."

[0747] A key feature of this system is that the user's emotional state is incorporated into the feedback loop, allowing subsequent guidance to be more personalized. Thus, this invention not only improves cleaning efficiency but also enhances psychological satisfaction.

[0748] The flow of the specific processing in Example 2 will be explained using Figure 13.

[0749] Step 1:

[0750] The terminal captures video footage of the room via a high-resolution camera. The captured video data is transmitted to the server via wireless communication. The input is video data of the room, and the output is a digital image file sent to the server. In this step, the terminal acts as a video capture device, acquiring detailed data on the area to be cleaned.

[0751] Step 2:

[0752] The server analyzes the received video data using a generative AI model. The input is video data received from the terminal, and the output is the analysis results regarding the location, type, and extent of the dirt. In this step, the server processes the data and applies AI algorithms to identify specific types of dirt. For example, it can identify pet hair on the floor or dirt on windows.

[0753] Step 3:

[0754] The server determines the appropriate cleaning method and cleaning tools to use based on the analysis results. The input is the dirt analysis results, and the output is suggested information for the user. The server performs data calculations and generates specific advice, such as "You should use a vacuum cleaner with high suction power."

[0755] Step 4:

[0756] The device inputs the user's voice and facial expressions into an emotion engine in real time to analyze their emotional state. The input is the user's voice and facial expression data, and the output is the analysis results regarding their psychological state. The device determines the user's stress level and uses this information to adjust the guidance for the next step.

[0757] Step 5:

[0758] The server dynamically adjusts the content and tone of the cleaning guide provided to the user based on the results of an analysis of their emotional state. The input is the results of the emotional state analysis and cleaning suggestions, and the output is the adjusted guide information. In this way, the server customizes the guide to match the user's psychological state.

[0759] Step 6:

[0760] Users input cleaning progress and feedback through the terminal's voice interface. Input is voice data, and output is feedback information within the system. User feedback is used to improve future suggestions and guides.

[0761] These steps enable the system to clean efficiently while also improving user satisfaction by considering their feelings.
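
The six steps above can be sketched as a small Python pipeline. This is only an illustrative outline under stated assumptions: the generative AI model and the emotion engine are represented by injected callables, and the names `DirtFinding`, `SUGGESTIONS`, `suggest`, `adjust`, and `CleaningSession`, the 0.6 stress threshold, and the window-cleaning advice are hypothetical and do not appear in the source.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable


@dataclass
class DirtFinding:              # Step 2 output
    location: str
    kind: str
    extent: float               # hypothetical 0..1 scale


# Hypothetical lookup table standing in for the server's suggestion logic (Step 3).
SUGGESTIONS = {
    "pet hair": "Use a vacuum cleaner with high suction power.",
    "window dirt": "Wipe the glass with a window cleaner and a soft cloth.",
}


def suggest(findings: list[DirtFinding]) -> list[str]:
    """Step 3: map each finding to advice, with a generic fallback."""
    return [SUGGESTIONS.get(f.kind, f"Inspect the {f.location} manually.")
            for f in findings]


def adjust(suggestions: list[str], stress: float) -> list[str]:
    """Step 5: soften the wording when the assumed stress score is high."""
    if stress >= 0.6:
        return ["No rush. " + s for s in suggestions]
    return suggestions


@dataclass
class CleaningSession:
    analyze: Callable[[bytes], list[DirtFinding]]   # generative AI model stand-in
    estimate_stress: Callable[[bytes], float]       # emotion engine stand-in
    feedback: list[str] = field(default_factory=list)

    def run(self, video: bytes, voice_and_face: bytes) -> list[str]:
        findings = self.analyze(video)                  # Steps 1-2
        proposals = suggest(findings)                   # Step 3
        stress = self.estimate_stress(voice_and_face)   # Step 4
        return adjust(proposals, stress)                # Step 5

    def record_feedback(self, text: str) -> None:       # Step 6
        self.feedback.append(text)
```

Injecting the two models as callables keeps the sketch runnable without committing to any particular model or API, which the source leaves open.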

[0762] (Application Example 2)

[0763] Next, we will explain application example 2. In the following explanation, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".

[0764] Conventional automatic cleaning systems have difficulty detecting dirt and suggesting appropriate cleaning methods, and furthermore, they cannot take into account the user's feelings, resulting in a lack of improvement in user satisfaction.

[0765] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 2 is realized by the following means.

[0766] In this invention, the server includes means for using an artificial intelligence model that analyzes image data and identifies specific situations, means incorporating an emotion analysis engine that analyzes the user's voice and facial expressions and evaluates their mental state, and means for a synthesizer that adjusts the guide based on the evaluation and provides it in an individualized format. This makes it possible to suggest the optimal cleaning method according to the situation and to improve the user experience by taking into consideration the user's mental state.

[0767] "Image acquisition means" refers to a device or function that captures image data of a physical surface.

[0768] An "artificial intelligence model" is an algorithm or data-driven computer program used to analyze image data and identify specific situations or depths.

[0769] An "emotion analysis engine" is a function or software that analyzes the user's voice and facial expressions to evaluate their mental state.

[0770] A "speech synthesizer" is a device or function that converts text data into speech to provide guidance and information to the user.

[0771] The "profile adjustment function" is an adjustment function that personalizes suggested methods and content based on the user's emotional data and provides them in a suitable format.

[0772] A "user interface" is a means of communication within a system that receives voice input from the user and accepts information about the progress of a task.

[0773] In this embodiment, the system mainly consists of a server, a terminal, and a user.

[0774] The server receives image data of the physical surface transmitted through the image acquisition means. It analyzes the received data and uses a generative AI model to identify specific situations and depths. This makes it possible to suggest the optimal work method and equipment for each situation. The server also uses an emotion analysis engine based on the user's voice and facial expression data to evaluate the user's mental state. Based on this evaluation, the guide is adjusted through a speech synthesizer.

[0775] The device receives voice input via the user interface and receives information on the progress of the task. It collects user feedback and uses a profile adjustment function to personalize suggestions based on the user's sentiment data.

[0776] As a concrete example, when a user cleans their living room, the device sends image data of the floor to the server. In this case, the server uses a generative AI model to analyze the distribution of pet hair and suggests a "high-suction vacuum cleaner" as the appropriate cleaning method. The server also uses an emotion analysis engine to sense the user's stress level and uses a speech synthesizer to provide a guide message in a warm tone, such as "Let's make today a fun day."

[0777] An example of a prompt for the generative AI model is one that asks the model to suggest an appropriate cleaning method based on "image data: living room floor" and "user voice: 'Which vacuum cleaner is best?'".
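
As a rough illustration, such a prompt could be assembled from the two inputs as follows. The function name `build_prompt` and the exact wording of the instruction lines are assumptions; the source only gives the two quoted inputs and the goal of suggesting a cleaning method.

```python
def build_prompt(image_label: str, user_utterance: str) -> str:
    """Combine an image description and the user's utterance into one prompt.

    The layout is hypothetical; only the two inputs and the requested
    outcome come from the description.
    """
    return (
        "You are a cleaning assistant.\n"
        f"Image data: {image_label}\n"
        f'User voice: "{user_utterance}"\n'
        "Suggest an appropriate cleaning method."
    )
```

Calling `build_prompt("living room floor", "Which vacuum cleaner is best?")` produces a four-line prompt whose second and third lines carry the two inputs from the example above.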

[0778] The flow of a specific process in Application Example 2 will be explained using Figure 14.

[0779] Step 1:

[0780] The device captures image data of a physical surface using a camera. The input is raw image data acquired by the camera sensor. This is converted into digital data and sent to the server.

[0781] Step 2:

[0782] The server analyzes the received image data and uses a generative AI model to identify specific stains or conditions. The input is image data sent from the terminal. The generative AI model identifies the type and depth of the stain from this data and extracts the information necessary to suggest cleaning methods. The output is the type of stain as identified and a suggested cleaning method.

[0783] Step 3:

[0784] The server inputs the user's voice and facial expression data into an emotion analysis engine to evaluate the user's mental state. The input consists of the user's voice and facial expression data obtained from the terminal. The emotion analysis engine analyzes this data and outputs numerical values for emotional states such as stress levels. The output is numerical or categorical data related to the user's mental state.

[0785] Step 4:

[0786] The server uses a speech synthesizer to create personalized guides based on suggested cleaning methods and user mental state data, and outputs them as audio data. The inputs are the dirt analysis results and emotion analysis results from a generative AI model. The speech synthesizer generates an audio guide with a tone and content that takes these inputs into account, and sends it to the terminal. The output is the audio guide data.

[0787] Step 5:

[0788] The terminal plays audio guide data transmitted from the server to the user and receives feedback from the user through the user interface. The input is audio data from the server, and the output is text or audio data representing the user's responses and feedback.

[0789] Step 6:

[0790] Users provide feedback via voice or text on their device. This feedback is sent back to the server, where the profile adjustment function updates the behavioral history to help personalize future suggestions. The input is user feedback data, and the output is the updated user profile data.
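
The profile update in Step 6 might look like the following Python sketch. The `UserProfile` class, its fields, and the acceptance-count ranking are assumptions chosen for illustration; the source states only that feedback updates a behavioral history used to personalize future suggestions.

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass, field


@dataclass
class UserProfile:
    # Behavioral history as (suggestion, accepted) pairs.
    history: list[tuple[str, bool]] = field(default_factory=list)
    # How often each suggestion was accepted by the user.
    accepted: Counter = field(default_factory=Counter)

    def update(self, suggestion: str, was_accepted: bool) -> None:
        """Record one piece of feedback in the history."""
        self.history.append((suggestion, was_accepted))
        if was_accepted:
            self.accepted[suggestion] += 1

    def rank(self, candidates: list[str]) -> list[str]:
        """Order candidates so frequently accepted suggestions come first.

        sorted() is stable, so ties keep their original order.
        """
        return sorted(candidates, key=lambda c: -self.accepted[c])
```

A simple acceptance count is used here only because it is easy to inspect; any other personalization scheme would fit the description equally well.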

[0791] The specific processing unit 290 transmits the result of the specific processing to the robot 414. In the robot 414, the control unit 46A causes the speaker 240 and the controlled object 443 to output the result of the specific processing. The microphone 238 acquires audio indicating user input for the result of the specific processing. The control unit 46A transmits the audio data indicating user input acquired by the microphone 238 to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the audio data.

[0792] Data generation model 58 is a type of so-called generative AI (Artificial Intelligence). Examples of the data generation model 58 include generative AI such as ChatGPT (Internet search <URL: https://openai.com/blog/chatgpt>) and Gemini (Internet search <URL: https://gemini.google.com/?hl=ja>). The data generation model 58 is obtained by performing deep learning on a neural network. A prompt containing instructions is input to the data generation model 58, together with inference data such as audio data representing speech, text data representing text, and image data representing images. The data generation model 58 performs inference on the input inference data according to the instructions indicated by the prompt, and outputs the inference results in data formats such as audio data and text data. Here, inference refers to, for example, analysis, classification, prediction, and/or summarization.
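
The input and output contract of the data generation model 58 can be sketched as a small Python interface. This is an assumed shape, not an actual API of any product named above: `InferenceInput`, `DataGenerationModel`, and the `EchoModel` stand-in are hypothetical names, and the stand-in performs no real inference.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Protocol


@dataclass
class InferenceInput:
    prompt: str                  # instructions for the model
    text: str | None = None      # text data, if any
    audio: bytes | None = None   # audio data, if any
    image: bytes | None = None   # image data, if any


class DataGenerationModel(Protocol):
    """Anything that turns a prompt plus inference data into a result."""

    def infer(self, request: InferenceInput) -> str: ...


class EchoModel:
    """Stand-in that reports which modalities it received; no real inference."""

    def infer(self, request: InferenceInput) -> str:
        present = [name for name in ("text", "audio", "image")
                   if getattr(request, name) is not None]
        return f"{request.prompt} [inputs: {', '.join(present) or 'none'}]"
```

Typing the model as a `Protocol` lets the same calling code work whether the model runs on the data processing device 12 or on an external device.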

[0793] In the above embodiment, an example was given in which the specific processing is performed by the data processing device 12, but the technology of this disclosure is not limited thereto, and the specific processing may also be performed by the robot 414.

[0794] Furthermore, the emotion identification model 59, acting as an emotion engine, may determine the user's emotion according to a specific mapping, namely an emotion map (see Figure 9). Similarly, the emotion identification model 59 may also determine the robot's emotion, and the specific processing unit 290 may perform the specific processing using the robot's emotion.

[0795] Figure 9 shows an emotion map 400 in which multiple emotions are mapped. In the emotion map 400, emotions are arranged in concentric circles radiating from the center. The closer to the center of the concentric circles, the more primitive the emotions are located. Further out of the concentric circles, emotions representing states and actions arising from mental states are located. Emotion is a concept that includes feelings and mental states. On the left side of the concentric circles, emotions that are generally generated from reactions occurring in the brain are located. On the right side of the concentric circles, emotions that are generally induced by situational judgment are located. Above and below the concentric circles, emotions that are generally generated from reactions occurring in the brain and induced by situational judgment are located. In addition, the emotion of "pleasure" is located on the upper side of the concentric circles, and the emotion of "displeasure" is located on the lower side. Thus, in the emotion map 400, multiple emotions are mapped based on the structure in which emotions arise, and emotions that are likely to occur simultaneously are mapped close together.
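
One way to picture this layout is to give each emotion polar coordinates on the map, as in the Python sketch below. The `MapEmotion` class, the coordinate convention (angle 0 at the 3 o'clock direction, counter-clockwise positive), and any concrete positions are assumptions; the source describes the arrangement only qualitatively.

```python
import math
from dataclasses import dataclass

_EPS = 1e-9  # tolerance for treating a coordinate as lying on an axis


@dataclass(frozen=True)
class MapEmotion:
    name: str
    radius: float      # 0 = center (primitive); larger = more outwardly expressed
    angle_deg: float   # counter-clockwise from the 3 o'clock direction

    def xy(self) -> tuple:
        a = math.radians(self.angle_deg)
        return (self.radius * math.cos(a), self.radius * math.sin(a))

    def valence(self) -> str:
        """Upper side of the map is pleasure, lower side is displeasure."""
        _, y = self.xy()
        if y > _EPS:
            return "pleasure"
        if y < -_EPS:
            return "displeasure"
        return "neutral"

    def origin(self) -> str:
        """Left side: reactions in the brain. Right side: situational judgment."""
        x, _ = self.xy()
        if x < -_EPS:
            return "reaction"
        if x > _EPS:
            return "situation"
        return "both"  # directly above or below the center
```

Under this convention, an emotion placed slightly above the 3 o'clock direction is classified as situation-driven and pleasant, while one in the lower left is reaction-driven and unpleasant.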

[0796] These emotions are distributed at the 3 o'clock position on the Emotion Map 400, and usually fluctuate between feelings of security and anxiety. In the right half of the Emotion Map 400, situational awareness takes precedence over internal feelings, resulting in a calm impression.

[0797] The inside of the Emotion Map 400 represents inner thoughts, while the outside represents actions. Therefore, the further toward the outside of the Emotion Map 400 an emotion lies, the more visible it becomes (that is, the more it is expressed in actions).

[0798] Here, human emotions are based on various balances, such as posture and blood sugar levels. When these balances deviate from the ideal, it results in discomfort, and when they approach the ideal, it results in pleasure. Similarly, in robots, cars, motorcycles, etc., emotions can be created based on various balances, such as posture and battery level. When these balances deviate from the ideal, it results in discomfort, and when they approach the ideal, it results in pleasure. The emotion map can be generated, for example, based on Dr. Mitsuyoshi's emotion map (Research on a system for analyzing brain physiological signals of speech emotion recognition and emotion, Tokushima University, doctoral dissertation: https://ci.nii.ac.jp/naid/500000375379). The left half of the emotion map contains emotions belonging to a region called "response," where sensation is dominant. The right half of the emotion map contains emotions belonging to a region called "situation," where situational awareness is dominant.
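
The balance-based rule for a robot can be sketched in Python as follows. The function names, the use of a summed absolute deviation, and the example quantities (`battery`, `tilt`) are assumptions; the source states only that moving away from the ideal produces discomfort and moving toward it produces pleasure.

```python
from __future__ import annotations


def deviation(state: dict[str, float], ideal: dict[str, float]) -> float:
    """Total absolute distance of every tracked balance from its ideal value."""
    return sum(abs(state[key] - ideal[key]) for key in ideal)


def feeling(previous: dict[str, float],
            current: dict[str, float],
            ideal: dict[str, float]) -> str:
    """Pleasure when the balances approach the ideal, discomfort when they drift."""
    before = deviation(previous, ideal)
    after = deviation(current, ideal)
    if after < before:
        return "pleasure"
    if after > before:
        return "discomfort"
    return "neutral"
```

With an ideal of a full battery and zero tilt, a robot whose battery rises from 0.5 to 0.8 at constant tilt would register pleasure, and the reverse change would register discomfort.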

[0799] The emotion map defines two emotions that promote learning. One is the emotion around the middle of the negative "repentance" and "reflection" on the situation side. In other words, it is when the robot experiences negative emotions such as "I never want to feel this way again" or "I don't want to be scolded again." The other is the emotion around the positive "desire" on the reaction side. In other words, it is when the robot has positive feelings such as "I want more" or "I want to know more."

[0800] The emotion identification model 59 inputs user input into a pre-trained neural network, obtains emotion values representing each emotion shown in the emotion map 400, and determines the user's emotion. This neural network is pre-trained based on multiple training data sets, which are combinations of user input and emotion values representing each emotion shown in the emotion map 400. Furthermore, this neural network is trained so that emotions located close together have similar values, as shown in the emotion map 900 in Figure 10. Figure 10 shows an example where multiple emotions such as "reassured," "calm," and "confident" have similar emotion values.
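
One possible way to obtain training targets in which neighboring emotions share similar values is to smooth a label over the map with a distance-based kernel. The Python sketch below assumes a Gaussian kernel, 2D map coordinates, and the function name `soft_targets`; none of these is specified in the source, which does not describe how the similarity is enforced during training.

```python
from __future__ import annotations

import math


def soft_targets(coords: dict[str, tuple[float, float]],
                 label: str,
                 sigma: float = 1.0) -> dict[str, float]:
    """Spread a single labeled emotion over its neighbors on the map.

    Each emotion receives a weight that decays with its squared distance
    from the labeled emotion; weights are normalized to sum to 1.
    """
    cx, cy = coords[label]
    weights = {
        name: math.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2 * sigma ** 2))
        for name, (x, y) in coords.items()
    }
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}
```

With hypothetical coordinates that place "reassured" and "calm" close together and "anxious" far away, the target values for the first two come out nearly equal while the third is much smaller, matching the behavior illustrated in Figure 10.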

[0801] The above description primarily focuses on the functions of the data processing device 12 in relation to this disclosure. However, the system related to this disclosure is not necessarily implemented on a server. The system related to this disclosure may be implemented as a general information processing system. This disclosure may be implemented, for example, as a software program that runs on a personal computer or as an application that runs on a smartphone. The method related to this disclosure may be provided to users in SaaS (Software as a Service) format.

[0802] In the above embodiment, an example was given in which a specific process is performed by a single computer 22. However, the technology of this disclosure is not limited thereto, and distributed processing for the specific process may be performed by multiple computers, including computer 22. For example, a data generation model 58 may be provided in an external device of the data processing device 12, and the external device may generate data according to the input data.

[0803] In the above embodiment, an example was given in which the specific processing program 56 is stored in the storage 32, but the technology of this disclosure is not limited thereto. For example, the specific processing program 56 may be stored in a portable, computer-readable, non-transitory storage medium such as a USB (Universal Serial Bus) memory. The specific processing program 56 stored in the non-transitory storage medium is installed in the computer 22 of the data processing device 12. The processor 28 executes specific processing according to the specific processing program 56.

[0804] Alternatively, the specific processing program 56 may be stored in a storage device such as a server connected to the data processing device 12 via the network 54, and the specific processing program 56 may be downloaded and installed on the computer 22 in response to a request from the data processing device 12.

[0805] Furthermore, it is not necessary to store the entirety of the specific processing program 56 in a storage device such as a server connected to the data processing device 12 via the network 54, or to store the entirety of the specific processing program 56 in the storage 32; it is acceptable to store only a portion of the specific processing program 56.

[0806] The following types of processors can be used as hardware resources to perform specific processing. Examples of processors include a CPU, a general-purpose processor that functions as a hardware resource to perform specific processing by executing software, i.e., a program. Other examples of processors include dedicated electrical circuits, such as FPGAs (Field-Programmable Gate Arrays), PLDs (Programmable Logic Devices), or ASICs (Application Specific Integrated Circuits), which have circuit configurations specifically designed to perform specific processing. All of these processors have built-in or connected memory, and all of them perform specific processing by using memory.

[0807] The hardware resource that performs a specific process may consist of one of these various processors, or it may consist of a combination of two or more processors of the same or different types (for example, a combination of multiple FPGAs, or a combination of a CPU and an FPGA). Alternatively, the hardware resource that performs a specific process may consist of a single processor.

[0808] Examples of configurations using a single processor include, firstly, a configuration in which one or more CPUs and software are combined to form a single processor, and this processor functions as a hardware resource that performs a specific process. Secondly, there is a configuration using a processor that realizes the functions of the entire system, including multiple hardware resources that perform a specific process, on a single IC chip, as exemplified by SoCs (System-on-a-chip). In this way, a specific process is realized using one or more of the above types of processors as hardware resources.

[0809] Furthermore, the hardware structure of these various processors can more specifically utilize electrical circuits that combine circuit elements such as semiconductor devices. Also, the specific processing described above is merely an example. Therefore, it goes without saying that unnecessary steps can be deleted, new steps added, or the processing order rearranged, as long as it does not deviate from the main purpose.

[0810] The descriptions and illustrations presented above are detailed explanations of the technical aspects of this disclosure and are merely examples of the technical aspects. For example, the above descriptions of the structure, function, operation, and effect are examples of the structure, function, operation, and effect of the technical aspects of this disclosure. Therefore, it goes without saying that you may delete unnecessary parts, add new elements, or replace elements in the descriptions and illustrations presented above, as long as you do not deviate from the essence of the technical aspects of this disclosure. Furthermore, in order to avoid confusion and facilitate understanding of the technical aspects of this disclosure, explanations of common technical knowledge and the like that do not require special explanation to enable the implementation of the technical aspects of this disclosure have been omitted from the descriptions and illustrations presented above.

[0811] All documents, patent applications, and technical standards described herein are incorporated by reference to the same extent as if each individual document, patent application, and technical standard were specifically and individually noted to be incorporated by reference.

[0812] The following is further disclosed regarding the embodiments described above.

[0813] (Claim 1)

[0814] A device equipped with an image acquisition means, which includes a means for capturing video of a dirty area,

[0815] A means for analyzing the aforementioned video data and using a generative artificial intelligence model to identify the type and degree of dirt,

[0816] A means for determining a method for proposing appropriate cleaning methods and tools based on the identified dirt,

[0817] A means comprising a speech synthesizer that converts the aforementioned proposal into speech and provides a guide to the user,

[0818] A system that includes this.

[0819] (Claim 2)

[0820] The system according to claim 1, further comprising an interface for receiving voice input from a user and receiving confirmation that the cleaning status is complete.

[0821] (Claim 3)

[0822] The system according to claim 1, characterized by having a profile adjustment function that provides the proposed cleaning method and tools in a format that suits the user's preferences.

[0823] "Example 1"

[0824] (Claim 1)

[0825] A device equipped with an image capture device and having a means for acquiring images of the contaminated area,

[0826] A means for analyzing the aforementioned video information and using a generative artificial intelligence model to identify the type and degree of dirt,

[0827] A means for determining a method for proposing appropriate cleaning means and apparatus based on the identified dirt,

[0828] A means comprising a voice converter that converts the aforementioned proposal into speech and provides guidance to the user,

[0829] A means equipped with a voice input interface for receiving progress reports from users,

[0830] A system that includes this.

[0831] (Claim 2)

[0832] The system according to claim 1, characterized in that it updates the next instructions based on the user's progress information and efficiently supports cleaning activities.

[0833] (Claim 3)

[0834] The system according to claim 1, characterized in that it has a function to provide guidance content in a format that meets the user's needs, based on the proposed cleaning means and apparatus.

[0835] "Application Example 1"

[0836] (Claim 1)

[0837] A device equipped with an imaging means and having a means for recording images of a dirty area,

[0838] A means for processing the aforementioned video data and using a machine learning model to identify the type and degree of dirt,

[0839] A means for determining a method for suggesting the optimal cleaning means and tools based on the identified dirt,

[0840] A means comprising a voice processor that converts the aforementioned presentation into audio and provides guidance to the user,

[0841] A means equipped in cleaning equipment that has a patrol function that detects dirt by going around to different locations in the home,

[0842] A communication means that sends a notification to a smartphone based on the information detected by the aforementioned patrol function,

[0843] A system that includes this.

[0844] (Claim 2)

[0845] The system according to claim 1, further comprising a human-machine dialogue interface for receiving voice input and reporting cleaning progress and completion status.

[0846] (Claim 3)

[0847] The system according to claim 1, comprising a setting adjustment function that provides the presented content in a format that suits the user's preferences, based on the presented cleaning means and tools.

[0848] "Example 2 of combining an emotion engine"

[0849] (Claim 1)

[0850] A device equipped with a camera that acquires environmental information,

[0851] A means for receiving and analyzing the aforementioned environmental information and using an artificial intelligence algorithm to identify abnormal conditions in the environment,

[0852] A means for determining a method for suggesting appropriate countermeasures and equipment based on the identified abnormal condition,

[0853] A means for using an emotion processing device to analyze the user's voice and facial expression information and evaluate their psychological state,

[0854] A means equipped with an adjustment function for dynamically adjusting the content and method of presentation based on the aforementioned evaluation results,

[0855] A means for producing synthesized speech output to provide the content to the user via an information transmission medium,

[0856] A system that includes this.

[0857] (Claim 2)

[0858] The system according to claim 1, further comprising a dialogue function for receiving voice data from a user and collecting information on the progress of processing.

[0859] (Claim 3)

[0860] The system according to claim 1, characterized by having an individualized processing function for adjusting the format of the proposal according to the user's condition, based on the presented countermeasures and equipment used.

[0861] "Application example 2 of combining emotional engines"

[0862] (Claim 1)

[0863] An apparatus equipped with an image acquisition means, which includes means for capturing image data of a physical surface,

[0864] A means for analyzing the aforementioned image data and using an artificial intelligence model to identify a specific situation and its depth,

[0865] A means for determining a method for proposing the optimal work method and equipment based on the identified circumstances,

[0866] A means incorporating an emotion analysis engine that analyzes the user's voice and facial expressions to evaluate their mental state,

[0867] A means comprising a synthesizer that adjusts the guide based on the aforementioned evaluation and presents it in audio,

[0868] A system that includes this.

[0869] (Claim 2)

[0870] The system according to claim 1, further comprising a user interface for receiving voice input from the user and receiving information on the progress of the work.

[0871] (Claim 3)

[0872] The system according to claim 1, characterized in that it includes a profile adjustment function that adjusts the content of the proposed method and device based on the user's emotional data and provides it in an individualized format.

[Explanation of Symbols]

[0873] 10, 210, 310, 410 Data processing systems
12 Data processing device
14 Smart device
214 Smart glasses
314 Headset-type terminal
414 Robot

Claims

1. A device equipped with an imaging means and having a means for recording images of a dirty area,
A means for processing the aforementioned video data and using a machine learning model to identify the type and degree of dirt,
A means for determining a method for suggesting the optimal cleaning means and tools based on the identified dirt,
A means comprising a voice processor that converts the aforementioned presentation into audio and provides guidance to the user,
A means equipped in cleaning equipment that has a patrol function that detects dirt by going around to different locations in the home,
A communication means that sends a notification to a smartphone based on the information detected by the aforementioned patrol function,
A system that includes this.

2. The system according to claim 1, further comprising a human-machine dialogue interface for receiving voice input and reporting cleaning progress and completion status.

3. The system according to claim 1, comprising a setting adjustment function that provides the presented content in a format that suits the user's preferences, based on the presented cleaning means and tools.