system
The system uses a 360-degree camera and AI-driven processing to analyze the user's surroundings and provide voice or vibration warnings, addressing the need for real-time safety support for individuals with reduced attention.
Patent Information
- Authority / Receiving Office
- JP · JP
- Patent Type
- Applications
- Current Assignee / Owner
- SOFTBANK GROUP CORP
- Filing Date
- 2024-12-10
- Publication Date
- 2026-06-22
AI Technical Summary
Existing systems fail to provide real-time monitoring and appropriate warnings for individuals with reduced attention or reaction speed, such as the elderly and disabled, to prevent accidents and enhance safety in their surroundings.
A system comprising a 360-degree camera, a processing unit that analyzes video information using AI for situational awareness, and a warning device that provides notifications via voice or vibration, along with the ability to record and analyze user behavior history to predict potential dangers.
Enables users to recognize potential dangers in real-time, take appropriate actions, and receive personalized warnings, thereby improving safety and convenience in daily life.
Abstract
Description
Technical Field
[0001] The technology of the present disclosure relates to a system.
Background Art
[0002] Patent Document 1 discloses a method for controlling a persona chatbot, which is performed by at least one processor, the method including steps of receiving a user utterance, adding the user utterance to a prompt including an instruction sentence related to an explanation of a character of the chatbot, encoding the prompt, and inputting the encoded prompt into a language model to generate a chatbot utterance as a response to the user utterance.
Prior Art Documents
Patent Documents
[0003]
Patent Document 1
Summary of the Invention
Problems to be Solved by the Invention
[0004] In people's daily lives, it is required to prevent accidents and dangers caused by carelessness or blind spots in the field of vision, and to improve safety. In particular, for people with reduced attention or reaction speed, such as the elderly and disabled, means for efficiently grasping their own and surrounding situations are necessary. However, current technologies have not effectively provided a system that can grasp the surrounding situation in real time and issue appropriate warnings, and solving this is an issue of the present invention.
Means for Solving the Problems
[0005] The present invention provides a system comprising a camera for collecting information about the user's surroundings, a processing device for receiving and analyzing video information transmitted from the camera, and a warning device for providing warning information to the user based on the information analyzed by the processing device. This system allows the user to understand their surroundings in real time. Furthermore, by recording and analyzing the user's behavior history, the processing device can detect potential dangers in advance, and the warning device can notify the user by voice or vibration. This provides a means for users with reduced attention spans to ensure their own safety.
[0006] "Recording equipment" refers to a device for collecting surrounding video information, and includes a camera capable of covering the user's environment in 360 degrees.
[0007] A "processing unit" is a computer system that receives and analyzes video information transmitted from a camera, and is particularly a unit that uses AI to make situational judgments.
[0008] A "warning device" is a device that provides warnings to the user based on information analyzed by a processing unit, and includes an interface that provides notifications through methods such as voice and vibration.
[0009] "User" refers to a person who uses this system, and is particularly a person who needs to ensure their safety in their daily life.
[0010] "Visual information" refers to image data acquired by a recording device, and includes visual information of the user and their surroundings.
[0011] "Behavioral history" refers to data that records a user's past actions and movement patterns, and is used to predict future problems.
Brief Description of the Drawings
[0012]
[Figure 1] This is a conceptual diagram showing an example of the configuration of a data processing system according to the first embodiment.
[Figure 2] This is a conceptual diagram showing an example of the essential functions of a data processing device and a smart device according to the first embodiment.
[Figure 3] This is a conceptual diagram showing an example of the configuration of a data processing system according to the second embodiment.
[Figure 4] This is a conceptual diagram showing an example of the main functions of a data processing device and smart glasses according to the second embodiment.
[Figure 5] This is a conceptual diagram showing an example of the configuration of a data processing system according to the third embodiment.
[Figure 6] This is a conceptual diagram showing an example of the main functions of a data processing device and a headset-type terminal according to the third embodiment.
[Figure 7] This is a conceptual diagram showing an example of the configuration of a data processing system according to the fourth embodiment.
[Figure 8] This is a conceptual diagram showing an example of the main functions of a data processing device and a robot according to the fourth embodiment.
[Figure 9] This shows an emotion map where multiple emotions are mapped.
[Figure 10] This shows an emotion map where multiple emotions are mapped.
[Figure 11] This is a sequence diagram showing the processing flow of the data processing system in Example 1.
[Figure 12] This is a sequence diagram showing the processing flow of the data processing system in Application Example 1.
[Figure 13] This is a sequence diagram showing the processing flow of the data processing system in Example 2, which incorporates an emotion engine.
[Figure 14] This is a sequence diagram showing the processing flow of the data processing system in Application Example 2, which combines an emotion engine.
Modes for Carrying Out the Invention
[0013] Hereinafter, an example of an embodiment of the system relating to the technology of this disclosure will be described with reference to the attached drawings.
[0014] First, the terms used in the following description will be explained.
[0015] In the following embodiments, the numbered processor (hereinafter simply referred to as "processor") may be a single arithmetic unit or a combination of multiple arithmetic units. Also, the processor may be a single type of arithmetic unit or a combination of multiple types of arithmetic units. Examples of arithmetic units include a CPU (Central Processing Unit), a GPU (Graphics Processing Unit), a GPGPU (General-Purpose computing on Graphics Processing Units), an APU (Accelerated Processing Unit), and the like.
[0016] In the following embodiments, the numbered RAM (Random Access Memory) is a memory in which information is temporarily stored and is used as a work memory by the processor.
[0017] In the following embodiments, the numbered storage is one or more non-volatile storage devices that store various programs and various parameters, etc. Examples of non-volatile storage devices include flash memory (SSD (Solid State Drive)), magnetic disks (e.g., hard disks), or magnetic tapes, etc.
[0018] In the following embodiments, the numbered communication I/F (Interface) is an interface including a communication processor and an antenna, etc. The communication I/F controls communication between multiple computers. Examples of communication standards applied to the communication I/F include wireless communication standards including 5G (5th Generation Mobile Communication System), Wi-Fi (registered trademark), or Bluetooth (registered trademark), etc.
[0019] In the following embodiments, "A and/or B" is synonymous with "at least one of A and B." That is, "A and/or B" means that it may be A alone, or B alone, or a combination of A and B. Furthermore, in this specification, the same concept as "A and/or B" applies when expressing three or more things linked by "and/or."
[0020] [First Embodiment]
[0021] Figure 1 shows an example of the configuration of the data processing system 10 according to the first embodiment.
[0022] As shown in Figure 1, the data processing system 10 includes a data processing device 12 and a smart device 14. An example of the data processing device 12 is a server.
[0023] The data processing device 12 comprises a computer 22, a database 24, and a communication interface 26. The computer 22 is an example of a "computer" related to the technology of this disclosure. The computer 22 comprises a processor 28, RAM 30, and storage 32. The processor 28, RAM 30, and storage 32 are connected to a bus 34. The database 24 and the communication interface 26 are also connected to the bus 34. The communication interface 26 is connected to a network 54. An example of the network 54 is a WAN (Wide Area Network) and/or a LAN (Local Area Network).
[0024] The smart device 14 comprises a computer 36, a reception device 38, an output device 40, a camera 42, and a communication interface 44. The computer 36 comprises a processor 46, RAM 48, and storage 50. The processor 46, RAM 48, and storage 50 are connected to a bus 52. The reception device 38, output device 40, and camera 42 are also connected to the bus 52.
[0025] The reception device 38 is equipped with a touch panel 38A and a microphone 38B, etc., and receives user input. The touch panel 38A receives user input by detecting contact with an object (e.g., a pen or finger). The microphone 38B receives user input by detecting the user's voice. The control unit 46A transmits data indicating the user input received by the touch panel 38A and microphone 38B to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the data indicating the user input.
[0026] The output device 40 includes a display 40A and a speaker 40B, and presents data to the user 20 by outputting the data in a form perceptible to the user 20 (e.g., audio and/or text). The display 40A displays visible information such as text and images according to instructions from the processor 46. The speaker 40B outputs audio according to instructions from the processor 46. The camera 42 is a small digital camera equipped with an optical system such as a lens, aperture, and shutter, and an image sensor such as a CMOS (Complementary Metal-Oxide-Semiconductor) image sensor or a CCD (Charge Coupled Device) image sensor.
[0027] Communication interface 44 is connected to network 54. Communication interfaces 44 and 26 are responsible for the exchange of various types of information between processor 46 and processor 28 via network 54.
[0028] Figure 2 shows an example of the main functions of the data processing device 12 and the smart device 14.
[0029] As shown in Figure 2, in the data processing device 12, a specific processing is performed by the processor 28. A specific processing program 56 is stored in the storage 32. The specific processing program 56 is an example of a "program" related to the technology of this disclosure. The processor 28 reads the specific processing program 56 from the storage 32 and executes the read specific processing program 56 on the RAM 30. The specific processing is realized by the processor 28 operating as a specific processing unit 290 according to the specific processing program 56 executed on the RAM 30.
[0030] The storage 32 stores the data generation model 58 and the emotion identification model 59. The data generation model 58 and the emotion identification model 59 are used by the specific processing unit 290.
[0031] In the smart device 14, the processor 46 performs the reception output processing. The storage 50 stores the reception output program 60. The reception output program 60 is used by the data processing system 10 in conjunction with the specific processing program 56. The processor 46 reads the reception output program 60 from the storage 50 and executes the read reception output program 60 on the RAM 48. The reception output processing is realized by the processor 46 operating as a control unit 46A according to the reception output program 60 executed on the RAM 48.
[0032] Next, the specific processing performed by the specific processing unit 290 of the data processing device 12 will be described. In the following description, the data processing device 12 will be referred to as the "server" and the smart device 14 as the "terminal".
[0033] This invention is a system that combines a 360-degree camera, a processing unit that processes received video information, and a warning device that provides warnings according to the situation, in order to improve the safety of users in their daily lives. By using this system, users can always be aware of their surroundings and proactively avoid danger.
[0034] The server receives video information transmitted from the recording equipment and analyzes its contents using an AI algorithm. This AI incorporates object detection, motion analysis, and pattern recognition capabilities, enabling it to identify moving objects such as people and cars and compare them with past behavioral patterns. If the analysis identifies a danger, the server immediately transmits the information to the warning device.
[0035] The terminal notifies the user of warning information received from the server via voice or vibration. This notification helps the user quickly recognize their surroundings and take appropriate action. For example, when an approaching vehicle is detected, the terminal can draw the user's attention by issuing a loud warning.
[0036] Users check alerts from the system and respond to identified risks as quickly as possible. Furthermore, the system can accumulate user activity history and identify recurring mistakes from past patterns, enabling it to provide preventative warnings. For example, if a habit of forgetting to lock the door is detected, the device will issue a reminder before the user leaves the house. In this way, the system can improve convenience while making the user's life safer.
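The preventative warning described above can be illustrated with a minimal sketch. It assumes the activity history is stored as simple event records and that a mistake counts as "recurring" once it has been observed a fixed number of times. The names `BehaviorEvent`, `recurring_mistakes`, `preventive_reminders`, and the threshold `min_count` are hypothetical and do not appear in the disclosure.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class BehaviorEvent:
    """One entry in the (hypothetical) activity history."""
    action: str                    # e.g. "leave_home"
    mistake: Optional[str] = None  # e.g. "door_unlocked", or None if nothing went wrong


def recurring_mistakes(history: List[BehaviorEvent], action: str,
                       min_count: int = 3) -> List[str]:
    """Return mistakes seen at least `min_count` times for the given action."""
    counts = Counter(
        e.mistake for e in history
        if e.action == action and e.mistake is not None
    )
    return sorted(m for m, n in counts.items() if n >= min_count)


def preventive_reminders(history: List[BehaviorEvent], upcoming_action: str,
                         min_count: int = 3) -> List[str]:
    """Build reminder texts to issue before the user performs `upcoming_action`."""
    return [
        f"Reminder before '{upcoming_action}': check '{m}'"
        for m in recurring_mistakes(history, upcoming_action, min_count)
    ]


# Assumed sample history: the door was left unlocked three times when leaving home.
history = [BehaviorEvent("leave_home", "door_unlocked")] * 3 + [
    BehaviorEvent("leave_home"),
    BehaviorEvent("leave_home", "lights_on"),
]
reminders = preventive_reminders(history, "leave_home")
```

With this sample history, only the door-locking mistake crosses the assumed threshold, so a single reminder is produced before the user leaves the house.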
[0037] As a concrete example, consider a scenario where a user is walking through a busy shopping district and a stranger suddenly approaches them in the middle of a crowd. At this moment, the camera captures the person, and the server detects the abnormality of the movement. The warning device instantly sends a notification to the user, and the user ensures their safety by immediately choosing to leave the area. In this form, the present invention can be effectively implemented.
[0038] The following describes the processing flow.
[0039] Step 1:
[0040] The terminal collects 360-degree video information in real time from the camera worn by the user and transmits it to the server.
[0041] Step 2:
[0042] The server receives video information transmitted from the terminal and begins analysis using an AI algorithm. This analysis includes object recognition and motion analysis to detect whether there are any potential dangers in the surrounding environment.
[0043] Step 3:
[0044] Based on the analysis results, the server immediately generates an alert if an anomaly or danger is detected. The alert priority is then evaluated and classified according to its severity.
[0045] Step 4:
[0046] The server sends the generated warning to the terminal. The warning includes audio and vibration notifications and contains specific instructions to draw the user's attention.
[0047] Step 5:
[0048] The terminal notifies the user of warnings received from the server. It uses voice messages and vibrations to attract the user's attention in real time and encourage safe actions.
[0049] Step 6:
[0050] The user receives a warning from the terminal and takes appropriate action while checking their surroundings. For example, they might take a step back to create distance from an approaching vehicle.
[0051] Step 7:
[0052] The server records the results of warnings and user responses in a database and updates the behavioral history for later analysis. This data is used to predict future events and improve the accuracy of warnings.
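Steps 3, 4, and 7 above can be sketched as follows. This is an illustration under stated assumptions, not the disclosed implementation: severity is derived from an estimated time to contact, and the thresholds (2, 5, and 10 seconds), the `Detection` fields, and all function names are hypothetical.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Dict, List, Optional, Tuple


class Severity(IntEnum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3


@dataclass(frozen=True)
class Detection:
    """Assumed output of the server-side video analysis (Step 2)."""
    label: str
    distance_m: float
    closing_speed_mps: float  # positive when the object approaches the user


@dataclass(frozen=True)
class Alert:
    severity: Severity
    message: str
    channels: Tuple[str, ...]


def classify(d: Detection) -> Optional[Severity]:
    """Step 3: map time to contact onto a severity class (assumed thresholds)."""
    if d.closing_speed_mps <= 0:
        return None  # stationary or moving away: no alert
    time_to_contact = d.distance_m / d.closing_speed_mps
    if time_to_contact < 2.0:
        return Severity.HIGH
    if time_to_contact < 5.0:
        return Severity.MEDIUM
    if time_to_contact < 10.0:
        return Severity.LOW
    return None


def generate_alerts(detections: List[Detection]) -> List[Alert]:
    """Steps 3-4: build alerts, most severe first, ready to send to the terminal."""
    alerts = []
    for d in detections:
        severity = classify(d)
        if severity is None:
            continue
        channels = ("voice", "vibration") if severity is Severity.HIGH else ("vibration",)
        alerts.append(Alert(severity, f"{d.label} approaching", channels))
    return sorted(alerts, key=lambda a: a.severity, reverse=True)


def record_outcome(history: List[Dict[str, str]], alert: Alert, user_action: str) -> None:
    """Step 7: append the alert and the user's response to the behavioral history."""
    history.append({"alert": alert.message, "severity": alert.severity.name,
                    "user_action": user_action})


car = Detection("car", distance_m=6.0, closing_speed_mps=6.0)        # 1 s to contact
person = Detection("person", distance_m=8.0, closing_speed_mps=2.0)  # 4 s to contact
bicycle = Detection("bicycle", distance_m=3.0, closing_speed_mps=-1.0)
alerts = generate_alerts([person, bicycle, car])
history: List[Dict[str, str]] = []
record_outcome(history, alerts[0], "stepped back")
```

Sorting by severity lets the terminal present the most urgent warning first; a real deployment would replace the time-to-contact rule with the output of the AI analysis.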
[0053] (Example 1)
[0054] Next, we will describe Example 1. In the following description, the data processing device 12 will be referred to as the "server," and the smart device 14 will be referred to as the "terminal."
[0055] In daily life, it is crucial for users to quickly recognize and safely address potential dangers in their surroundings. However, conventional methods have struggled to monitor the surrounding environment continuously and have provided insufficient real-time warnings. In particular, efficiently detecting moving objects and analyzing past behavioral history has been difficult, making it impossible to proactively detect potential dangers.
[0056] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 1 is realized by the following means.
[0057] In this invention, the server includes an electronic device for collecting information about the user's surroundings, an information processing means for receiving video information transmitted from the electronic device and analyzing it using a generative AI model, a warning means for providing a warning signal to the user based on the results of the analysis by the information processing means, and a storage means for recording and analyzing past behavioral history. This enables the user to recognize potential dangers in their surroundings in real time and respond quickly.
[0058] An "electronic device" is a device that collects 360-degree information about the user's surroundings and generates video information.
[0059] "Information processing means" refers to a server-side component that analyzes received video information and uses an AI model to perform object detection, motion analysis, and pattern recognition.
[0060] A "warning means" is a device that provides a warning signal to the user via voice or vibration based on the results of analysis by the information processing means.
[0061] A "generative AI model" is an algorithm used to analyze video information and has the function of recognizing objects and detecting abnormalities in their movements.
[0062] A "storage means" is a component used to predict potential risks by recording and analyzing a user's past behavioral history.
[0063] This invention is a comprehensive monitoring system designed to improve the safety of users in their daily lives. The system consists of an electronic device that captures 360-degree images of the user's surroundings, a server for analyzing the information, and a terminal that alerts the user.
[0064] The server receives video information transmitted from the electronic device and analyzes its content using a generative AI model. Specifically, the information processing means on the server performs object detection, motion analysis, and pattern recognition to identify moving objects such as people and cars, as well as abnormal behavior. The generative AI model can analyze the situation in real time while comparing it with existing data. The server can also record past behavioral history using the storage means and analyze patterns of frequently occurring errors.
[0065] The terminal receives warning signals from the server and notifies the user via sound and vibration. This allows the user to immediately recognize danger and take appropriate action. For example, if a rapidly approaching vehicle is detected, the terminal emits a loud warning to alert the user to the danger. In addition, based on the recorded behavioral history, the terminal sends a reminder notification to users who frequently forget to lock their doors when going out.
[0066] A concrete example would be a scenario where a user is walking through a busy area and a stranger suddenly approaches them. In this case, the electronic device captures the person, and the server detects the abnormal behavior. An immediate warning is then sent to the user through the terminal, allowing the user to avoid danger by choosing to safely leave the area.
[0067] An example of a prompt is the following sentence, given as input to the generative AI model: "Explain how the system detects and warns of dangerous situations approaching the user while they are walking."
[0068] This configuration not only enhances user safety but also improves convenience.
[0069] The flow of the specific processing in Example 1 will be explained using Figure 11.
[0070] Step 1:
[0071] The server receives high-resolution video data transmitted from electronic devices as input. This video data captures the user's surroundings in 360 degrees in real time. Specifically, the electronic devices continuously capture video at a constant frame rate and stream it to the server.
[0072] Step 2:
[0073] The server analyzes the received video data using a generative AI model. This analysis applies an object detection algorithm to identify moving objects and unusual movements from the input data. Specifically, the AI model processes the frames extracted from the video to detect abnormal movements and specific patterns, and generates risk level information as output.
[0074] Step 3:
[0075] The server generates a warning signal based on the generated risk level information. If the information processing means determines that there is a risk, the information is passed to the warning means. Specifically, when an anomaly is detected, a signal is quickly generated and sent to the terminal.
[0076] Step 4:
[0077] The terminal receives the warning signal sent from the server and conveys it to the user. With the warning signal as input, an audio or vibration notification is generated as output. Specifically, the terminal alerts the user using the configured notification method to attract the user's attention.
[0078] Step 5:
[0079] The user receives a warning notification from the terminal, checks their surroundings based on the received warning information, and takes action to ensure their safety. Specifically, the user who notices the alert immediately checks their surroundings and, if necessary, takes action such as leaving the area.
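As a rough illustration of Steps 2 and 3, the sketch below turns a sequence of frames into per-frame risk flags. It deliberately substitutes plain frame differencing for the generative AI model described above, purely to show the data flow from frames to risk information. The frame representation (rows of grayscale values), the function names, and the threshold of 0.2 are assumptions.

```python
from typing import List, Sequence

Frame = Sequence[Sequence[int]]  # grayscale pixel rows, values 0-255 (assumed format)


def motion_score(prev: Frame, curr: Frame) -> float:
    """Mean absolute pixel difference between two frames, normalized to [0, 1]."""
    total = 0
    count = 0
    for row_prev, row_curr in zip(prev, curr):
        for p, c in zip(row_prev, row_curr):
            total += abs(p - c)
            count += 1
    return total / (255 * count) if count else 0.0


def risk_flags(frames: List[Frame], threshold: float = 0.2) -> List[bool]:
    """Flag each consecutive frame pair whose motion score reaches the threshold."""
    return [motion_score(a, b) >= threshold for a, b in zip(frames, frames[1:])]


still = [[0, 0], [0, 0]]
moved = [[255, 255], [0, 0]]  # half of the pixels changed completely
flags = risk_flags([still, still, moved])
```

Each `True` flag corresponds to a point at which the server would generate a warning signal and send it to the terminal; a production system would replace `motion_score` with the model's own risk output.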
[0080] (Application Example 1)
[0081] Next, we will explain Application Example 1. In the following explanation, the data processing device 12 will be referred to as the "server," and the smart device 14 will be referred to as the "terminal."
[0082] In modern homes, there is a need to quickly detect intruders or abnormal activity from outside and ensure the safety of the family. However, conventional security systems consist only of fixed cameras, which can only monitor a limited field of vision, and often rely on reactive measures after an incident has occurred. Therefore, there is a need for a system that can move around inside and outside the house, grasp the situation in real time, and issue a rapid warning.
[0083] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 1 is realized by the following means.
[0084] In this invention, the server includes an image acquisition means for monitoring and recording the surrounding environment, an information processing means for receiving and analyzing video data transmitted from the image acquisition means, a warning output means for providing warning information to the user based on the data analyzed by the information processing means, and a means for detecting abnormalities inside and outside the home in real time and notifying the user using voice and data communication means. This makes it possible to monitor the safety inside and outside the home in real time, immediately issue a warning to the user if suspicious activity is detected, and enable a rapid response.
[0085] An "image acquisition means" is a device that has the function of visually recording the surrounding situation and generating video data.
[0086] "Information processing means" refers to a device that analyzes video data received from image acquisition means and performs computational processing to detect anomalies or changes.
[0087] A "warning output means" is a device that transmits abnormalities detected by information processing means to the user and issues a warning via voice or data communication.
[0088] A "data communication means" is a communication device used to send real-time notifications to users in remote locations via a network.
[0089] "Real-time" refers to a time frame that enables immediate responses and actions by transmitting and receiving information with virtually no delay.
[0090] The system that realizes this invention is configured by combining image acquisition means, information processing means, warning output means, and data communication means to ensure safety both inside and outside the home.
[0091] The server uses a 360-degree camera to acquire images and monitor the situation inside and outside the home in real time. The captured video data is analyzed by AI algorithms running on high-performance computing hardware. Specifically, computations such as object detection and recognition of motion anomalies are performed on an NVIDIA Jetson or a similar GPU-equipped device.
[0092] Once information processing is complete, the system uses a voice assistant function as the warning output means to immediately notify the user. For example, it may issue voice warnings via Amazon Alexa or Google Assistant, or deliver warning information via push notifications to smartphones.
[0093] Users can receive information in real time through data communication and take safe actions as needed. This network-based communication enables immediate information transmission even to users outside the home.
[0094] A concrete example would be a situation where a suspicious person enters the garden while a family is resting in the living room. In this case, the system captures the person's movements using image acquisition means, detects the anomaly using information processing means, and then sends an alert via voice and to a smartphone through a warning output means. This allows the family to take immediate and safe action.
[0095] As an example of a prompt message to pass to a generative AI model, you could use text such as, "Analyze camera footage, detect if there are any moving objects in the garden, and use the AI to send a warning to the residents if it determines that there is a risk."
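One way such a prompt might be assembled before it is passed to a generative AI model is sketched below. The instruction text is taken from the example above; the idea of appending a list of observations, and the names `BASE_INSTRUCTION` and `build_prompt`, are assumptions made for illustration. No model is called.

```python
from typing import Sequence

# Instruction text quoted from the description; everything else is illustrative.
BASE_INSTRUCTION = (
    "Analyze camera footage, detect if there are any moving objects in the garden, "
    "and use the AI to send a warning to the residents if it determines that there is a risk."
)


def build_prompt(observations: Sequence[str], instruction: str = BASE_INSTRUCTION) -> str:
    """Combine the instruction with a bulleted list of (assumed) camera observations."""
    lines = [instruction, "", "Observations:"]
    if observations:
        lines.extend(f"- {item}" for item in observations)
    else:
        lines.append("- (none)")
    return "\n".join(lines)


prompt = build_prompt(["person near fence at 22:14"])
```

Keeping the instruction separate from the observations makes it straightforward to reuse the same instruction across frames while only the observed details change.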
[0096] The flow of a specific process in Application Example 1 will be explained using Figure 12.
[0097] Step 1:
[0098] The server uses a 360-degree camera to capture video both inside and outside the home. It receives ambient image data as input and processes it as streaming data. The camera's operation generates a continuous video flow.
[0099] Step 2:
[0100] The server passes the acquired video to the information processing system. It receives a video stream as input and uses an AI algorithm to perform object recognition. Here, NVIDIA Jetson is used for data processing to detect moving objects from the video and identify abnormal behavior. If an anomaly is detected, that information is output as identification data.
[0101] Step 3:
[0102] The server assesses the risk level based on the identification data and activates the warning output mechanism. It receives identification data as input and generates data to notify the user using voice assistants and data communication functions. The generated notification data is output and sent to the user's device.
[0103] Step 4:
[0104] The terminal receives the notification data sent from the server and conveys that information to the user. It receives notification data as input and outputs a warning to the user via sound or vibration. It attracts the user's attention by outputting a message such as "Dangerous activity detected. Please check your surroundings."
[0105] Step 5:
[0106] The user reviews the warning and decides how to act safely. The user receives warning information from the terminal as input and, if necessary, checks the site or takes other safety measures. By repeating these steps, it becomes possible to ensure the user's safety in real time.
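Step 3 of this flow, in which the server turns identification data into notification data, can be sketched as below. The assumption here is that a voice assistant is only useful when the user is at home, while a push notification reaches the user anywhere. The `IdentificationData` fields, the channel names, and the risk threshold of 0.5 are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class IdentificationData:
    """Assumed shape of the Step 2 output."""
    description: str
    risk: float  # 0.0 (harmless) to 1.0 (certain danger)


def route_notification(data: IdentificationData, user_at_home: bool,
                       risk_threshold: float = 0.5) -> List[str]:
    """Choose output channels for a detected event; an empty list means no warning."""
    if data.risk < risk_threshold:
        return []
    channels = ["push_notification"]  # reaches the user's smartphone in any location
    if user_at_home:
        channels.insert(0, "voice_assistant")  # spoken warning inside the home
    return channels


intruder = IdentificationData("person entering the garden", risk=0.9)
cat = IdentificationData("small animal on the lawn", risk=0.1)
at_home = route_notification(intruder, user_at_home=True)
away = route_notification(intruder, user_at_home=False)
```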
[0107] Furthermore, an emotion engine that estimates the user's emotions may be incorporated. That is, the specific processing unit 290 may use the emotion identification model 59 to estimate the user's emotions and perform the specific processing using the estimated emotions.
[0108] This invention is a system that combines a camera, a processing unit, a warning device, and an emotion engine to understand the user's surroundings in real time and improve safety in conjunction with their emotional state. This system allows the user to comprehensively manage their own safety and receive appropriate warnings for their situation.
[0109] The server receives video information from the recording equipment and analyzes its content using AI. The analysis process includes object recognition, motion analysis, and emotion recognition by an emotion engine. The emotion engine analyzes data such as the user's facial expressions and voice to evaluate the user's emotional state. For example, if the user shows signs of stress, the server can use that information to adjust the priority and content of warnings.
[0110] The terminal receives analysis results from the server and provides the user with appropriate warnings. The warning device uses voice and vibration to notify the user in a way that suits their emotional state. This may include instructions that encourage calmness so that the user can respond to the situation calmly.
[0111] Users receive notifications from the terminal and can respond quickly in a way that suits their psychological state and surrounding environment. For example, if a user is feeling stressed on crowded public transport, the emotion engine detects this state. The terminal then provides a gentle voice alert to encourage relaxation, making it easier for the user to reduce stress. In this form, the present invention can not only improve safety but also support the user's mental health.
[0112] Furthermore, the server records the user's behavioral history and emotion data, and performs periodic analysis. This data is used to predict future risks and optimize warnings, enabling the delivery of increasingly personalized services to users.
[0113] The following describes the processing flow.
[0114] Step 1:
[0115] The terminal collects 360-degree video information in real time from the user's camera and transmits it to the server. This video information also includes the user's facial expressions and actions.
[0116] Step 2:
[0117] The server receives video information transmitted from the terminal. The received information is analyzed using an AI algorithm. During the analysis process, the server performs object recognition and understands the surrounding environment. It also uses an emotion engine to evaluate the user's emotional state from their facial expressions and voice.
[0118] Step 3:
[0119] The server integrates the results of risk assessment using object recognition and emotion analysis using an emotion engine. Based on these results, it generates appropriate warnings according to the urgency and emotional state. In particular, if the emotional state indicates stress or anxiety, it adjusts the warning content and notification method.
[0120] Step 4:
[0121] The server sends a warning it has generated to the terminal. The terminal then prepares to notify the user of this warning.
[0122] Step 5:
[0123] The terminal notifies the user of warnings using sound or vibration. Depending on the user's emotional state, the tone and content of the notification may be customized and may include messages to promote relaxation.
[0124] Step 6:
[0125] The user receives a warning from the terminal, promptly checks their surroundings, and then responds safely in a manner appropriate to their emotional state.
[0126] Step 7:
[0127] The server records user behavior history and sentiment data in a database. This data will be used to improve the accuracy of future warnings and to provide optimized notifications for individual users.
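Step 3 of the flow above, in which the server integrates the risk assessment with the emotion analysis, can be sketched as follows. This is a minimal illustration and not the disclosed implementation: the `Alert` type, the `build_alert` function, the 0-to-1 score ranges, and both thresholds are assumptions introduced here.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Hypothetical thresholds; the disclosure does not give numeric values.
HIGH_RISK = 0.7
HIGH_STRESS = 0.6


@dataclass(frozen=True)
class Alert:
    message: str
    channels: Tuple[str, ...]  # any of "voice", "vibration"
    tone: str                  # "urgent", "calm", or "gentle"


def build_alert(risk: float, stress: float) -> Optional[Alert]:
    """Combine a risk score and a stress score, both assumed to lie in [0, 1]."""
    if risk >= HIGH_RISK:
        if stress >= HIGH_STRESS:
            # Stressed user: vibrate first and use calmer wording.
            return Alert("Please stop and check your surroundings.",
                         ("vibration", "voice"), "calm")
        return Alert("Danger nearby. Check your surroundings now.",
                     ("voice",), "urgent")
    if stress >= HIGH_STRESS:
        # No immediate danger, but a relaxation prompt may help.
        return Alert("Take a slow, deep breath.", ("voice",), "gentle")
    return None
```

Returning `None` when neither score crosses its threshold keeps the terminal silent in ordinary situations, which is one possible reading of "appropriate warnings according to the urgency and emotional state."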
[0128] (Example 2)
[0129] Next, we will describe Example 2. In the following description, the data processing device 12 will be referred to as the "server," and the smart device 14 will be referred to as the "terminal."
[0130] In today's social environment, stressful situations occur frequently, potentially deteriorating users' safety and mental state. However, existing safety management systems fail to provide warnings and support that take into account users' emotional states, making them inadequate. This invention aims to solve this problem by constructing a system that grasps users' emotional states in real time and provides personalized warnings and support.
[0131] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 2 is realized by the following means.
[0132] In this invention, the server includes means for receiving video information transmitted from a camera for collecting information about the user's surroundings and recognizing objects and actions contained in the information; means for performing analysis necessary to evaluate the user's emotional state; and means for notifying the user of optimized warning information by voice or vibration based on their emotional state and behavioral history. This enables the user to receive personalized advice and warnings in real time that are tailored to their emotional state, allowing them to live a safer and healthier life.
[0133] A "recording device" is a device used to collect information about the user's surroundings and is responsible for acquiring video information.
[0134] A "processing device" is a device that analyzes video information sent from a camera and has the computational functions to recognize objects and actions.
[0135] An "emotion engine" is a system that analyzes a user's facial expressions and voice to evaluate their emotional state.
[0136] A "warning device" is a device that provides warning information to the user based on information analyzed by the processing device.
[0137] "Voice or vibration means" refers to means used by a warning device to notify the user of warning information, and includes voice guidance and vibrational feedback.
[0138] "Behavioral history" refers to records of a user's past actions, and this data is used through analysis to predict future risks and optimize warnings.
[0139] "Emotional data" refers to quantified information about a user's emotional state, which the system uses to provide users with appropriate warnings and advice in real time.
[0140] This invention is a system for understanding the user's surroundings and providing warnings and support tailored to the user's emotional state. It is implemented through a combination of a recording device, a processing device, a warning device, and an emotion engine.
[0141] First, the server receives video information in real time from the recording device. Examples of the recording device include smartphones and other image acquisition devices. The server then analyzes this video information using AI technology. Specifically, it uses a model such as YOLO for object recognition and a model such as OpenPose for motion analysis.
[0142] Next, the server uses an emotion engine to analyze the user's facial expressions and voice data to evaluate their emotional state. The emotion engine uses voice analysis software, such as Google Cloud Speech-to-Text, to determine the user's stress level and other factors.
[0143] The analyzed information is sent from the server to the terminal. The terminal then uses this information to provide the user with the most appropriate warning. The warning device can utilize voice alerts or vibration feedback. For example, it can play a voice guide to promote relaxation.
[0144] Users can receive notifications from their devices and respond quickly to their psychological state and surrounding environment. For example, in crowded public transport, the emotion engine can detect the user's stress level, and the device can send a gentle voice alert to help the user relax. This system functions as an interactive platform to support the user's mental health.
[0145] An example of a prompt might be: "If you encounter a stressed-out user in a crowded train station, how would the emotion engine respond? What advice could it offer to help them relax?"
[0146] In this way, the present invention provides a system that can comprehensively manage both user safety and mental health.
[0147] The flow of the specific processing in Example 2 will be explained using Figure 13.
[0148] Step 1:
[0149] The server receives video information in real time from the camera. The input is video data capturing the user's surroundings. The server begins analyzing this data using AI technology. A model like YOLO is used for object recognition to identify objects in the video. The output is information about the objects recognized in the video. Specifically, the process involves sending video captured by the camera to the server, breaking down the video frame by frame, and inputting it into the object recognition model.
[0150] Step 2:
[0151] The server performs motion analysis based on the received video information. The input is video data containing object information recognized in step 1. The server uses a motion analysis model such as OpenPose to analyze the user's behavior patterns. The output is information about the current user's behavior patterns. Specific actions include extracting skeletal information of the recognized person and performing calculations to determine the type and state of the action.
[0152] Step 3:
[0153] The server uses an emotion engine to evaluate the user's emotional state. The inputs are the behavioral information obtained in step 2 and the user's voice and facial expression data. The emotion engine analyzes the voice data using software such as Amazon Polly or Google Cloud Speech-to-Text to quantify the user's stress and emotional level. The output is the quantified evaluation result of the emotional state. The specific operation involves sensing changes in voice tone and facial expression and performing an emotional evaluation accordingly.
[0154] Step 4:
[0155] The device receives analysis results from the server and generates a warning. The input is the user's emotional state and behavioral information sent from the server. Based on this information, the device constructs the most appropriate warning for the user. The output is the content of the audio alert and vibration feedback provided to the user. The specific operation involves generating a voice message that promotes relaxation according to the user's state and playing it through the device.
[0156] Step 5:
[0157] The user responds based on the warning information received from the device. The input is a notification from the device via sound or vibration. The user adjusts their mental state based on the notification content. The output is a state in which the user is relaxed and has adapted to the stressful environment. A specific action is, for example, following advice such as "take a deep breath."
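The per-frame flow of steps 1 to 4 above can be sketched as a small pipeline. This is an illustrative outline only: the stage functions are passed in as plain callables, and the `toy_*` functions are hypothetical stand-ins for an object recognition model, a motion analysis model, and the emotion engine. For brevity the stress score here is derived from behavior alone, whereas the text also uses voice and facial expression data.

```python
from typing import Any, Callable, Iterable, List, Optional

Frame = Any  # placeholder for one decoded video frame


def run_pipeline(
    frames: Iterable[Frame],
    detect_objects: Callable[[Frame], List[str]],
    analyze_motion: Callable[[Frame, List[str]], str],
    score_stress: Callable[[str], float],
    make_warning: Callable[[str, float], Optional[str]],
) -> List[str]:
    """Run steps 1 to 4 on each frame and collect the warnings produced."""
    warnings: List[str] = []
    for frame in frames:
        objects = detect_objects(frame)            # step 1: object recognition
        behavior = analyze_motion(frame, objects)  # step 2: motion analysis
        stress = score_stress(behavior)            # step 3: emotion evaluation
        warning = make_warning(behavior, stress)   # step 4: warning generation
        if warning is not None:
            warnings.append(warning)
    return warnings


# Toy stand-ins (assumptions) so the sketch runs without any model installed.
def toy_detect(frame: Frame) -> List[str]:
    return ["person"] * frame["people"]


def toy_motion(frame: Frame, objects: List[str]) -> str:
    return "standing_in_crowd" if len(objects) >= 3 else "walking"


def toy_stress(behavior: str) -> float:
    return 0.8 if behavior == "standing_in_crowd" else 0.2


def toy_warning(behavior: str, stress: float) -> Optional[str]:
    return "Take a deep breath." if stress >= 0.6 else None
```

Passing the stages as callables keeps the loop independent of any particular model, so a real detector could be substituted without changing `run_pipeline`.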
[0158] (Application Example 2)
[0159] Next, we will explain application example 2. In the following explanation, the data processing device 12 will be referred to as the "server," and the smart device 14 will be referred to as the "terminal."
[0160] In modern living environments, people often experience a great deal of stress and anxiety. Furthermore, while safety must be ensured even at home, current technology struggles to provide appropriate support that takes into account the user's emotional state. A system is needed to improve this situation and enhance the psychological and physical safety of users.
[0161] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 2 is realized by the following means.
[0162] In this invention, the server includes a camera component for collecting information about the user's surroundings, processing means for receiving and analyzing video information transmitted from the camera component, notification means for providing warning information to the user based on the information analyzed by the processing means, evaluation means for identifying the user's emotional state and supporting their psychological health, and control means for operating the in-home device according to the emotional state detected by the evaluation means. This enables personalized safety improvements and mental support based on the user's emotional state.
[0163] A "camera component" is a device used to monitor the environment around the user and acquire video information from it.
[0164] A "processing means" is a device that analyzes video information obtained from the camera component to perform object recognition and motion analysis.
[0165] A "notification means" is a device that provides warnings or guidance to the user based on the analysis results of the processing means.
[0166] An "evaluation means" is a system for identifying a user's emotional state from their facial expressions and voice.
[0167] A "control means" is a mechanism that operates devices within the home in accordance with the emotional state obtained by the evaluation means.
[0168] The server acquires video information using a camera component to capture the user's surroundings. This information is analyzed by the processing means and used to evaluate the user's current state. In particular, in addition to object recognition and motion analysis, the evaluation means is used to identify emotions from the user's facial expressions and voice. This process employs advanced AI algorithms.
[0169] The terminal sends notifications to the user based on analysis results received from the server. The notification means provides appropriate warning information via voice and vibration according to the user's emotional state, and the control means further supports the user's psychological well-being by operating home devices. For example, if the system determines that the user is experiencing high levels of stress, it automatically creates a relaxing environment.
[0170] For example, if the system detects that a user is feeling anxious upon returning home from work, it could provide support by changing the lighting to a warmer color and playing calming music. Furthermore, when using the generative AI model, a prompt such as, "Please suggest appropriate relaxation methods for a user who is feeling stressed upon returning home," could be used.
[0171] The flow of a specific process in Application Example 2 will be explained using Figure 14.
[0172] Step 1:
[0173] The server uses the camera component to acquire video information about the user's surroundings. The input for this step is a real-time video feed from the camera, and the output is saving it as digital data. Specifically, it connects to the camera device and captures image data at regular intervals.
[0174] Step 2:
[0175] The server analyzes the acquired video information using the processing means. This analysis performs object recognition and motion analysis. The input is the digital video data acquired in the previous step, and the output is the data obtained through the analysis, i.e., information about the surrounding environment. Deep learning algorithms are used for processing, and an AI model identifies objects and actions.
[0176] Step 3:
[0177] The server identifies the user's emotional state using the evaluation means. The input for this step is data related to the user's facial expressions and voice, and the output is specific labels or scores indicating the user's emotional state. Specifically, machine learning is used to analyze changes in facial expressions and vocal intonation.
[0178] Step 4:
[0179] The device sends notifications based on the emotional state transmitted from the server. The input for this step is the analysis results regarding the emotional state, and the output is warning or guidance information provided to the user. Specifically, this involves reading the message aloud to the user or using the smartphone's vibration motor to send notifications.
[0180] Step 5:
[0181] The terminal operates home devices in response to the user's emotional state through control means. The input in this step is a control command based on the user's emotional state, and the output is the modified state of the home device. This includes specific actions such as changing the color of the lights or playing music through a smart speaker.
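Step 5 above, in which the control means turns an emotional state into operations on home devices, might look like the following. This is a sketch under stated assumptions: the label names, the score threshold, and the command dictionary layout are all hypothetical, and no real smart-home API is used.

```python
from typing import Dict, List

# Hypothetical labels and threshold; the disclosure does not define them.
STRESS_LABELS = {"stress", "anxiety"}
ACTION_THRESHOLD = 0.6


def commands_for_emotion(label: str, score: float) -> List[Dict[str, str]]:
    """Map an emotion label and score (assumed 0 to 1) to home device commands."""
    if label in STRESS_LABELS and score >= ACTION_THRESHOLD:
        return [
            # Warmer lighting and calming music, as in the example in the text.
            {"device": "light", "action": "set_color", "value": "warm"},
            {"device": "speaker", "action": "play", "value": "calming_music"},
        ]
    # No device change for other states or weak signals.
    return []
```

For instance, `commands_for_emotion("anxiety", 0.8)` yields the lighting and music commands, while a low score or an unrelated label yields an empty list.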
[0182] The specific processing unit 290 transmits the result of the specific processing to the smart device 14. In the smart device 14, the control unit 46A causes the output device 40 to output the result of the specific processing. The microphone 38B acquires audio indicating user input for the result of the specific processing. The control unit 46A transmits the audio data indicating user input acquired by the microphone 38B to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the audio data.
[0183] Data generation model 58 is a so-called generative AI (Artificial Intelligence). Examples of data generation model 58 include generative AI such as ChatGPT (registered trademark) (Internet search <URL: https://openai.com/blog/chatgpt>) and Gemini (registered trademark) (Internet search <URL: https://gemini.google.com/?hl=ja>). The data generation model 58 is obtained by performing deep learning on a neural network. The data generation model 58 receives as input a prompt containing instructions, together with inference data such as audio data representing speech, text data representing text, and image data representing images. The data generation model 58 performs inference on the input inference data according to the instructions indicated by the prompt, and outputs the inference results in data formats such as audio data and text data. Here, inference refers to, for example, analysis, classification, prediction, and/or summarization.
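The pairing of a prompt with inference data described above can be pictured as a simple request container. This is only an illustrative data structure: `InferenceRequest`, its fields, and the `add` method are names assumed here, and the sketch does not correspond to the interface of any actual generative AI service.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# The three kinds of inference data named in the text.
ALLOWED_KINDS = ("audio", "text", "image")


@dataclass
class InferenceRequest:
    prompt: str  # instructions for the model
    inputs: List[Tuple[str, bytes]] = field(default_factory=list)

    def add(self, kind: str, data: bytes) -> None:
        """Attach one piece of inference data, rejecting unknown kinds."""
        if kind not in ALLOWED_KINDS:
            raise ValueError(f"unsupported inference data kind: {kind}")
        self.inputs.append((kind, data))
```

A request would be built by creating `InferenceRequest("...")` with the instruction text and then calling `add("image", frame_bytes)` for each item to be analyzed.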
[0184] In the above embodiment, an example was given in which specific processing is performed by the data processing device 12, but the technology of this disclosure is not limited thereto, and the specific processing may also be performed by the smart device 14.
[0185] [Second Embodiment]
[0186] Figure 3 shows an example of the configuration of the data processing system 210 according to the second embodiment.
[0187] As shown in Figure 3, the data processing system 210 includes a data processing device 12 and smart glasses 214. An example of the data processing device 12 is a server.
[0188] The data processing device 12 comprises a computer 22, a database 24, and a communication interface 26. The computer 22 is an example of a "computer" related to the technology of this disclosure. The computer 22 comprises a processor 28, RAM 30, and storage 32. The processor 28, RAM 30, and storage 32 are connected to a bus 34. The database 24 and the communication interface 26 are also connected to the bus 34. The communication interface 26 is connected to a network 54. An example of the network 54 is a WAN (Wide Area Network) and / or a LAN (Local Area Network).
[0189] The smart glasses 214 include a computer 36, a microphone 238, a speaker 240, a camera 42, and a communication interface 44. The computer 36 includes a processor 46, RAM 48, and storage 50. The processor 46, RAM 48, and storage 50 are connected to a bus 52. The microphone 238, speaker 240, and camera 42 are also connected to the bus 52.
[0190] The microphone 238 receives voice signals from the user 20 and receives instructions from the user 20. The microphone 238 captures the voice signals from the user 20, converts the captured voice into audio data, and outputs it to the processor 46. The speaker 240 outputs audio according to the instructions from the processor 46.
[0191] Camera 42 is a small digital camera equipped with an optical system including a lens, aperture, and shutter, and an image sensor such as a CMOS (Complementary Metal-Oxide-Semiconductor) image sensor or a CCD (Charge Coupled Device) image sensor, and captures images of the area around the user 20 (for example, an imaging range defined by a field of view equivalent to the width of a typical healthy person's field of vision).
[0192] Communication interface 44 is connected to network 54. Communication interfaces 44 and 26 are responsible for the exchange of various information between processor 46 and processor 28 via network 54. The exchange of various information between processor 46 and processor 28 using communication interfaces 44 and 26 is performed in a secure manner.
[0193] Figure 4 shows an example of the main functions of the data processing device 12 and the smart glasses 214. As shown in Figure 4, the data processing device 12 performs specific processing using the processor 28. The storage 32 stores the specific processing program 56.
[0194] The specific processing program 56 is an example of a "program" relating to the technology of this disclosure. The processor 28 reads the specific processing program 56 from the storage 32 and executes the read specific processing program 56 on the RAM 30. The specific processing is realized by the processor 28 operating as a specific processing unit 290 in accordance with the specific processing program 56 executed on the RAM 30.
[0195] The storage 32 stores the data generation model 58 and the emotion identification model 59. The data generation model 58 and the emotion identification model 59 are used by the specific processing unit 290.
[0196] In the smart glasses 214, the processor 46 performs the reception output processing. The storage 50 stores the reception output program 60. The processor 46 reads the reception output program 60 from the storage 50 and executes the read reception output program 60 on the RAM 48. The reception output processing is realized by the processor 46 operating as a control unit 46A according to the reception output program 60 executed on the RAM 48.
[0197] Next, the specific processing performed by the specific processing unit 290 of the data processing device 12 will be described. In the following description, the data processing device 12 will be referred to as the "server" and the smart glasses 214 will be referred to as the "terminal".
[0198] This invention is a system that combines a 360-degree camera, a processing unit that processes received video information, and a warning device that provides warnings according to the situation, in order to improve the safety of users in their daily lives. By using this system, users can always be aware of their surroundings and proactively avoid danger.
[0199] The server receives video information transmitted from the recording equipment and analyzes its contents using an AI algorithm. This AI incorporates object detection, motion analysis, and pattern recognition capabilities, enabling it to identify moving objects such as people and cars and compare them with past behavioral patterns. If the analysis identifies a danger, the server immediately transmits the information to the warning device.
[0200] The device notifies the user of alert information received from the server via voice or vibration. This notification helps the user quickly recognize their surroundings and take appropriate action. For example, when the device detects an approaching vehicle, it can safely attract the user's attention by issuing a loud warning alert.
[0201] Users check alerts from the system and respond to identified risks as quickly as possible. Furthermore, the system can accumulate user activity history and identify recurring mistakes from past patterns, enabling it to provide preventative warnings. For example, if a habit of forgetting to lock the door is detected, the device will issue a reminder before the user leaves the house. In this way, the system can improve convenience while making the user's life safer.
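The preventative warning based on recurring mistakes can be illustrated with a simple rule over the recorded history. This is a sketch under assumptions: the event dictionary layout, the `leave_home` event name, and the two parameters are hypothetical, and the disclosure does not say how recurring patterns are actually identified.

```python
from typing import Dict, List


def needs_lock_reminder(
    history: List[Dict[str, object]],
    min_departures: int = 3,
    miss_ratio: float = 0.5,
) -> bool:
    """Return True if the user left the door unlocked often enough to warrant a reminder."""
    departures = [e for e in history if e.get("event") == "leave_home"]
    if len(departures) < min_departures:
        # Too little data to call it a habit.
        return False
    misses = sum(1 for e in departures if not e.get("door_locked", False))
    return misses / len(departures) >= miss_ratio
```

Requiring a minimum number of departures avoids issuing reminders on the basis of a single lapse.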
[0202] As a concrete example, consider a scenario where a user is walking through a busy shopping district and a stranger suddenly approaches them in the middle of a crowd. At this moment, the camera captures the person, and the server detects the abnormality of the movement. The warning device instantly sends a notification to the user, and the user ensures their safety by immediately choosing to leave the area. In this form, the present invention can be effectively implemented.
[0203] The following describes the processing flow.
[0204] Step 1:
[0205] The terminal collects 360-degree video information in real time from the camera worn by the user and transmits it to the server.
[0206] Step 2:
[0207] The server receives video information transmitted from the terminal and begins analysis using an AI algorithm. This analysis includes object recognition and motion analysis to detect whether there are any potential dangers in the surrounding environment.
[0208] Step 3:
[0209] Based on the analysis results, the server immediately generates an alert if an anomaly or danger is detected. The alert priority is then evaluated and classified according to its severity.
[0210] Step 4:
[0211] The server sends the generated warning to the terminal. The warning includes audio and vibration notifications and contains specific instructions to draw the user's attention.
[0212] Step 5:
[0213] The device notifies the user of warnings received from the server. It uses voice messages and vibrations to attract the user's attention in real time and encourage safe actions.
[0214] Step 6:
[0215] The user receives a warning from the device and takes appropriate action while checking their surroundings. For example, they might take a step back to create distance from an approaching vehicle.
[0216] Step 7:
[0217] The server records the results of warnings and user responses in a database and updates the behavioral history for later analysis. This data is used to predict future events and improve the accuracy of warnings.
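Steps 3 and 7 above (classifying an alert by severity and recording the outcome in a database) can be sketched with Python's standard `sqlite3` module. The severity thresholds, the table layout, and the function names are assumptions made for illustration; the disclosure does not specify a schema or a database product.

```python
import sqlite3
from enum import IntEnum


class Priority(IntEnum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3


def classify(severity: float) -> Priority:
    """Map a severity score (assumed 0 to 1) to a priority using hypothetical cut-offs."""
    if severity >= 0.8:
        return Priority.HIGH
    if severity >= 0.4:
        return Priority.MEDIUM
    return Priority.LOW


def open_history(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the behavioral history store."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS history "
        "(event TEXT, priority INTEGER, user_response TEXT)"
    )
    return conn


def record(conn: sqlite3.Connection, event: str, severity: float,
           user_response: str) -> Priority:
    """Classify the alert and append it, with the user's response, to the history."""
    priority = classify(severity)
    conn.execute(
        "INSERT INTO history VALUES (?, ?, ?)",
        (event, int(priority), user_response),
    )
    conn.commit()
    return priority
```

An in-memory database is used by default so the sketch has no side effects; a file path could be passed to `open_history` to persist the history between runs.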
[0218] (Example 1)
[0219] Next, we will describe Example 1. In the following description, the data processing device 12 will be referred to as the "server" and the smart glasses 214 will be referred to as the "terminal".
[0220] In daily life, it is crucial for users to quickly recognize and safely address potential dangers in their surroundings. However, conventional methods have had difficulty continuously monitoring the surrounding environment and have provided insufficient real-time warnings. In particular, efficiently detecting moving objects and analyzing past behavioral history has been difficult, making it impossible to proactively detect potential dangers.
[0221] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 1 is realized by the following means.
[0222] In this invention, the server includes an electronic device for collecting information about the user's surroundings, an information processing means for receiving video information transmitted from the electronic device and analyzing it using a generative AI model, a warning means for providing a warning signal to the user based on the results of the analysis by the information processing means, and a storage means for recording and analyzing past behavioral history. This enables the user to recognize potential dangers in their surroundings in real time and respond quickly.
[0223] An "electronic device" is a device that collects 360-degree information about the user's surroundings and generates video information.
[0224] "Information processing means" refers to a server-side component that analyzes received video information and uses an AI model to perform object detection, motion analysis, and pattern recognition.
[0225] A "warning means" is a device that provides a warning signal to the user via voice or vibration based on the results of analysis by the information processing means.
[0226] A "generative AI model" is an algorithm used to analyze video information and has the function of recognizing objects and detecting abnormalities in their movements.
[0227] A "storage means" is a component used to predict potential risks by recording and analyzing a user's past behavioral history.
[0228] This invention is a comprehensive monitoring system designed to improve the safety of users in their daily lives. The system consists of an electronic device that captures 360-degree images of the user's surroundings, a server for analyzing the information, and a terminal that alerts the user.
[0229] The server receives video information transmitted from the electronic device and analyzes its content using a generative AI model. Specifically, the information processing means on the server performs object detection, motion analysis, and pattern recognition to identify moving objects such as people and cars, as well as abnormal behavior. The generative AI model can analyze the situation in real time while comparing it with existing data. The server can also record past behavioral history using the storage means and analyze patterns of frequently occurring errors.
[0230] The device receives warning signals from the server and notifies the user via sound and vibration. This allows the user to immediately recognize danger and take appropriate action. For example, if it detects a vehicle approaching rapidly, the device will emit a loud warning to alert the user to the danger. In addition, based on recorded activity history, the device will send a reminder notification to users who frequently forget to lock their doors when going out.
[0231] A concrete example would be a scenario where a user is walking through a busy area and a stranger suddenly approaches them. In this case, the electronic device detects the person, and the server detects the abnormal behavior. An immediate warning is then sent to the user through the device, allowing the user to avoid danger by choosing to safely leave the area.
[0232] An example of a prompt is the following sentence, given as input to the generative AI model: "Explain how the system detects and warns of dangerous situations approaching the user while they are walking."
[0233] This configuration not only enhances user safety but also improves convenience.
[0234] The flow of the specific processing in Example 1 will be explained using Figure 11.
[0235] Step 1:
[0236] The server receives high-resolution video data transmitted from electronic devices as input. This video data captures the user's surroundings in 360 degrees in real time. Specifically, the electronic devices continuously capture video at a constant frame rate and stream it to the server.
[0237] Step 2:
[0238] The server analyzes the received video data using a generative AI model. This analysis applies an object detection algorithm to identify moving objects and unusual movements from the input data. Specifically, the AI model processes the frames extracted from the video to detect abnormal movements and specific patterns, and generates risk level information as output.
[0239] Step 3:
[0240] The server generates a warning signal based on the generated risk level information. If the information processing means determines that there is a risk, the warning signal is sent to the warning means. Specifically, when an anomaly is detected, a signal is quickly generated and sent to the terminal.
[0241] Step 4:
[0242] The device receives a warning signal sent from the server and transmits it to the user. Based on the warning signal as input, an audio or vibration notification is generated as output. Specifically, the device alerts the user using the configured notification method to attract the user's attention.
[0243] Step 5:
[0244] The user receives a warning notification from their device, checks their surroundings based on the received warning information, and takes action to ensure their safety. Specifically, the user who notices the alert immediately checks their surroundings and, if necessary, takes action such as leaving the area.
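The disclosure does not state how the risk level in step 2 is computed. As one possible heuristic, offered purely as an assumption, the growth of a tracked object's bounding box across frames can serve as a proxy for how quickly it is approaching the camera. The function names and the threshold below are hypothetical.

```python
from typing import Sequence


def approach_rate(box_areas: Sequence[float], fps: float) -> float:
    """Relative growth in bounding box area per second; 0.0 if the object is not growing."""
    if len(box_areas) < 2 or box_areas[0] <= 0 or fps <= 0:
        return 0.0
    duration_s = (len(box_areas) - 1) / fps
    growth = (box_areas[-1] - box_areas[0]) / box_areas[0]
    return max(0.0, growth / duration_s)


def risk_level(rate: float, threshold: float = 1.0) -> str:
    """Label the approach rate; the threshold of 1.0 per second is a placeholder."""
    return "high" if rate >= threshold else "low"
```

With areas of 100, 150, and 200 pixels over three frames at 2 frames per second, the box doubles in one second, giving a rate of 1.0 and a "high" label under the placeholder threshold.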
[0245] (Application Example 1)
[0246] Next, we will explain Application Example 1. In the following explanation, the data processing device 12 will be referred to as the "server," and the smart glasses 214 will be referred to as the "terminal."
[0247] In modern homes, there is a need to quickly detect intruders or abnormal activity from outside and ensure the safety of the family. However, conventional security systems consist only of fixed cameras, which can only monitor a limited field of vision, and often rely on reactive measures after an incident has occurred. Therefore, there is a need for a system that can move around inside and outside the house, grasp the situation in real time, and issue a rapid warning.
[0248] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 1 is realized by the following means.
[0249] In this invention, the server includes an image acquisition means for monitoring and recording the surrounding environment, an information processing means for receiving and analyzing video data transmitted from the image acquisition means, a warning output means for providing warning information to the user based on the data analyzed by the information processing means, and a means for detecting abnormalities inside and outside the home in real time and notifying the user using voice and data communication means. This makes it possible to monitor the safety inside and outside the home in real time, immediately issue a warning to the user if suspicious activity is detected, and enable a rapid response.
[0250] An "image acquisition means" is a device that has the function of visually recording the surrounding situation and generating video data.
[0251] "Information processing means" refers to a device that analyzes video data received from image acquisition means and performs computational processing to detect anomalies or changes.
[0252] A "warning output means" is a device that transmits abnormalities detected by information processing means to the user and issues a warning via voice or data communication.
[0253] A "data communication means" is a communication device used to send real-time notifications to users in remote locations via a network.
[0254] "Real-time" refers to a time frame that enables immediate responses and actions by transmitting and receiving information with virtually no delay.
[0255] The system that realizes this invention is configured by combining image acquisition means, information processing means, warning output means, and data communication means to ensure safety both inside and outside the home.
[0256] The server uses a 360-degree camera to acquire images and monitor the situation inside and outside the home in real time. The captured video data is analyzed by AI algorithms running on high-performance computing hardware. Specifically, computations such as object detection and recognition of motion anomalies are performed on NVIDIA Jetson or a similar GPU-equipped platform.
[0257] Once information processing is complete, the system uses a voice assistant function as the warning output means to immediately notify the user. For example, it may issue voice warnings via Amazon Alexa or Google Assistant, or deliver warning information via push notifications to smartphones.
[0258] Users can receive information in real time through data communication and take safe actions as needed. This network-based communication enables immediate information transmission even to users outside the home.
[0259] A concrete example would be a situation where a suspicious person enters the garden while a family is resting in the living room. In this case, the system captures the person's movements using image acquisition means, detects the anomaly using information processing means, and then sends an alert via voice and to a smartphone through a warning output means. This allows the family to take immediate and safe action.
[0260] As an example of a prompt message to pass to a generative AI model, you could use text such as, "Analyze camera footage, detect if there are any moving objects in the garden, and use the AI to send a warning to the residents if it determines that there is a risk."
[0261] The flow of the specific processing in Application Example 1 will be explained using Figure 12.
[0262] Step 1:
[0263] The server uses a 360-degree camera to capture video both inside and outside the home. It receives ambient image data as input and processes it as streaming data. The camera's operation generates a continuous video flow.
[0264] Step 2:
[0265] The server passes the acquired video to the information processing system. It receives a video stream as input and uses an AI algorithm to perform object recognition. Here, NVIDIA Jetson is used for data processing to detect moving objects from the video and identify abnormal behavior. If an anomaly is detected, that information is output as identification data.
[0266] Step 3:
[0267] The server assesses the risk level based on the identification data and activates the warning output mechanism. It receives identification data as input and generates data to notify the user using voice assistants and data communication functions. The generated notification data is output and sent to the user's device.
[0268] Step 4:
[0269] The device receives notification data sent from the server and transmits that information to the user. It receives notification data as input and outputs a warning to the user via sound or vibration. It attracts the user's attention by outputting a prompt message such as "Dangerous activity detected. Please check your surroundings."
[0270] Step 5:
[0271] The user reviews the warning and decides how to act safely. The user receives warning information from the terminal as input and, if necessary, checks the site or takes other safety measures. Repeating these steps makes it possible to ensure the user's safety in real time.
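Steps 3 and 4 above can be illustrated with a minimal sketch. The `Detection` fields, the risk levels, and the channel names are assumptions introduced only for illustration and are not defined in this disclosure; the warning message is the one quoted in Step 4.

```python
from dataclasses import dataclass


@dataclass
class Detection:
    """One item of identification data (all fields are hypothetical)."""
    label: str    # e.g. "person" or "car"
    zone: str     # e.g. "garden" or "living_room"
    moving: bool  # True if the object moved between frames
    known: bool   # True if matched to a resident


def assess_risk(detections):
    """Return "high", "low", or "none" for one batch of identification data."""
    level = "none"
    for d in detections:
        # An unknown moving person is treated as the most serious case.
        if d.label == "person" and d.moving and not d.known:
            return "high"
        if d.moving:
            level = "low"
    return level


def build_notification(level):
    """Map a risk level to notification data for the terminal, or None."""
    if level == "high":
        return {
            "channels": ["voice", "vibration", "push"],
            "message": "Dangerous activity detected. Please check your surroundings.",
        }
    if level == "low":
        return {"channels": ["push"], "message": "Movement detected near the home."}
    return None
```

In this sketch, a notification is produced only when some movement is detected, so an empty batch of identification data results in no warning being sent to the terminal.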
[0272] Furthermore, an emotion engine that estimates the user's emotions may be incorporated. That is, the specific processing unit 290 may use the emotion identification model 59 to estimate the user's emotions and perform the specific processing using the user's emotions.
[0273] This invention is a system that combines a camera, a processing unit, a warning device, and an emotion engine to understand the user's surroundings in real time and improve safety in conjunction with their emotional state. This system allows the user to comprehensively manage their own safety and receive appropriate warnings for their situation.
[0274] The server receives video information from the recording equipment and analyzes its content using AI. The analysis process includes object recognition, motion analysis, and emotion recognition by an emotion engine. The emotion engine analyzes data such as the user's facial expressions and voice to evaluate the user's emotional state. For example, if the user shows signs of stress, the server can use that information to adjust the priority and content of warnings.
[0275] The terminal receives analysis results from the server and provides the user with appropriate warnings. The warning device uses voice and vibration to notify the user in a way that suits their emotional state. This may include instructions that encourage calmness so that the user can respond to the situation calmly.
[0276] Users receive notifications from their device and can respond quickly to their psychological state and surrounding environment. For example, if a user is feeling stressed on crowded public transport, the emotion engine detects this state. The device then provides a gentle voice alert to encourage relaxation, making it easier for the user to reduce stress. In this form, the present invention can not only improve safety but also support the user's mental health.
[0277] Furthermore, the server records user behavior history and sentiment data, and performs periodic analysis. This data is used to predict future risks and optimize warnings, enabling the delivery of increasingly personalized services to users.
[0278] The following describes the processing flow.
[0279] Step 1:
[0280] The device collects 360-degree video information in real time from the user's camera and transmits it to the server. This video information also includes the user's facial expressions and actions.
[0281] Step 2:
[0282] The server receives the video information transmitted from the terminal. The received information is analyzed using an AI algorithm. During the analysis process, the server performs object recognition to grasp the surrounding situation. Also, the emotion engine evaluates the emotional state from the user's expression and voice.
[0283] Step 3:
[0284] The server integrates the results of the surrounding risk assessment by object recognition and the emotion analysis by the emotion engine. Based on this result, it generates appropriate warnings according to the urgency and emotional state. In particular, when the emotional state shows stress or anxiety, it adjusts the warning content and notification method.
[0285] Step 4:
[0286] The server transmits the warning generated to the terminal. The terminal prepares to notify the user of this warning.
[0287] Step 5:
[0288] The terminal notifies the user of the warning using sound or vibration. Depending on the user's emotional state, the tone and content of the notification are customized, and it may include a message to promote relaxation.
[0289] Step 6:
[0290] The user receives the warning from the terminal and promptly checks the surrounding situation. Then, the user takes a safe response in a way suitable for the emotional state.
[0291] Step 7:
[0292] The server records the user's action history and emotion data in the database. This data is utilized for improving the accuracy of future warnings and for optimized notifications for individual users.
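The integration in Step 3 and the customization in Step 5 can be sketched as follows. This is only an illustrative assumption: the 0 to 1 scales for urgency and stress, the thresholds 0.8 and 0.6, and the wording of the messages are hypothetical and are not specified in this disclosure.

```python
def compose_warning(hazard, urgency, stress):
    """Combine a risk assessment with an emotion estimate.

    hazard  -- short description of the detected danger
    urgency -- risk score in [0, 1] from object recognition (assumed scale)
    stress  -- stress score in [0, 1] from the emotion engine (assumed scale)
    """
    if not (0.0 <= urgency <= 1.0 and 0.0 <= stress <= 1.0):
        raise ValueError("urgency and stress must be within [0, 1]")

    # A stressed user receives a gentler tone; thresholds are illustrative.
    tone = "gentle" if stress >= 0.6 else "neutral"
    # Urgent hazards use both voice and vibration.
    channels = ["voice", "vibration"] if urgency >= 0.8 else ["voice"]

    message = f"Caution: {hazard}."
    if tone == "gentle":
        message = "Please stay calm. " + message + " Take a slow breath."
    return {"tone": tone, "channels": channels, "message": message}
```

For example, `compose_warning("vehicle approaching", 0.9, 0.7)` yields a gentle-toned warning delivered by both voice and vibration, whereas the same hazard with a low stress score yields a neutral message.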
[0293] (Example 2)
[0294] Next, we will describe Example 2. In the following description, the data processing device 12 will be referred to as the "server" and the smart glasses 214 will be referred to as the "terminal".
[0295] In today's social environment, stressful situations occur frequently, potentially deteriorating users' safety and mental state. However, existing safety management systems fail to provide warnings and support that take into account users' emotional states, making them inadequate. This invention aims to solve this problem by constructing a system that grasps users' emotional states in real time and provides personalized warnings and support.
[0296] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 2 is realized by the following means.
[0297] In this invention, the server includes means for receiving video information transmitted from a camera for collecting information about the user's surroundings and recognizing objects and actions contained in the information; means for performing analysis necessary to evaluate the user's emotional state; and means for notifying the user of optimized warning information by voice or vibration based on their emotional state and behavioral history. This enables the user to receive personalized advice and warnings in real time that are tailored to their emotional state, allowing them to live a safer and healthier life.
[0298] A "photographing device" is a device used to collect information about the user's surroundings and is responsible for acquiring video information.
[0299] A "processing device" is a device that analyzes video information sent from a camera and has the computational functions to recognize objects and actions.
[0300] An "emotion engine" is a system that analyzes a user's facial expressions and voice to evaluate their emotional state.
[0301] A "warning device" is a device that provides warning information to a user based on the information analyzed by a processing device.
[0302] "Voice or vibration means" is a means used by a warning device to notify a user of warning information, including voice guidance and vibration feedback.
[0303] "Behavior history" is a record of a user's past behavior, and is data used for future risk prediction and warning optimization through analysis.
[0304] "Emotion data" is numerical information regarding a user's emotional state, and is data used for the system to provide warnings and advice suitable for the user in real time.
[0305] The present invention is a system for grasping the situation around a user and providing warnings and support according to the user's emotional state. This is realized by a configuration combining a photographing device, a processing device, a warning device, and an emotion engine.
[0306] First, the server receives video information in real time from a photographing device. The photographing device includes a smartphone and other image acquisition devices. The server analyzes this video information using AI technology. Specifically, a model such as YOLO is used for object recognition, and a model such as OpenPose is used for motion analysis.
[0307] Next, the server uses an emotion engine to analyze the user's facial expressions and voice data, and evaluate the emotional state. The emotion engine determines the user's stress level and the like using voice analysis software such as Google Cloud Speech-to-Text.
[0308] The analyzed information is transmitted from the server to the terminal. The terminal provides an optimal warning to the user based on this. The warning device can utilize voice alerts and vibration feedback. For example, it plays a voice guide that encourages relaxation.
[0309] Users can receive notifications from their devices and respond quickly to their psychological state and surrounding environment. For example, in crowded public transport, the emotion engine can detect the user's stress level, and the device can send a gentle voice alert to help the user relax. This system functions as an interactive platform to support the user's mental health.
[0310] An example of a prompt might be: "If you encounter a stressed-out user in a crowded train station, how would the emotion engine respond? What advice could it offer to help them relax?"
[0311] In this way, the present invention provides a system that can comprehensively manage both user safety and mental health.
[0312] The flow of the specific processing in Example 2 will be explained using Figure 13.
[0313] Step 1:
[0314] The server receives video information in real time from the camera. The input is video data capturing the user's surroundings. The server begins analyzing this data using AI technology. A model like YOLO is used for object recognition to identify objects in the video. The output is information about the objects recognized in the video. Specifically, the process involves sending video captured by the camera to the server, breaking down the video frame by frame, and inputting it into the object recognition model.
[0315] Step 2:
[0316] The server performs motion analysis based on the received video information. The input is video data containing object information recognized in step 1. The server uses a motion analysis model such as OpenPose to analyze the user's behavior patterns. The output is information about the current user's behavior patterns. Specific actions include extracting skeletal information of the recognized person and performing calculations to determine the type and state of the action.
[0317] Step 3:
[0318] The server uses an emotion engine to evaluate the user's emotional state. The inputs are the behavioral information obtained in step 2 and the user's voice and facial expression data. The emotion engine analyzes the voice data using software such as Amazon Polly or Google Cloud Speech-to-Text to quantify the user's stress and emotional level. The output is the quantified evaluation result of the emotional state. The specific operation involves sensing changes in voice tone and facial expression and performing an emotional evaluation accordingly.
[0319] Step 4:
[0320] The device receives analysis results from the server and generates a warning. The input is the user's emotional state and behavioral information sent from the server. Based on this information, the device constructs the most appropriate warning for the user. The output is the content of the audio alert and vibration feedback provided to the user. The specific operation involves generating a voice message that promotes relaxation according to the user's state and playing it through the device.
[0321] Step 5:
[0322] The user responds based on the warning information received from the device. The input is a notification from the device via sound or vibration. The user adjusts their mental state based on the notification content. The output is a state in which the user is relaxed and has adapted to the stressful environment. As a specific action, the user follows the advice given, for example an instruction such as "take a deep breath."
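The frame-by-frame handling described in Step 1 can be sketched as follows. The detector is passed in as a plain function so that any object recognition model could be substituted; the names `sample_frames`, `recognize_objects`, and the `every_n` sampling interval are assumptions for illustration and do not correspond to the API of any particular model.

```python
from typing import Callable, Iterable, Iterator, List, Tuple

Frame = bytes                          # stand-in for one decoded video frame
Detector = Callable[[Frame], List[str]]  # returns labels found in a frame


def sample_frames(frames: Iterable[Frame], every_n: int) -> Iterator[Tuple[int, Frame]]:
    """Yield (index, frame) for every n-th frame of the stream."""
    if every_n < 1:
        raise ValueError("every_n must be at least 1")
    for index, frame in enumerate(frames):
        if index % every_n == 0:
            yield index, frame


def recognize_objects(frames: Iterable[Frame], detector: Detector,
                      every_n: int = 5) -> List[Tuple[int, List[str]]]:
    """Run the detector on sampled frames and keep frames with detections."""
    results = []
    for index, frame in sample_frames(frames, every_n):
        labels = detector(frame)
        if labels:
            results.append((index, labels))
    return results
```

Sampling every n-th frame rather than every frame is one possible way to keep the processing close to real time; the appropriate interval would depend on the frame rate and the model used.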
[0323] (Application Example 2)
[0324] Next, we will explain Application Example 2. In the following explanation, the data processing device 12 will be referred to as the "server," and the smart glasses 214 will be referred to as the "terminal."
[0325] In modern living environments, people often experience a great deal of stress and anxiety. Furthermore, while safety must be ensured even at home, current technology struggles to provide appropriate support that takes into account the user's emotional state. A system is needed to improve this situation and enhance the psychological and physical safety of users.
[0326] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 2 is realized by the following means.
[0327] In this invention, the server includes a camera component for collecting information about the user's surroundings, processing means for receiving and analyzing video information transmitted from the camera component, notification means for providing warning information to the user based on the information analyzed by the processing means, evaluation means for identifying the user's emotional state and supporting their psychological health, and control means for operating the in-home device according to the emotional state detected by the evaluation means. This enables personalized safety improvements and mental support based on the user's emotional state.
[0328] A "camera component" is a device used to monitor the environment around the user and acquire video information from it.
[0329] A "processing means" is a device that analyzes video information obtained from the camera component to perform object recognition and motion analysis.
[0330] A "notification means" is a device that provides warnings or guidance to the user based on the analysis results of the processing means.
[0331] An "evaluation means" is a system for identifying a user's emotional state from their facial expressions and voice.
[0332] A "control means" is a mechanism that operates devices within the home in accordance with the emotional state obtained by the evaluation means.
[0333] The server acquires video information using a camera component to capture the user's surroundings. This information is analyzed by the processing means and used to evaluate the user's current state. In particular, in addition to object recognition and motion analysis, the evaluation means is used to identify emotions from the user's facial expressions and voice. This process employs advanced AI algorithms.
[0334] The terminal sends notifications to the user based on analysis results received from the server. The notification means provides appropriate warning information via voice and vibration according to the user's emotional state, and the control means further supports the user's psychological well-being by operating home devices. For example, if the system determines that the user is experiencing high levels of stress, it automatically creates a relaxing environment.
[0335] For example, if the system detects that a user is feeling anxious upon returning home from work, it could provide support by changing the lighting to a warmer color and playing calming music. Furthermore, when using the generative AI model, a prompt such as, "Please suggest appropriate relaxation methods for a user who is feeling stressed upon returning home," could be used.
[0336] The flow of the specific processing in Application Example 2 will be explained using Figure 14.
[0337] Step 1:
[0338] The server uses the camera component to acquire video information about the user's surroundings. The input for this step is a real-time video feed from the camera, and the output is the video saved as digital data. Specifically, the server connects to the camera device and captures image data at regular intervals.
[0339] Step 2:
[0340] The server analyzes the acquired video information using processing equipment. This analysis performs object recognition and motion analysis. The input is the digital video data acquired in the previous step, and the output is the data obtained through the analysis, i.e., information about the surrounding environment. Deep learning algorithms are used for processing, and an AI model identifies objects and actions.
[0341] Step 3:
[0342] The server identifies the user's emotional state using evaluation tools. The input for this step is data related to the user's facial expressions and voice, and the output is specific labels or scores indicating the user's emotional state. Specifically, machine learning is used to analyze changes in facial expressions and vocal intonation.
[0343] Step 4:
[0344] The device sends notifications based on the emotional state transmitted from the server. The input for this step is the analysis results regarding the emotional state, and the output is warning or guidance information provided to the user. Specifically, this involves reading the message aloud to the user or using the smartphone's vibration motor to send notifications.
[0345] Step 5:
[0346] The terminal operates home devices in response to the user's emotional state through control means. The input in this step is a control command based on the user's emotional state, and the output is the modified state of the home device. This includes specific actions such as changing the color of the lights or playing music through a smart speaker.
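The control means of Step 5 can be sketched as a mapping from the emotional state to control commands. The emotion labels, the score thresholds, the command format, and the value of 2700 K for a warm lighting color are assumptions introduced for illustration only.

```python
def plan_home_actions(emotion_label, score):
    """Return a list of control commands for home devices.

    emotion_label -- label output by the evaluation means (hypothetical names)
    score         -- strength of the emotion in [0, 1] (assumed scale)
    """
    negative = emotion_label in ("stress", "anxiety")
    warm_light = {"device": "light", "action": "set_color_temperature", "kelvin": 2700}
    calm_music = {"device": "speaker", "action": "play", "playlist": "calming"}

    if negative and score >= 0.7:
        # Strong stress or anxiety: warm lighting and calming music.
        return [warm_light, calm_music]
    if negative and score >= 0.4:
        # Mild stress or anxiety: adjust the lighting only.
        return [warm_light]
    return []
```

Returning a list of commands rather than operating the devices directly keeps the decision separate from the device interface, so the same plan could be sent to different smart home platforms.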
[0347] The specific processing unit 290 transmits the result of the specific processing to the smart glasses 214. In the smart glasses 214, the control unit 46A causes the speaker 240 to output the result of the specific processing. The microphone 238 acquires audio indicating user input for the result of the specific processing. The control unit 46A transmits the audio data indicating user input acquired by the microphone 238 to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the audio data.
[0348] Data generation model 58 is a type of so-called generative AI (Artificial Intelligence). Examples of data generation model 58 include generative AI such as ChatGPT (Internet search <URL: https://openai.com/blog/chatgpt>) and Gemini (Internet search <URL: https://gemini.google.com/?hl=ja>). The data generation model 58 is obtained by performing deep learning on a neural network. The data generation model 58 receives prompts containing instructions, together with inference data such as audio data representing speech, text data representing text, and image data representing images. The data generation model 58 performs inference on the input inference data according to the instructions indicated by the prompts, and outputs the inference results in data formats such as audio data and text data. Here, inference refers to, for example, analysis, classification, prediction, and/or summarization.
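The pairing of a prompt with inference data can be sketched as a simple request structure. The function name `build_request` and the dictionary layout are hypothetical and do not represent the interface of ChatGPT, Gemini, or any other generative AI service.

```python
def build_request(instruction, text=None, image_paths=(), audio_paths=()):
    """Bundle a prompt with inference data for a data generation model.

    The layout below is an illustrative assumption: one instruction string
    plus optional text, image, and audio inputs grouped by modality.
    """
    if not instruction or not instruction.strip():
        raise ValueError("an instruction is required")
    return {
        "prompt": instruction.strip(),
        "inference_data": {
            "text": text,
            "images": list(image_paths),
            "audio": list(audio_paths),
        },
    }
```

For example, the prompt given for Application Example 2 could be passed as `instruction`, with frames from the camera supplied through `image_paths`.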
[0349] In the above embodiment, an example was given in which specific processing is performed by the data processing device 12, but the technology of this disclosure is not limited thereto, and the specific processing may also be performed by the smart glasses 214.
[0350] [Third Embodiment]
[0351] Figure 5 shows an example of the configuration of the data processing system 310 according to the third embodiment.
[0352] As shown in Figure 5, the data processing system 310 includes a data processing device 12 and a headset terminal 314. An example of the data processing device 12 is a server.
[0353] The data processing device 12 comprises a computer 22, a database 24, and a communication interface 26. The computer 22 is an example of a "computer" related to the technology of this disclosure. The computer 22 comprises a processor 28, RAM 30, and storage 32. The processor 28, RAM 30, and storage 32 are connected to a bus 34. The database 24 and the communication interface 26 are also connected to the bus 34. The communication interface 26 is connected to a network 54. An example of the network 54 is a WAN (Wide Area Network) and/or a LAN (Local Area Network).
[0354] The headset terminal 314 includes a computer 36, a microphone 238, a speaker 240, a camera 42, a communication interface 44, and a display 343. The computer 36 includes a processor 46, RAM 48, and storage 50. The processor 46, RAM 48, and storage 50 are connected to a bus 52. The microphone 238, speaker 240, camera 42, and display 343 are also connected to the bus 52.
[0355] The microphone 238 receives voice signals from the user 20 and receives instructions from the user 20. The microphone 238 captures the voice signals from the user 20, converts the captured voice into audio data, and outputs it to the processor 46. The speaker 240 outputs audio according to the instructions from the processor 46.
[0356] Camera 42 is a small digital camera equipped with an optical system including a lens, aperture, and shutter, and an image sensor such as a CMOS (Complementary Metal-Oxide-Semiconductor) image sensor or a CCD (Charge Coupled Device) image sensor, and captures images of the area around the user 20 (for example, an imaging range defined by a field of view equivalent to the width of a typical healthy person's field of vision).
[0357] Communication interface 44 is connected to network 54. Communication interfaces 44 and 26 are responsible for the exchange of various information between processor 46 and processor 28 via network 54. The exchange of various information between processor 46 and processor 28 using communication interfaces 44 and 26 is performed in a secure manner.
[0358] Figure 6 shows an example of the main functions of the data processing device 12 and the headset terminal 314. As shown in Figure 6, the data processing device 12 performs specific processing using the processor 28. The storage 32 stores the specific processing program 56.
[0359] The specific processing program 56 is an example of a "program" relating to the technology of this disclosure. The processor 28 reads the specific processing program 56 from the storage 32 and executes the read specific processing program 56 on the RAM 30. The specific processing is realized by the processor 28 operating as a specific processing unit 290 in accordance with the specific processing program 56 executed on the RAM 30.
[0360] The storage 32 stores the data generation model 58 and the emotion identification model 59. The data generation model 58 and the emotion identification model 59 are used by the specific processing unit 290.
[0361] In the headset terminal 314, the processor 46 performs the reception output processing. The storage 50 stores the reception output program 60. The processor 46 reads the reception output program 60 from the storage 50 and executes the read reception output program 60 on the RAM 48. The reception output processing is realized by the processor 46 operating as a control unit 46A according to the reception output program 60 executed on the RAM 48.
[0362] Next, the specific processing performed by the specific processing unit 290 of the data processing device 12 will be described. In the following description, the data processing device 12 will be referred to as the "server" and the headset terminal 314 will be referred to as the "terminal".
[0363] This invention is a system that combines a 360-degree camera, a processing unit that processes received video information, and a warning device that provides warnings according to the situation, in order to improve the safety of users in their daily lives. By using this system, users can always be aware of their surroundings and proactively avoid danger.
[0364] The server receives video information transmitted from the recording equipment and analyzes its contents using an AI algorithm. This AI incorporates object detection, motion analysis, and pattern recognition capabilities, enabling it to identify moving objects such as people and cars and compare them with past behavioral patterns. If the analysis identifies a danger, the server immediately transmits the information to the warning device.
[0365] The device notifies the user of alert information received from the server via voice or vibration. This notification helps the user quickly recognize their surroundings and take appropriate action. For example, when the device detects an approaching vehicle, it can safely attract the user's attention by issuing a loud warning alert.
[0366] Users check alerts from the system and respond to identified risks as quickly as possible. Furthermore, the system can accumulate user activity history and identify recurring mistakes from past patterns, enabling it to provide preventative warnings. For example, if a habit of forgetting to lock the door is detected, the device will issue a reminder before the user leaves the house. In this way, the system can improve convenience while making the user's life safer.
[0367] As a concrete example, consider a scenario where a user is walking through a busy shopping district and a stranger suddenly approaches them in the middle of a crowd. At this moment, the camera captures the person, and the server detects the abnormality of the movement. The warning device instantly sends a notification to the user, and the user ensures their safety by immediately choosing to leave the area. In this form, the present invention can be effectively implemented.
[0368] The following describes the processing flow.
[0369] Step 1:
[0370] The terminal collects 360-degree video information in real time from the camera worn by the user and transmits it to the server.
[0371] Step 2:
[0372] The server receives video information transmitted from the terminal and begins analysis using an AI algorithm. This analysis includes object recognition and motion analysis to detect whether there are any potential dangers in the surrounding environment.
[0373] Step 3:
[0374] Based on the analysis results, the server immediately generates an alert if an anomaly or danger is detected. The alert priority is then evaluated and classified according to its severity.
[0375] Step 4:
[0376] The server sends the generated warning to the terminal. The warning includes audio and vibration notifications and contains specific instructions to draw the user's attention.
[0377] Step 5:
[0378] The device notifies the user of warnings received from the server. It uses voice messages and vibrations to attract the user's attention in real time and encourage safe actions.
[0379] Step 6:
[0380] The user receives a warning from the device and takes appropriate action while checking their surroundings. For example, they might take a step back to create distance from an approaching vehicle.
[0381] Step 7:
[0382] The server records the results of warnings and user responses in a database and updates the behavioral history for later analysis. This data is used to predict future events and improve the accuracy of warnings.
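The use of the behavioral history recorded in Step 7, such as the reminder for a user who habitually forgets to lock the door, can be sketched as follows. The history format of `(event, completed)` pairs and the `min_count` threshold are assumptions for illustration; the disclosure does not specify how the history is stored.

```python
from collections import Counter


def recurring_omissions(history, min_count=3):
    """Return events the user failed to complete at least min_count times.

    history -- list of (event, completed) pairs, e.g. ("lock_door", False)
    """
    missed = Counter(event for event, completed in history if not completed)
    return sorted(event for event, count in missed.items() if count >= min_count)


def reminders_before_leaving(history, min_count=3):
    """Build reminder messages for habits detected in the history."""
    return [
        "Reminder: " + event.replace("_", " ")
        for event in recurring_omissions(history, min_count)
    ]
```

With a history in which `lock_door` was missed three times, `reminders_before_leaving` returns `["Reminder: lock door"]`, while an omission that occurred only once produces no reminder.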
[0383] (Example 1)
[0384] Next, we will describe Example 1. In the following description, the data processing device 12 will be referred to as the "server," and the headset-type terminal 314 will be referred to as the "terminal."
[0385] In daily life, it is crucial for users to quickly recognize and safely address potential dangers in their surroundings. However, conventional methods have had difficulty continuously monitoring the surrounding environment and have not provided sufficient real-time warnings. In particular, efficiently detecting moving objects and analyzing past behavioral history has been difficult, making it impossible to proactively detect potential dangers.
[0386] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 1 is realized by the following means.
[0387] In this invention, the server includes an electronic device for collecting information about the user's surroundings, an information processing means for receiving video information transmitted from the electronic device and analyzing it using a generative AI model, a warning means for providing a warning signal to the user based on the results of the analysis by the information processing means, and a storage means for recording and analyzing past behavioral history. This enables the user to recognize potential dangers in their surroundings in real time and respond quickly.
[0388] An "electronic device" is a device that collects 360-degree information about the user's surroundings and generates video information.
[0389] "Information processing means" refers to a server-side component that analyzes received video information and uses an AI model to perform object detection, motion analysis, and pattern recognition.
[0390] A "warning means" is a device that provides a warning signal to the user via voice or vibration based on the results of analysis by the information processing means.
[0391] A "generative AI model" is an algorithm used to analyze video information and has the function of recognizing objects and detecting abnormalities in their movements.
[0392] A "storage means" is a component used to predict potential risks by recording and analyzing a user's past behavioral history.
[0393] This invention is a comprehensive monitoring system designed to improve the safety of users in their daily lives. The system consists of an electronic device that captures 360-degree images of the user's surroundings, a server for analyzing the information, and a terminal that alerts the user.
[0394] The server receives video information transmitted from the electronic device and analyzes its content using a generative AI model. Specifically, the information processing means on the server performs object detection, motion analysis, and pattern recognition to identify moving objects such as people and cars, as well as abnormal behavior. The generative AI model uses advanced algorithms and can analyze the situation in real time while comparing it with existing data. The server can also record past behavioral history using the storage means and analyze patterns of frequently occurring errors.
[0395] The device receives warning signals from the server and notifies the user via sound and vibration. This allows the user to immediately recognize danger and take appropriate action. For example, if it detects a vehicle approaching rapidly, the device will emit a loud warning to alert the user to the danger. In addition, based on recorded activity history, the device will send a reminder notification to users who frequently forget to lock their doors when going out.
[0396] A concrete example would be a scenario where a user is walking through a busy area and a stranger suddenly approaches them. In this case, the electronic device detects the person, and the server detects the abnormal behavior. An immediate warning is then sent to the user through the device, allowing the user to avoid danger by choosing to safely leave the area.
[0397] As an example of a prompt, the following sentence can be input to the generative AI model: "Explain how the system detects and warns of dangerous situations approaching the user while they are walking."
[0398] This configuration not only enhances user safety but also improves convenience.
[0399] The flow of the specific processing in Example 1 will be explained using Figure 11.
[0400] Step 1:
[0401] The server receives high-resolution video data transmitted from electronic devices as input. This video data captures the user's surroundings in 360 degrees in real time. Specifically, the electronic devices continuously capture video at a constant frame rate and stream it to the server.
[0402] Step 2:
[0403] The server analyzes the received video data using a generative AI model. This analysis applies an object detection algorithm to identify moving objects and unusual movements in the input data. Specifically, the AI model processes frames extracted from the video to detect abnormal movements and specific patterns, and generates risk level information as output.
[0404] Step 3:
[0405] The server generates a warning signal based on the generated risk level information. If the information processing means determines that there is a risk, the warning signal is sent to the warning device. Specifically, when an anomaly is detected, a signal is promptly generated and sent to the terminal.
[0406] Step 4:
[0407] The device receives a warning signal sent from the server and transmits it to the user. Based on the warning signal as input, an audio or vibration notification is generated as output. Specifically, the device alerts the user using the configured notification method to attract the user's attention.
[0408] Step 5:
[0409] The user receives the warning notification from the device, checks their surroundings based on the warning information received, and takes action to ensure their safety. Specifically, a user who notices the alert immediately checks their surroundings and, if necessary, takes action such as leaving the area.
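As a non-limiting illustration, Steps 2 to 4 above could be chained as in the following sketch. The `Detection` and `WarningSignal` structures, the 0.0 to 1.0 risk scale, and the 0.7 threshold are assumptions made for this sketch; the disclosure does not specify a data format or a threshold.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Detection:
    """Output of Step 2 for one detected object (hypothetical structure)."""
    label: str         # e.g. "car" or "person"
    risk_level: float  # assumed scale: 0.0 (safe) to 1.0 (imminent danger)


@dataclass
class WarningSignal:
    """Signal the server sends to the terminal in Step 3."""
    message: str
    method: str  # "voice" or "vibration", as configured by the user


def generate_warning(detections: List[Detection],
                     threshold: float = 0.7,
                     method: str = "voice") -> Optional[WarningSignal]:
    """Step 3: emit a warning only when the highest risk reaches the threshold."""
    if not detections:
        return None
    worst = max(detections, key=lambda d: d.risk_level)
    if worst.risk_level < threshold:
        return None
    return WarningSignal(f"Caution: {worst.label} approaching", method)


def notify(signal: WarningSignal) -> str:
    """Step 4: render the signal using the configured notification method."""
    if signal.method == "vibration":
        return "[vibrate]"
    return f"[voice] {signal.message}"
```

Returning `None` when nothing crosses the threshold keeps the terminal silent in the common case, so the user is interrupted only when a risk has been identified.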
[0410] (Application Example 1)
[0411] Next, we will explain Application Example 1. In the following explanation, the data processing device 12 will be referred to as the "server," and the headset-type terminal 314 will be referred to as the "terminal."
[0412] In modern homes, there is a need to quickly detect intruders or abnormal activity from outside and ensure the safety of the family. However, conventional security systems consist only of fixed cameras, which can only monitor a limited field of vision, and often rely on reactive measures after an incident has occurred. Therefore, there is a need for a system that can move around inside and outside the house, grasp the situation in real time, and issue a rapid warning.
[0413] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 1 is realized by the following means.
[0414] In this invention, the server includes an image acquisition means for monitoring and recording the surrounding environment, an information processing means for receiving and analyzing video data transmitted from the image acquisition means, a warning output means for providing warning information to the user based on the data analyzed by the information processing means, and a means for detecting abnormalities inside and outside the home in real time and notifying the user using voice and data communication means. This makes it possible to monitor the safety inside and outside the home in real time, immediately issue a warning to the user if suspicious activity is detected, and enable a rapid response.
[0415] An "image acquisition means" is a device that has the function of visually recording the surrounding situation and generating video data.
[0416] "Information processing means" refers to a device that analyzes video data received from image acquisition means and performs computational processing to detect anomalies or changes.
[0417] A "warning output means" is a device that transmits abnormalities detected by information processing means to the user and issues a warning via voice or data communication.
[0418] A "data communication means" is a communication means used to send real-time notifications to users in remote locations via a network.
[0419] "Real-time" refers to a time frame that enables immediate responses and actions by transmitting and receiving information with virtually no delay.
[0420] The system that realizes this invention is configured by combining image acquisition means, information processing means, warning output means, and data communication means to ensure safety both inside and outside the home.
[0421] The server uses a 360-degree camera to acquire images and monitor the situation inside and outside the home in real time. The captured video data is analyzed by AI algorithms running on high-performance computing hardware. Specifically, computations such as object detection and recognition of motion anomalies are performed using an NVIDIA Jetson or a similar GPU-equipped device.
[0422] Once information processing is complete, the system uses a voice assistant function as a warning output method to immediately notify the user. For example, it may issue voice warnings via Amazon Alexa or Google Assistant, or deliver warning information via push notifications to smartphones.
[0423] Users can receive information in real time through data communication and take safe actions as needed. This network-based communication enables immediate information transmission even to users outside the home.
[0424] A concrete example would be a situation where a suspicious person enters the garden while a family is resting in the living room. In this case, the system captures the person's movements using image acquisition means, detects the anomaly using information processing means, and then sends an alert via voice and to a smartphone through a warning output means. This allows the family to take immediate and safe action.
[0425] As an example of a prompt message to pass to a generative AI model, you could use text such as, "Analyze camera footage, detect if there are any moving objects in the garden, and use the AI to send a warning to the residents if it determines that there is a risk."
[0426] The flow of a specific process in Application Example 1 will be explained using Figure 12.
[0427] Step 1:
[0428] The server uses a 360-degree camera to capture video both inside and outside the home. It receives image data of the surroundings as input and processes it as streaming data. Because the camera operates continuously, it produces a continuous video stream.
[0429] Step 2:
[0430] The server passes the acquired video to the information processing system. It receives a video stream as input and uses an AI algorithm to perform object recognition. Here, NVIDIA Jetson is used for data processing to detect moving objects from the video and identify abnormal behavior. If an anomaly is detected, that information is output as identification data.
[0431] Step 3:
[0432] The server assesses the risk level based on the identification data and activates the warning output mechanism. It receives identification data as input and generates data to notify the user using voice assistants and data communication functions. The generated notification data is output and sent to the user's device.
[0433] Step 4:
[0434] The device receives notification data sent from the server and transmits that information to the user. It receives notification data as input and outputs a warning to the user via sound or vibration. It attracts the user's attention by outputting a prompt message such as "Dangerous activity detected. Please check your surroundings."
[0435] Step 5:
[0436] The user reviews the warning and decides what action to take to stay safe. Taking the warning information from the terminal as input, the user checks the site or takes other safety measures as necessary. Repeating these steps makes it possible to ensure the user's safety in real time.
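As a non-limiting illustration, Steps 2 to 4 above could be approximated as in the following sketch. It substitutes simple frame differencing for the AI-based object recognition described in the text, purely to keep the example self-contained. The function names, the grayscale frame format, and both thresholds are assumptions.

```python
from typing import Dict, List, Optional

Frame = List[List[int]]  # grayscale pixel values, 0 to 255 (assumed format)


def changed_fraction(prev: Frame, curr: Frame, pixel_delta: int = 30) -> float:
    """Fraction of pixels whose brightness changed by more than pixel_delta."""
    total = 0
    changed = 0
    for row_prev, row_curr in zip(prev, curr):
        for a, b in zip(row_prev, row_curr):
            total += 1
            if abs(a - b) > pixel_delta:
                changed += 1
    return changed / total if total else 0.0


def identify(prev: Frame, curr: Frame, area_threshold: float = 0.1) -> Dict:
    """Step 2 stand-in: produce identification data for a pair of frames."""
    fraction = changed_fraction(prev, curr)
    return {"motion_detected": fraction >= area_threshold,
            "changed_fraction": fraction}


def build_notification(identification: Dict) -> Optional[str]:
    """Steps 3 and 4: turn identification data into the message shown to the user."""
    if not identification["motion_detected"]:
        return None
    return "Dangerous activity detected. Please check your surroundings."
```

Frame differencing reacts to any change, including lighting, so a real deployment would still need the object recognition stage to decide whether the moving object is a person.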
[0437] Furthermore, an emotion engine that estimates the user's emotions may be incorporated. That is, the specific processing unit 290 may use the emotion identification model 59 to estimate the user's emotions and perform the specific processing using the estimated emotions.
[0438] This invention is a system that combines a camera, a processing unit, a warning device, and an emotion engine to understand the user's surroundings in real time and improve safety in conjunction with their emotional state. This system allows the user to comprehensively manage their own safety and receive appropriate warnings for their situation.
[0439] The server receives video information from the recording equipment and analyzes its content using AI. The analysis process includes object recognition, motion analysis, and emotion recognition by an emotion engine. The emotion engine analyzes data such as the user's facial expressions and voice to evaluate the user's emotional state. For example, if the user shows signs of stress, the server can use that information to adjust the priority and content of warnings.
[0440] The terminal receives analysis results from the server and provides the user with appropriate warnings. The warning device uses voice and vibration to notify the user in a way that suits their emotional state. This may include guidance that helps the user stay calm while responding to the situation.
[0441] Users receive notifications from their device and can respond quickly to their psychological state and surrounding environment. For example, if a user is feeling stressed on crowded public transport, the emotion engine detects this state. The device then provides a gentle voice alert to encourage relaxation, making it easier for the user to reduce stress. In this form, the present invention can not only improve safety but also support the user's mental health.
[0442] Furthermore, the server records the user's behavior history and emotional data and analyzes them periodically. This data is used to predict future risks and optimize warnings, enabling increasingly personalized services for each user.
[0443] The following describes the processing flow.
[0444] Step 1:
[0445] The device collects 360-degree video information in real time from the user's camera and transmits it to the server. This video information also includes the user's facial expressions and actions.
[0446] Step 2:
[0447] The server receives video information transmitted from the terminal. The received information is analyzed using an AI algorithm. During the analysis process, the server performs object recognition and understands the surrounding environment. It also uses an emotion engine to evaluate the user's emotional state from their facial expressions and voice.
[0448] Step 3:
[0449] The server integrates the results of risk assessment using object recognition and emotion analysis using an emotion engine. Based on these results, it generates appropriate warnings according to the urgency and emotional state. In particular, if the emotional state indicates stress or anxiety, it adjusts the warning content and notification method.
[0450] Step 4:
[0451] The server sends a warning it has generated to the terminal. The terminal then prepares to notify the user of this warning.
[0452] Step 5:
[0453] The device notifies the user of warnings using sound or vibration. Depending on the user's emotional state, the tone and content of the notification may be customized and may include messages to promote relaxation.
[0454] Step 6:
[0455] The user receives a warning from the device, promptly checks their surroundings, and then takes a safe response in a manner appropriate to their emotional state.
[0456] Step 7:
[0457] The server records the user's behavior history and emotional data in a database. This data is used to improve the accuracy of future warnings and to provide notifications optimized for individual users.
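As a non-limiting illustration, the integration of risk assessment and emotional state in Step 3, together with the recording in Step 7, could be sketched as follows. The `Alert` structure, the 0.0 to 1.0 scales for risk and stress, the threshold values, and the message wording are all assumptions made for this sketch.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Alert:
    text: str
    channel: str  # "voice" or "vibration"
    tone: str     # "urgent", "neutral", or "gentle"


def compose_alert(risk: float, stress: float) -> Optional[Alert]:
    """Step 3: choose content and delivery from urgency and emotional state."""
    if risk >= 0.8:
        # High urgency takes priority over emotional tailoring.
        return Alert("Danger nearby. Please move to a safe place.", "voice", "urgent")
    if risk >= 0.4:
        if stress >= 0.6:
            return Alert("Please stay calm and check your surroundings.", "voice", "gentle")
        return Alert("Please check your surroundings.", "vibration", "neutral")
    if stress >= 0.6:
        # No external risk, but the user appears stressed.
        return Alert("Take a slow, deep breath.", "voice", "gentle")
    return None


def record(history: List[dict], risk: float, stress: float,
           alert: Optional[Alert]) -> None:
    """Step 7: keep each decision so later analysis can tune the thresholds."""
    history.append({"risk": risk, "stress": stress, "alerted": alert is not None})
```

Checking the high-risk case first reflects one possible design choice: emotional tailoring adjusts how a warning is delivered, but never suppresses an urgent one.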
[0458] (Example 2)
[0459] Next, we will describe Example 2. In the following description, the data processing device 12 will be referred to as the "server," and the headset-type terminal 314 will be referred to as the "terminal."
[0460] In today's social environment, stressful situations occur frequently, potentially deteriorating users' safety and mental state. However, existing safety management systems fail to provide warnings and support that take into account users' emotional states, making them inadequate. This invention aims to solve this problem by constructing a system that grasps users' emotional states in real time and provides personalized warnings and support.
[0461] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 2 is realized by the following means.
[0462] In this invention, the server includes means for receiving video information transmitted from a camera for collecting information about the user's surroundings and recognizing objects and actions contained in the information; means for performing analysis necessary to evaluate the user's emotional state; and means for notifying the user of optimized warning information by voice or vibration based on their emotional state and behavioral history. This enables the user to receive personalized advice and warnings in real time that are tailored to their emotional state, allowing them to live a safer and healthier life.
[0463] A "recording device" is a device used to collect information about the user's surroundings and is responsible for acquiring video information.
[0464] A "processing device" is a device that analyzes video information sent from a camera and has the computational functions to recognize objects and actions.
[0465] An "emotion engine" is a system that analyzes a user's facial expressions and voice to evaluate their emotional state.
[0466] A "warning device" is a device that provides warning information to the user based on information analyzed by the processing device.
[0467] "Voice or vibration means" refers to means used by a warning device to notify the user of warning information, and includes voice guidance and vibrational feedback.
[0468] "Behavioral history" refers to records of a user's past actions, and this data is used through analysis to predict future risks and optimize warnings.
[0469] "Emotional data" refers to quantified information about a user's emotional state, which the system uses to provide users with appropriate warnings and advice in real time.
[0470] This invention is a system for understanding the user's surroundings and providing warnings and support tailored to the user's emotional state. It is implemented through a combination of a recording device, a processing device, a warning device, and an emotion engine.
[0471] First, the server receives video information in real time from the recording equipment. This recording equipment includes smartphones and other image acquisition devices. The server then analyzes this video information using AI technology. Specifically, it uses models like YOLO for object recognition and models like OpenPose for motion analysis.
[0472] Next, the server uses an emotion engine to analyze the user's facial expressions and voice data to evaluate their emotional state. The emotion engine uses voice analysis software, such as Google Cloud Speech-to-Text, to determine the user's stress level and other factors.
[0473] The analyzed information is sent from the server to the terminal. The terminal then uses this information to provide the user with the most appropriate warning. The warning device can utilize voice alerts or vibration feedback. For example, it can play a voice guide to promote relaxation.
[0474] Users can receive notifications from their devices and respond quickly to their psychological state and surrounding environment. For example, in crowded public transport, the emotion engine can detect the user's stress level, and the device can send a gentle voice alert to help the user relax. This system functions as an interactive platform to support the user's mental health.
[0475] An example of a prompt might be: "If you encounter a stressed-out user in a crowded train station, how would the emotion engine respond? What advice could it offer to help them relax?"
[0476] In this way, the present invention provides a system that can comprehensively manage both user safety and mental health.
[0477] The flow of the specific processing in Example 2 will be explained using Figure 13.
[0478] Step 1:
[0479] The server receives video information in real time from the camera. The input is video data capturing the user's surroundings. The server begins analyzing this data using AI technology. A model like YOLO is used for object recognition to identify objects in the video. The output is information about the objects recognized in the video. Specifically, the process involves sending video captured by the camera to the server, breaking down the video frame by frame, and inputting it into the object recognition model.
[0480] Step 2:
[0481] The server performs motion analysis based on the received video information. The input is video data containing object information recognized in step 1. The server uses a motion analysis model such as OpenPose to analyze the user's behavior patterns. The output is information about the current user's behavior patterns. Specific actions include extracting skeletal information of the recognized person and performing calculations to determine the type and state of the action.
[0482] Step 3:
[0483] The server uses an emotion engine to evaluate the user's emotional state. The inputs are the behavioral information obtained in step 2 and the user's voice and facial expression data. The emotion engine analyzes the voice data using software such as Amazon Polly or Google Cloud Speech-to-Text to quantify the user's stress and emotional level. The output is the quantified evaluation result of the emotional state. The specific operation involves sensing changes in voice tone and facial expression and performing an emotional evaluation accordingly.
[0484] Step 4:
[0485] The device receives analysis results from the server and generates a warning. The input is the user's emotional state and behavioral information sent from the server. Based on this information, the device constructs the most appropriate warning for the user. The output is the content of the audio alert and vibration feedback provided to the user. The specific operation involves generating a voice message that promotes relaxation according to the user's state and playing it through the device.
[0486] Step 5:
[0487] The user responds based on the warning information received from the device. The input is a notification from the device via sound or vibration. The user adjusts their mental state based on the content of the notification. The output is a state in which the user is relaxed and has adapted to the stressful environment. Specifically, the user follows the advice given, for example an instruction such as "take a deep breath."
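As a non-limiting illustration, the quantification of the emotional state in Step 3 and the choice of advice in Step 4 could be sketched as follows. The features used (pitch variability and a ratio of negative facial expressions), the equal weighting, and the 0.5 threshold are assumptions made for this sketch; the disclosure does not specify how the emotion engine computes its score.

```python
from statistics import mean, pstdev
from typing import List, Optional


def pitch_variability(pitches_hz: List[float]) -> float:
    """Coefficient of variation of voice pitch, clipped to the range 0 to 1."""
    if len(pitches_hz) < 2:
        return 0.0
    average = mean(pitches_hz)
    if average <= 0:
        return 0.0
    return min(pstdev(pitches_hz) / average, 1.0)


def stress_score(pitches_hz: List[float],
                 negative_expression_ratio: float,
                 voice_weight: float = 0.5) -> float:
    """Combine voice and facial cues into one score between 0 and 1."""
    voice = pitch_variability(pitches_hz)
    face = min(max(negative_expression_ratio, 0.0), 1.0)
    return voice_weight * voice + (1.0 - voice_weight) * face


def advice_for(score: float, threshold: float = 0.5) -> Optional[str]:
    """Step 4: return relaxation advice when the score reaches the threshold."""
    return "Take a deep breath." if score >= threshold else None
```

The upstream feature extraction (pitch tracking, facial expression classification) is assumed to happen elsewhere; this sketch only shows how such features might be reduced to the quantified evaluation result the text refers to.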
[0488] (Application Example 2)
[0489] Next, we will explain Application Example 2. In the following explanation, the data processing device 12 will be referred to as the "server," and the headset-type terminal 314 will be referred to as the "terminal."
[0490] In modern living environments, people often experience a great deal of stress and anxiety. Furthermore, while safety must be ensured even at home, current technology struggles to provide appropriate support that takes into account the user's emotional state. A system is needed to improve this situation and enhance the psychological and physical safety of users.
[0491] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 2 is realized by the following means.
[0492] In this invention, the server includes a camera component for collecting information about the user's surroundings, processing means for receiving and analyzing video information transmitted from the camera component, notification means for providing warning information to the user based on the information analyzed by the processing means, evaluation means for identifying the user's emotional state and supporting their psychological health, and control means for operating the in-home device according to the emotional state detected by the evaluation means. This enables personalized safety improvements and mental support based on the user's emotional state.
[0493] A "camera component" is a device used to monitor the environment around the user and acquire video information from it.
[0494] A "processing means" is a device that analyzes video information obtained from the camera component to perform object recognition and motion analysis.
[0495] A "notification means" is a device that provides warnings or guidance to the user based on the analysis results of the processing means.
[0496] An "evaluation means" is a system for identifying a user's emotional state from their facial expressions and voice.
[0497] A "control means" is a mechanism that operates devices within the home in accordance with the emotional state obtained by the evaluation means.
[0498] The server acquires video information using the camera component to capture the user's surroundings. This information is analyzed by the processing means and used to evaluate the user's current state. In particular, in addition to object recognition and motion analysis, the evaluation means identifies emotions from the user's facial expressions and voice. This process employs advanced AI algorithms.
[0499] The terminal sends notifications to the user based on analysis results received from the server. The notification means provides appropriate warning information via voice and vibration according to the user's emotional state, and the control means further supports the user's psychological well-being by operating home devices. For example, if the system determines that the user is experiencing high levels of stress, it automatically creates a relaxing environment.
[0500] For example, if the system detects that a user is feeling anxious upon returning home from work, it could provide support by changing the lighting to a warmer color and playing calming music. Furthermore, when using the generative AI model, a prompt such as, "Please suggest appropriate relaxation methods for a user who is feeling stressed upon returning home," could be used.
[0501] The flow of a specific process in Application Example 2 will be explained using Figure 14.
[0502] Step 1:
[0503] The server uses the camera component to acquire video information about the user's surroundings. The input for this step is a real-time video feed from the camera, and the output is that feed saved as digital data. Specifically, the server connects to the camera device and captures image data at regular intervals.
[0504] Step 2:
[0505] The server analyzes the acquired video information using the processing means. This analysis performs object recognition and motion analysis. The input is the digital video data acquired in the previous step, and the output is the data obtained through the analysis, i.e., information about the surrounding environment. Deep learning algorithms are used for processing, and an AI model identifies objects and actions.
[0506] Step 3:
[0507] The server identifies the user's emotional state using the evaluation means. The input for this step is data related to the user's facial expressions and voice, and the output is specific labels or scores indicating the user's emotional state. Specifically, machine learning is used to analyze changes in facial expressions and vocal intonation.
[0508] Step 4:
[0509] The device sends notifications based on the emotional state transmitted from the server. The input for this step is the analysis results regarding the emotional state, and the output is warning or guidance information provided to the user. Specifically, this involves reading the message aloud to the user or using the smartphone's vibration motor to send notifications.
[0510] Step 5:
[0511] The terminal operates home devices in response to the user's emotional state through control means. The input in this step is a control command based on the user's emotional state, and the output is the modified state of the home device. This includes specific actions such as changing the color of the lights or playing music through a smart speaker.
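As a non-limiting illustration, the control means in Step 5 could map an emotional state label to home-device commands as in the following sketch. The `DeviceCommand` structure, the emotion labels, and the device and action names are hypothetical; no real smart-home API is implied.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class DeviceCommand:
    """One control command for an in-home device (hypothetical format)."""
    device: str
    action: str
    value: str


# Relaxing environment used for both labels below (illustrative values).
_RELAXING = [
    DeviceCommand("light", "set_color", "warm"),
    DeviceCommand("speaker", "play", "calming music"),
]

_COMMANDS_BY_EMOTION: Dict[str, List[DeviceCommand]] = {
    "stressed": _RELAXING,
    "anxious": _RELAXING,
}


def commands_for(emotion_label: str) -> List[DeviceCommand]:
    """Return the commands for a label, or no commands for unknown or calm states."""
    return list(_COMMANDS_BY_EMOTION.get(emotion_label, []))
```

Keeping the mapping in a table rather than in branching code makes it easy to personalize per user, which matches the personalization the text describes.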
[0512] The specific processing unit 290 transmits the result of the specific processing to the headset terminal 314. In the headset terminal 314, the control unit 46A causes the speaker 240 and display 343 to output the result of the specific processing. The microphone 238 acquires audio indicating user input for the result of the specific processing. The control unit 46A transmits the audio data indicating user input acquired by the microphone 238 to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the audio data.
[0513] The data generation model 58 is a so-called generative AI (Artificial Intelligence). Examples of the data generation model 58 include generative AI such as ChatGPT (Internet search <URL: https://openai.com/blog/chatgpt>) and Gemini (Internet search <URL: https://gemini.google.com/?hl=ja>). The data generation model 58 is obtained by performing deep learning on a neural network. The data generation model 58 receives as input a prompt containing instructions, together with inference data such as audio data representing speech, text data representing text, and image data representing images. The data generation model 58 performs inference on the input inference data according to the instructions indicated by the prompt, and outputs the inference results in data formats such as audio data and text data. Here, inference refers to, for example, analysis, classification, prediction, and/or summarization.
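As a non-limiting illustration, a prompt and its accompanying inference data could be bundled as in the following sketch before being passed to a model such as the data generation model 58. The `InferenceData` structure, the `build_request` function, and the dictionary layout are assumptions made for this sketch and do not correspond to the API of any particular generative AI service.

```python
from dataclasses import dataclass
from typing import Dict, List

ALLOWED_KINDS = {"audio", "text", "image"}


@dataclass
class InferenceData:
    """One piece of data the model should reason over (hypothetical format)."""
    kind: str       # "audio", "text", or "image"
    payload: bytes  # raw content


def build_request(instruction: str, items: List[InferenceData]) -> Dict:
    """Bundle the instruction and the inference data into one request."""
    if not instruction.strip():
        raise ValueError("the prompt must contain an instruction")
    for item in items:
        if item.kind not in ALLOWED_KINDS:
            raise ValueError(f"unsupported inference data kind: {item.kind}")
    return {
        "prompt": instruction,
        "inputs": [{"kind": item.kind, "payload": item.payload} for item in items],
    }
```

Validating the data kinds before sending keeps malformed requests from reaching the model.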
[0514] In the above embodiment, an example was given in which specific processing is performed by the data processing device 12, but the technology of this disclosure is not limited thereto, and specific processing may also be performed by the headset terminal 314.
[0515] [Fourth Embodiment]
[0516] Figure 7 shows an example of the configuration of the data processing system 410 according to the fourth embodiment.
[0517] As shown in Figure 7, the data processing system 410 includes a data processing device 12 and a robot 414. An example of the data processing device 12 is a server.
[0518] The data processing device 12 comprises a computer 22, a database 24, and a communication interface 26. The computer 22 is an example of a "computer" related to the technology of this disclosure. The computer 22 comprises a processor 28, RAM 30, and storage 32. The processor 28, RAM 30, and storage 32 are connected to a bus 34. The database 24 and the communication interface 26 are also connected to the bus 34. The communication interface 26 is connected to a network 54. An example of the network 54 is a WAN (Wide Area Network) and/or a LAN (Local Area Network).
[0519] The robot 414 includes a computer 36, a microphone 238, a speaker 240, a camera 42, a communication interface 44, and a controlled object 443. The computer 36 includes a processor 46, RAM 48, and storage 50. The processor 46, RAM 48, and storage 50 are connected to a bus 52. The microphone 238, speaker 240, camera 42, and controlled object 443 are also connected to the bus 52.
[0520] The microphone 238 receives voice signals from the user 20 and receives instructions from the user 20. The microphone 238 captures the voice signals from the user 20, converts the captured voice into audio data, and outputs it to the processor 46. The speaker 240 outputs audio according to the instructions from the processor 46.
[0521] Camera 42 is a small digital camera equipped with an optical system including a lens, aperture, and shutter, and an image sensor such as a CMOS (Complementary Metal-Oxide-Semiconductor) image sensor or a CCD (Charge Coupled Device) image sensor, and captures images of the area around the user 20 (for example, an imaging range defined by a field of view equivalent to the width of a typical healthy person's field of vision).
[0522] Communication interface 44 is connected to network 54. Communication interfaces 44 and 26 are responsible for the exchange of various information between processor 46 and processor 28 via network 54. The exchange of various information between processor 46 and processor 28 using communication interfaces 44 and 26 is performed in a secure manner.
[0523] The controlled object 443 includes a display device, LEDs in the eyes, and motors that drive the arms, hands, and feet. The posture and gestures of the robot 414 are controlled by controlling the motors of the arms, hands, and feet. Some of the robot 414's emotions can be expressed by controlling these motors. Furthermore, the robot 414's facial expressions can also be expressed by controlling the illumination state of the LEDs in its eyes.
[0524] Figure 8 shows an example of the main functions of the data processing device 12 and the robot 414. As shown in Figure 8, the data processing device 12 performs specific processing using the processor 28. The storage 32 stores the specific processing program 56.
[0525] The specific processing program 56 is an example of a "program" relating to the technology of this disclosure. The processor 28 reads the specific processing program 56 from the storage 32 and executes the read specific processing program 56 on the RAM 30. The specific processing is realized by the processor 28 operating as a specific processing unit 290 in accordance with the specific processing program 56 executed on the RAM 30.
[0526] The storage 32 stores the data generation model 58 and the emotion identification model 59. The data generation model 58 and the emotion identification model 59 are used by the specific processing unit 290.
[0527] In robot 414, the processor 46 performs the reception output processing. The storage 50 stores the reception output program 60. The processor 46 reads the reception output program 60 from the storage 50 and executes the read reception output program 60 on the RAM 48. The reception output processing is realized by the processor 46 operating as a control unit 46A according to the reception output program 60 executed on the RAM 48.
[0528] Next, the specific processing performed by the specific processing unit 290 of the data processing device 12 will be described. In the following description, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".
[0529] This invention is a system that combines a 360-degree camera, a processing unit that processes received video information, and a warning device that provides warnings according to the situation, in order to improve the safety of users in their daily lives. By using this system, users can always be aware of their surroundings and proactively avoid danger.
[0530] The server receives video information transmitted from the recording equipment and analyzes its contents using an AI algorithm. This AI incorporates object detection, motion analysis, and pattern recognition capabilities, enabling it to identify moving objects such as people and cars and compare them with past behavioral patterns. If the analysis identifies a danger, the server immediately transmits the information to the warning device.
[0531] The warning device notifies the user of alert information received from the server via voice or vibration. This notification helps the user quickly recognize their surroundings and take appropriate action. For example, when an approaching vehicle is detected, the device can attract the user's attention by issuing a loud warning sound.
[0532] Users check alerts from the system and respond to identified risks as quickly as possible. Furthermore, the system can accumulate user activity history and identify recurring mistakes from past patterns, enabling it to provide preventative warnings. For example, if a habit of forgetting to lock the door is detected, the device will issue a reminder before the user leaves the house. In this way, the system can improve convenience while making the user's life safer.
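The preventative warning described above can be illustrated with a minimal sketch. It assumes the activity history is a list of (date, event) records and that a habit counts as recurring once it appears a threshold number of times. The event names, the function name `recurring_mistakes`, and the threshold are hypothetical and are not taken from this disclosure.

```python
from collections import Counter

def recurring_mistakes(history, min_count=3):
    """Return the events that occur at least min_count times in the history.

    history: list of (date_string, event_name) tuples (assumed format).
    """
    counts = Counter(event for _, event in history)
    return sorted(event for event, n in counts.items() if n >= min_count)

# Hypothetical activity history accumulated by the system.
history = [
    ("2024-12-01", "forgot_to_lock_door"),
    ("2024-12-03", "forgot_to_lock_door"),
    ("2024-12-04", "left_stove_on"),
    ("2024-12-07", "forgot_to_lock_door"),
]

# Events listed here would trigger a reminder before the user leaves the house.
reminders = recurring_mistakes(history)
```

A simple count is only one possible criterion; a real system might also weight recent events more heavily.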
[0533] As a concrete example, consider a scenario where a user is walking through a busy shopping district and a stranger suddenly approaches them in the middle of a crowd. At this moment, the camera captures the person, and the server detects the abnormality of the movement. The warning device instantly sends a notification to the user, and the user ensures their safety by immediately choosing to leave the area. In this form, the present invention can be effectively implemented.
[0534] The following describes the processing flow.
[0535] Step 1:
[0536] The terminal collects 360-degree video information in real time from the camera worn by the user and transmits it to the server.
[0537] Step 2:
[0538] The server receives video information transmitted from the terminal and begins analysis using an AI algorithm. This analysis includes object recognition and motion analysis to detect whether there are any potential dangers in the surrounding environment.
[0539] Step 3:
[0540] Based on the analysis results, the server immediately generates an alert if an anomaly or danger is detected. The alert priority is then evaluated and classified according to its severity.
[0541] Step 4:
[0542] The server sends the generated warning to the terminal. The warning includes audio and vibration notifications and contains specific instructions to draw the user's attention.
[0543] Step 5:
[0544] The device notifies the user of warnings received from the server. It uses voice messages and vibrations to attract the user's attention in real time and encourage safe actions.
[0545] Step 6:
[0546] The user receives a warning from the device and takes appropriate action while checking their surroundings. For example, they might take a step back to create distance from an approaching vehicle.
[0547] Step 7:
[0548] The server records the results of warnings and user responses in a database and updates the behavioral history for later analysis. This data is used to predict future events and improve the accuracy of warnings.
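As an illustration of Step 3, the following sketch classifies alerts by severity and orders them so that the most severe is sent first. The risk score range, the thresholds, and the names `Priority`, `Alert`, `classify`, and `make_alerts` are assumptions introduced for this example; the disclosure does not specify how priority is evaluated.

```python
from dataclasses import dataclass
from enum import IntEnum

class Priority(IntEnum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3

@dataclass
class Alert:
    message: str
    risk_score: float  # assumed to lie in [0.0, 1.0]
    priority: Priority

def classify(risk_score):
    """Map a risk score to a priority using assumed thresholds."""
    if risk_score >= 0.8:
        return Priority.HIGH
    if risk_score >= 0.5:
        return Priority.MEDIUM
    return Priority.LOW

def make_alerts(detections):
    """Build alerts from (message, risk_score) pairs, most severe first."""
    alerts = [Alert(msg, score, classify(score)) for msg, score in detections]
    return sorted(alerts, key=lambda a: (a.priority, a.risk_score), reverse=True)

alerts = make_alerts([
    ("Cyclist passing on the left", 0.55),
    ("Vehicle approaching from behind", 0.92),
    ("Pedestrian nearby", 0.20),
])
```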
[0549] (Example 1)
[0550] Next, we will describe Example 1. In the following description, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".
[0551] In daily life, it is crucial for users to quickly recognize and safely address potential dangers in their surroundings. However, conventional methods have struggled to monitor the surrounding environment continuously and have not provided sufficient real-time warnings. In particular, efficiently detecting moving objects and analyzing past behavioral history has been difficult, making it impossible to proactively detect potential dangers.
[0552] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 1 is realized by the following means.
[0553] In this invention, the server includes an electronic device for collecting information about the user's surroundings, an information processing means for receiving video information transmitted from the electronic device and analyzing it using a generative AI model, a warning means for providing a warning signal to the user based on the results of the analysis by the information processing means, and a storage means for recording and analyzing past behavioral history. This enables the user to recognize potential dangers in their surroundings in real time and respond quickly.
[0554] An "electronic device" is a device that collects 360-degree information about the user's surroundings and generates video information.
[0555] "Information processing means" refers to a server-side component that analyzes received video information and uses an AI model to perform object detection, motion analysis, and pattern recognition.
[0556] A "warning means" is a device that provides a warning signal to the user via voice or vibration based on the results of analysis by the information processing means.
[0557] A "generative AI model" is an algorithm used to analyze video information and has the function of recognizing objects and detecting abnormalities in their movements.
[0558] A "storage means" is a component used to predict potential risks by recording and analyzing a user's past behavioral history.
[0559] This invention is a comprehensive monitoring system designed to improve the safety of users in their daily lives. The system consists of an electronic device that captures 360-degree images of the user's surroundings, a server for analyzing the information, and a terminal that alerts the user.
[0560] The server receives video information transmitted from the electronic device and analyzes its content using a generative AI model. Specifically, the information processing means on the server performs object detection, motion analysis, and pattern recognition to identify moving objects such as people and cars, as well as abnormal behavior. The generative AI model can analyze the situation in real time while comparing it with existing data. The storage means also records past behavioral history so that patterns of frequently occurring errors can be analyzed.
[0561] The device receives warning signals from the server and notifies the user via sound and vibration. This allows the user to immediately recognize danger and take appropriate action. For example, if it detects a vehicle approaching rapidly, the device will emit a loud warning to alert the user to the danger. In addition, based on recorded activity history, the device will send a reminder notification to users who frequently forget to lock their doors when going out.
[0562] A concrete example would be a scenario where a user is walking through a busy area and a stranger suddenly approaches them. In this case, the electronic device detects the person, and the server detects the abnormal behavior. An immediate warning is then sent to the user through the device, allowing the user to avoid danger by choosing to safely leave the area.
[0563] As an example of a prompt, the following sentence can be input to the generative AI model: "Explain how the system detects and warns of dangerous situations approaching the user while they are walking."
[0564] This configuration not only enhances user safety but also improves convenience.
[0565] The flow of the specific processing in Example 1 will be explained using Figure 11.
[0566] Step 1:
[0567] The server receives high-resolution video data transmitted from electronic devices as input. This video data captures the user's surroundings in 360 degrees in real time. Specifically, the electronic devices continuously capture video at a constant frame rate and stream it to the server.
[0568] Step 2:
[0569] The server analyzes the received video data using a generative AI model. This analysis applies an object detection algorithm to identify moving objects and unusual movements from the input data. Specifically, the AI model processes the frames extracted from the video to detect abnormal movements and specific patterns, and generates risk level information as output.
[0570] Step 3:
[0571] The server generates a warning signal based on the generated risk information. If the information processing determines that there is a risk, the information is sent to the warning device. Specifically, when an anomaly is detected, a signal is quickly generated and sent to the terminal.
[0572] Step 4:
[0573] The device receives a warning signal sent from the server and transmits it to the user. Based on the warning signal as input, an audio or vibration notification is generated as output. Specifically, the device alerts the user using the configured notification method to attract the user's attention.
[0574] Step 5:
[0575] The user receives a warning notification from the device, checks their surroundings based on the received warning information, and takes action to ensure their safety. Specifically, the user who receives the alert immediately checks their surroundings and, if necessary, takes action such as leaving the area.
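One way to picture the risk level generated in Step 2 is to measure how quickly a tracked object grows in the image, since an object that approaches the camera occupies more pixels from frame to frame. The sketch below is an assumption-based illustration, not the method of this disclosure: the bounding-box areas are presumed to come from an object detector, and the function names and thresholds are hypothetical.

```python
def growth_rate(areas, fps):
    """Average relative growth of the bounding-box area per second.

    areas: bounding-box areas (pixels) of one tracked object in consecutive frames.
    fps: frame rate at which the areas were sampled.
    """
    if len(areas) < 2 or areas[0] <= 0:
        return 0.0
    elapsed = (len(areas) - 1) / fps
    return (areas[-1] - areas[0]) / areas[0] / elapsed

def risk_level(areas, fps, fast=1.0, slow=0.25):
    """Classify the approach as 'high', 'medium', or 'none' (assumed thresholds)."""
    rate = growth_rate(areas, fps)
    if rate >= fast:
        return "high"
    if rate >= slow:
        return "medium"
    return "none"

car = [1000, 1200, 1500, 1900, 2400]     # grows quickly: approaching
parked = [1000, 1005, 998, 1002, 1001]   # nearly constant: stationary
```

With the areas sampled at 4 frames per second, `risk_level(car, 4)` classifies the first object as a high risk, while the second produces no warning signal.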
[0576] (Application Example 1)
[0577] Next, we will explain Application Example 1. In the following explanation, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".
[0578] In modern homes, there is a need to quickly detect intruders or abnormal activity from outside and ensure the safety of the family. However, conventional security systems consist only of fixed cameras, which can only monitor a limited field of vision, and often rely on reactive measures after an incident has occurred. Therefore, there is a need for a system that can move around inside and outside the house, grasp the situation in real time, and issue a rapid warning.
[0579] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 1 is realized by the following means.
[0580] In this invention, the server includes an image acquisition means for monitoring and recording the surrounding environment, an information processing means for receiving and analyzing video data transmitted from the image acquisition means, a warning output means for providing warning information to the user based on the data analyzed by the information processing means, and a means for detecting abnormalities inside and outside the home in real time and notifying the user using voice and data communication means. This makes it possible to monitor the safety inside and outside the home in real time, immediately issue a warning to the user if suspicious activity is detected, and enable a rapid response.
[0581] An "image acquisition means" is a device that has the function of visually recording the surrounding situation and generating video data.
[0582] "Information processing means" refers to a device that analyzes video data received from image acquisition means and performs computational processing to detect anomalies or changes.
[0583] A "warning output means" is a device that transmits abnormalities detected by information processing means to the user and issues a warning via voice or data communication.
[0584] A "data communication means" is a communication device used to send real-time notifications to users in remote locations via a network.
[0585] "Real-time" refers to a time frame that enables immediate responses and actions by transmitting and receiving information with virtually no delay.
[0586] The system that realizes this invention is configured by combining image acquisition means, information processing means, warning output means, and data communication means to ensure safety both inside and outside the home.
[0587] The server uses a 360-degree camera to acquire images and monitor the situation inside and outside the home in real time. The captured video data is analyzed by AI algorithms running on high-performance computing hardware. Specifically, computations such as object detection and recognition of motion anomalies are performed on an NVIDIA Jetson or a similar GPU platform.
[0588] Once information processing is complete, the system uses a voice assistant function as a warning output method to immediately notify the user. For example, it may issue voice warnings via Amazon Alexa or Google Assistant, or deliver warning information via push notifications to smartphones.
[0589] Users can receive information in real time through data communication and take safe actions as needed. This network-based communication enables immediate information transmission even to users outside the home.
[0590] A concrete example would be a situation where a suspicious person enters the garden while a family is resting in the living room. In this case, the system captures the person's movements using image acquisition means, detects the anomaly using information processing means, and then sends an alert via voice and to a smartphone through a warning output means. This allows the family to take immediate and safe action.
[0591] As an example of a prompt message to pass to a generative AI model, you could use text such as, "Analyze camera footage, detect if there are any moving objects in the garden, and use the AI to send a warning to the residents if it determines that there is a risk."
[0592] The flow of a specific process in Application Example 1 will be explained using Figure 12.
[0593] Step 1:
[0594] The server uses a 360-degree camera to capture video both inside and outside the home. It receives ambient image data as input and processes it as streaming data. The camera's operation generates a continuous video flow.
[0595] Step 2:
[0596] The server passes the acquired video to the information processing system. It receives a video stream as input and uses an AI algorithm to perform object recognition. Here, NVIDIA Jetson is used for data processing to detect moving objects from the video and identify abnormal behavior. If an anomaly is detected, that information is output as identification data.
[0597] Step 3:
[0598] The server assesses the risk level based on the identification data and activates the warning output mechanism. It receives identification data as input and generates data to notify the user using voice assistants and data communication functions. The generated notification data is output and sent to the user's device.
[0599] Step 4:
[0600] The device receives notification data sent from the server and transmits that information to the user. It receives notification data as input and outputs a warning to the user via sound or vibration. It attracts the user's attention by outputting a message such as "Dangerous activity detected. Please check your surroundings."
[0601] Step 5:
[0602] The user reviews the warning and decides on a safe course of action. The user receives warning information from the terminal as input and, if necessary, checks the site or takes other safety measures. By repeating these steps, it becomes possible to ensure the user's safety in real time.
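Steps 2 to 4 can be sketched with simple frame differencing, a basic stand-in for the AI-based detection described above. The sketch assumes grayscale frames represented as nested lists, and the function names, thresholds, and notification fields are hypothetical; only the message text is taken from Step 4.

```python
def changed_ratio(prev, curr, threshold=30):
    """Fraction of pixels whose brightness changed by more than threshold.

    prev, curr: equally sized 2-D lists of grayscale values (0-255).
    """
    total = changed = 0
    for row_prev, row_curr in zip(prev, curr):
        for a, b in zip(row_prev, row_curr):
            total += 1
            if abs(a - b) > threshold:
                changed += 1
    return changed / total if total else 0.0

def build_notification(prev, curr, min_ratio=0.1):
    """Return notification data if enough pixels changed, else None."""
    ratio = changed_ratio(prev, curr)
    if ratio < min_ratio:
        return None
    return {
        "message": "Dangerous activity detected. Please check your surroundings.",
        "channels": ["voice", "push"],  # assumed delivery channels
        "changed_ratio": ratio,
    }

# Two tiny example frames of the garden: empty, then with a bright moving object.
empty_garden = [[10, 10, 10, 10], [10, 10, 10, 10]]
with_person = [[10, 200, 200, 10], [10, 200, 200, 10]]
```

Frame differencing only shows that something moved; deciding whether the movement is suspicious would still require the object recognition described in Step 2.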
[0603] Furthermore, an emotion engine that estimates the user's emotions may be incorporated. That is, the specific processing unit 290 may use the emotion identification model 59 to estimate the user's emotions and perform the specific processing using the user's emotions.
[0604] This invention is a system that combines a camera, a processing unit, a warning device, and an emotion engine to understand the user's surroundings in real time and improve safety in conjunction with their emotional state. This system allows the user to comprehensively manage their own safety and receive appropriate warnings for their situation.
[0605] The server receives video information from the recording equipment and analyzes its content using AI. The analysis process includes object recognition, motion analysis, and emotion recognition by an emotion engine. The emotion engine analyzes data such as the user's facial expressions and voice to evaluate the user's emotional state. For example, if the user shows signs of stress, the server can use that information to adjust the priority and content of warnings.
[0606] The terminal receives analysis results from the server and provides the user with appropriate warnings. The warning device uses voice and vibration to notify the user in a way that suits their emotional state. This may include instructions that encourage calmness so that the user can respond to the situation calmly.
[0607] Users receive notifications from their device and can respond quickly to their psychological state and surrounding environment. For example, if a user is feeling stressed on crowded public transport, the emotion engine detects this state. The device then provides a gentle voice alert to encourage relaxation, making it easier for the user to reduce stress. In this form, the present invention can not only improve safety but also support the user's mental health.
[0608] Furthermore, the server records user behavior history and sentiment data, and performs periodic analysis. This data is used to predict future risks and optimize warnings, enabling the delivery of increasingly personalized services to users.
[0609] The following describes the processing flow.
[0610] Step 1:
[0611] The device collects 360-degree video information in real time from the user's camera and transmits it to the server. This video information also includes the user's facial expressions and actions.
[0612] Step 2:
[0613] The server receives video information transmitted from the terminal. The received information is analyzed using an AI algorithm. During the analysis process, the server performs object recognition and understands the surrounding environment. It also uses an emotion engine to evaluate the user's emotional state from their facial expressions and voice.
[0614] Step 3:
[0615] The server integrates the results of risk assessment using object recognition and emotion analysis using an emotion engine. Based on these results, it generates appropriate warnings according to the urgency and emotional state. In particular, if the emotional state indicates stress or anxiety, it adjusts the warning content and notification method.
[0616] Step 4:
[0617] The server sends a warning it has generated to the terminal. The terminal then prepares to notify the user of this warning.
[0618] Step 5:
[0619] The device notifies the user of warnings using sound or vibration. Depending on the user's emotional state, the tone and content of the notification may be customized and may include messages to promote relaxation.
[0620] Step 6:
[0621] The user receives a warning from the device, promptly checks their surroundings, and then takes a safe response in a manner appropriate to their emotional state.
[0622] Step 7:
[0623] The server records user behavior history and sentiment data in a database. This data will be used to improve the accuracy of future warnings and to provide optimized notifications for individual users.
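The integration in Step 3 and the customization in Step 5 can be illustrated as follows. This is a minimal sketch assuming that urgency and stress are both available as scores between 0 and 1; the thresholds, the wording of the messages, and the function name `build_warning` are hypothetical.

```python
def build_warning(hazard, urgency, stress):
    """Choose the content and delivery of a warning.

    urgency and stress are assumed to be scores in [0.0, 1.0].
    """
    stressed = stress >= 0.6   # assumed stress threshold
    urgent = urgency >= 0.7    # assumed urgency threshold

    text = f"Caution: {hazard}."
    if stressed:
        # Add a calming instruction for a stressed or anxious user.
        text += " Take a deep breath and move away calmly."

    if urgent:
        tone = "firm"      # urgency takes precedence over a soft delivery
    elif stressed:
        tone = "gentle"
    else:
        tone = "neutral"

    return {"text": text, "tone": tone, "vibration": urgent}

urgent_warning = build_warning("vehicle approaching", urgency=0.9, stress=0.1)
gentle_warning = build_warning("crowded platform", urgency=0.3, stress=0.8)
```

In this sketch urgency overrides the gentle tone, which is a design assumption: a stressed user facing an immediate hazard still receives a firm warning, but with the calming instruction appended.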
[0624] (Example 2)
[0625] Next, we will describe Example 2. In the following description, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".
[0626] In today's social environment, stressful situations occur frequently, potentially deteriorating users' safety and mental state. However, existing safety management systems fail to provide warnings and support that take into account users' emotional states, making them inadequate. This invention aims to solve this problem by constructing a system that grasps users' emotional states in real time and provides personalized warnings and support.
[0627] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 2 is realized by the following means.
[0628] In this invention, the server includes means for receiving video information transmitted from a camera for collecting information about the user's surroundings and recognizing objects and actions contained in the information; means for performing analysis necessary to evaluate the user's emotional state; and means for notifying the user of optimized warning information by voice or vibration based on their emotional state and behavioral history. This enables the user to receive personalized advice and warnings in real time that are tailored to their emotional state, allowing them to live a safer and healthier life.
[0629] A "recording device" is a device used to collect information about the user's surroundings and is responsible for acquiring video information.
[0630] A "processing device" is a device that analyzes video information sent from a camera and has the computational functions to recognize objects and actions.
[0631] An "emotion engine" is a system that analyzes a user's facial expressions and voice to evaluate their emotional state.
[0632] A "warning device" is a device that provides warning information to the user based on information analyzed by the processing unit.
[0633] "Voice or vibration means" refers to means used by a warning device to notify the user of warning information, and includes voice guidance and vibrational feedback.
[0634] "Behavioral history" refers to records of a user's past actions, and this data is used through analysis to predict future risks and optimize warnings.
[0635] "Emotional data" refers to quantified information about a user's emotional state, which the system uses to provide users with appropriate warnings and advice in real time.
[0636] This invention is a system for understanding the user's surroundings and providing warnings and support tailored to the user's emotional state. It is implemented through a combination of imaging equipment, a processing unit, a warning device, and an emotion engine.
[0637] First, the server receives video information in real time from the recording equipment. This recording equipment includes smartphones and other image acquisition devices. The server then analyzes this video information using AI technology. Specifically, it uses models like YOLO for object recognition and models like OpenPose for motion analysis.
[0638] Next, the server uses an emotion engine to analyze the user's facial expressions and voice data to evaluate their emotional state. The emotion engine uses voice analysis software, such as Google Cloud Speech-to-Text, to determine the user's stress level and other factors.
[0639] The analyzed information is sent from the server to the terminal. The terminal then uses this information to provide the user with the most appropriate warning. The warning device can utilize voice alerts or vibration feedback. For example, it can play a voice guide to promote relaxation.
[0640] Users can receive notifications from their devices and respond quickly to their psychological state and surrounding environment. For example, in crowded public transport, the emotion engine can detect the user's stress level, and the device can send a gentle voice alert to help the user relax. This system functions as an interactive platform to support the user's mental health.
[0641] An example of a prompt might be: "If you encounter a stressed-out user in a crowded train station, how would the emotion engine respond? What advice could it offer to help them relax?"
[0642] In this way, the present invention provides a system that can comprehensively manage both user safety and mental health.
[0643] The flow of the specific processing in Example 2 will be explained using Figure 13.
[0644] Step 1:
[0645] The server receives video information in real time from the camera. The input is video data capturing the user's surroundings. The server begins analyzing this data using AI technology. A model like YOLO is used for object recognition to identify objects in the video. The output is information about the objects recognized in the video. Specifically, the process involves sending video captured by the camera to the server, breaking down the video frame by frame, and inputting it into the object recognition model.
[0646] Step 2:
[0647] The server performs motion analysis based on the received video information. The input is video data containing object information recognized in step 1. The server uses a motion analysis model such as OpenPose to analyze the user's behavior patterns. The output is information about the current user's behavior patterns. Specific actions include extracting skeletal information of the recognized person and performing calculations to determine the type and state of the action.
[0648] Step 3:
[0649] The server uses an emotion engine to evaluate the user's emotional state. The inputs are the behavioral information obtained in step 2 and the user's voice and facial expression data. The emotion engine analyzes the voice data using software such as Amazon Polly or Google Cloud Speech-to-Text to quantify the user's stress and emotional level. The output is the quantified evaluation result of the emotional state. The specific operation involves sensing changes in voice tone and facial expression and performing an emotional evaluation accordingly.
[0650] Step 4:
[0651] The device receives analysis results from the server and generates a warning. The input is the user's emotional state and behavioral information sent from the server. Based on this information, the device constructs the most appropriate warning for the user. The output is the content of the audio alert and vibration feedback provided to the user. The specific operation involves generating a voice message that promotes relaxation according to the user's state and playing it through the device.
[0652] Step 5:
[0653] The user responds based on the warning information received from the device. Input is a notification from the device via sound or vibration. The user adjusts their mental state based on the notification content. Output is a state in which the user is relaxed and adapted to the stressful environment. Specific actions include following instructions such as "take a deep breath," as an example of the advice given.
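The quantification in Step 3 can be pictured as combining a voice-based score and a face-based score and then smoothing the result over time. The following sketch is one possible approach under stated assumptions: both scores are presumed to be supplied by upstream analysis as values between 0 and 1, and the function names, weights, and smoothing factor are hypothetical.

```python
def fuse_scores(voice_stress, face_stress, voice_weight=0.5):
    """Weighted average of two stress scores, each assumed to lie in [0.0, 1.0]."""
    if not 0.0 <= voice_weight <= 1.0:
        raise ValueError("voice_weight must be between 0 and 1")
    return voice_weight * voice_stress + (1.0 - voice_weight) * face_stress

def smooth(scores, alpha=0.5):
    """Exponential moving average so one noisy frame does not flip the result."""
    if not scores:
        return []
    smoothed = [scores[0]]
    for s in scores[1:]:
        smoothed.append(alpha * s + (1.0 - alpha) * smoothed[-1])
    return smoothed

# A sudden jump in the raw score rises gradually after smoothing.
trend = smooth([0.0, 1.0, 1.0])
```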
[0654] (Application Example 2)
[0655] Next, we will explain application example 2. In the following explanation, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".
[0656] In modern living environments, people often experience a great deal of stress and anxiety. Furthermore, while safety must be ensured even at home, current technology struggles to provide appropriate support that takes into account the user's emotional state. A system is needed to improve this situation and enhance the psychological and physical safety of users.
[0657] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 2 is realized by the following means.
[0658] In this invention, the server includes a camera component for collecting information about the user's surroundings, processing means for receiving and analyzing video information transmitted from the camera component, notification means for providing warning information to the user based on the information analyzed by the processing means, evaluation means for identifying the user's emotional state and supporting their psychological health, and control means for operating the in-home device according to the emotional state detected by the evaluation means. This enables personalized safety improvements and mental support based on the user's emotional state.
[0659] A "camera component" is a device used to monitor the environment around the user and acquire video information from it.
[0660] A "processing means" is a device that analyzes video information obtained from the camera component to perform object recognition and motion analysis.
[0661] A "notification means" is a device that provides warnings or guidance to the user based on the analysis results of the processing means.
[0662] An "evaluation means" is a system for identifying a user's emotional state from their facial expressions and voice.
[0663] A "control means" is a mechanism that operates devices within the home in accordance with the emotional state obtained by the evaluation means.
[0664] The server acquires video information using a camera component to capture the user's surroundings. This information is analyzed by processing units and used to evaluate the user's current state. In particular, in addition to object recognition and motion analysis, evaluation units are used to identify emotions from the user's facial expressions and voice. This process employs advanced AI algorithms.
[0665] The terminal sends notifications to the user based on analysis results received from the server. The notification means provides appropriate warning information via voice and vibration according to the user's emotional state, and the control means further supports the user's psychological well-being by operating home devices. For example, if the system determines that the user is experiencing high levels of stress, it automatically creates a relaxing environment.
[0666] For example, if the system detects that a user is feeling anxious upon returning home from work, it could provide support by changing the lighting to a warmer color and playing calming music. Furthermore, when using the generative AI model, a prompt such as, "Please suggest appropriate relaxation methods for a user who is feeling stressed upon returning home," could be used.
[0667] The flow of a specific process in Application Example 2 will be explained using Figure 14.
[0668] Step 1:
[0669] The server uses the camera component to acquire video information about the user's surroundings. The input for this step is a real-time video feed from the camera, and the output is that feed saved as digital data. Specifically, the server connects to the camera device and captures image data at regular intervals.
[0670] Step 2:
[0671] The server analyzes the acquired video information using processing equipment. This analysis performs object recognition and motion analysis. The input is the digital video data acquired in the previous step, and the output is the data obtained through the analysis, i.e., information about the surrounding environment. Deep learning algorithms are used for processing, and an AI model identifies objects and actions.
[0672] Step 3:
[0673] The server identifies the user's emotional state using evaluation tools. The input for this step is data related to the user's facial expressions and voice, and the output is specific labels or scores indicating the user's emotional state. Specifically, machine learning is used to analyze changes in facial expressions and vocal intonation.
[0674] Step 4:
[0675] The device sends notifications based on the emotional state transmitted from the server. The input for this step is the analysis results regarding the emotional state, and the output is warning or guidance information provided to the user. Specifically, this involves reading the message aloud to the user or using the smartphone's vibration motor to send notifications.
[0676] Step 5:
[0677] The terminal operates home devices in response to the user's emotional state through control means. The input in this step is a control command based on the user's emotional state, and the output is the modified state of the home device. This includes specific actions such as changing the color of the lights or playing music through a smart speaker.
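Step 5 can be sketched as a lookup from an emotional label to control commands for home devices. The rule table, the device and action names, the score threshold, and the function name `commands_for` are all hypothetical; an actual control means would translate such commands into the protocol of each device.

```python
# Hypothetical mapping from an emotional label to in-home device commands.
RULES = {
    "stressed": [
        {"device": "light", "action": "set_color", "value": "warm"},
        {"device": "speaker", "action": "play", "value": "calming music"},
    ],
    "anxious": [
        {"device": "light", "action": "set_color", "value": "warm"},
    ],
}

def commands_for(label, score, min_score=0.6):
    """Return control commands when the emotion score is high enough."""
    if score < min_score:
        return []
    # Copy each command so callers cannot modify the rule table.
    return [dict(cmd) for cmd in RULES.get(label, [])]

# Example: a user who returns home with a high stress score.
evening_commands = commands_for("stressed", 0.9)
```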
[0678] The specific processing unit 290 transmits the result of the specific processing to the robot 414. In the robot 414, the control unit 46A causes the speaker 240 and the controlled object 443 to output the result of the specific processing. The microphone 238 acquires audio indicating user input for the result of the specific processing. The control unit 46A transmits the audio data indicating user input acquired by the microphone 238 to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the audio data.
[0679] The data generation model 58 is a so-called generative AI (Artificial Intelligence). Examples of the data generation model 58 include generative AI such as ChatGPT (Internet search <URL: https://openai.com/blog/chatgpt>) and Gemini (Internet search <URL: https://gemini.google.com/?hl=ja>). The data generation model 58 is obtained by performing deep learning on a neural network. The data generation model 58 receives a prompt containing instructions together with inference data such as audio data representing speech, text data representing text, and image data representing images. The data generation model 58 performs inference on the input inference data according to the instructions indicated by the prompt, and outputs the inference results in data formats such as audio data and text data. Here, inference refers to, for example, analysis, classification, prediction, and/or summarization.
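The pairing of a prompt with inference data can be pictured as a simple request structure. This sketch does not call any real model API; the class name `InferenceRequest`, its fields, and the example payloads are assumptions used only to show how an instruction and typed inference data might be bundled before being passed to a model.

```python
from dataclasses import dataclass, field

# Data kinds named in the description: audio, text, and image data.
ALLOWED_KINDS = {"audio", "text", "image"}

@dataclass
class InferenceRequest:
    prompt: str                                  # instruction for the model
    inputs: list = field(default_factory=list)   # (kind, payload) pairs

    def add(self, kind, payload):
        """Attach one piece of inference data and return self for chaining."""
        if kind not in ALLOWED_KINDS:
            raise ValueError(f"unsupported data kind: {kind}")
        self.inputs.append((kind, payload))
        return self

request = (
    InferenceRequest(prompt="Summarize any hazards visible around the user.")
    .add("image", b"<frame bytes>")              # placeholder payload
    .add("text", "User is walking near a road.")
)
```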
[0680] In the above embodiment, an example was given in which the specific processing is performed by the data processing device 12, but the technology of this disclosure is not limited thereto, and the specific processing may also be performed by the robot 414.
[0681] Furthermore, the emotion identification model 59, acting as an emotion engine, may determine the user's emotion according to a specific mapping, namely an emotion map (see Figure 9). Similarly, the emotion identification model 59 may also determine the robot's emotion, and the specific processing unit 290 may perform the specific processing using the robot's emotion.
[0682] Figure 9 shows an emotion map 400 in which multiple emotions are mapped. In the emotion map 400, emotions are arranged in concentric circles radiating from the center. More primitive emotions are located closer to the center of the concentric circles. Toward the outside of the concentric circles, emotions representing states and actions arising from mental states are located. Emotion is a concept that includes feelings and mental states. On the left side of the concentric circles, emotions that are generally generated from reactions occurring in the brain are located. On the right side of the concentric circles, emotions that are generally induced by situational judgment are located. Above and below the concentric circles, emotions that are generally generated from reactions occurring in the brain and also induced by situational judgment are located. In addition, the emotion of "pleasure" is located on the upper side of the concentric circles, and the emotion of "displeasure" is located on the lower side. Thus, in the emotion map 400, multiple emotions are mapped based on the structure in which emotions arise, and emotions that are likely to occur simultaneously are mapped close together.
[0683] These emotions are distributed at the 3 o'clock position on the emotion map 400, and usually fluctuate between feelings of security and anxiety. In the right half of the emotion map 400, situational awareness takes precedence over internal feelings, resulting in a calm impression.
[0684] The inside of the emotion map 400 represents inner thoughts, while the outside represents actions. Therefore, the further toward the outside of the emotion map 400 an emotion is located, the more visible it becomes (that is, the more it is expressed in actions).
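The geometry of the emotion map described above can be illustrated with a small sketch that places the center of the map at the origin and classifies a point by its coordinates. The coordinate convention, the `inner_radius` threshold, and the function name `describe_position` are assumptions made for this illustration only; the disclosure does not specify numeric coordinates.

```python
import math


def describe_position(x: float, y: float, inner_radius: float = 0.5) -> dict:
    """Classify a point on a hypothetical emotion map centered at (0, 0).

    Left (x < 0) is reaction-driven, right (x > 0) is situation-driven,
    up (y > 0) is pleasure, and down (y < 0) is displeasure.
    """
    if x < 0:
        origin = "reaction"
    elif x > 0:
        origin = "situation"
    else:
        origin = "mixed"  # directly above or below the center

    if y > 0:
        valence = "pleasure"
    elif y < 0:
        valence = "displeasure"
    else:
        valence = "neutral"

    radius = math.hypot(x, y)
    # inner_radius is an assumed cut-off between inner thoughts and actions
    layer = "inner" if radius < inner_radius else "outer"
    return {"origin": origin, "valence": valence, "layer": layer}
```

Under these assumptions, a point at the 3 o'clock position near the center, such as `describe_position(0.3, 0.0)`, is classified as situation-driven, neutral in valence, and inner.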
[0685] Here, human emotions are based on various balances, such as posture and blood sugar levels. When these balances deviate from the ideal, it results in discomfort, and when they approach the ideal, it results in pleasure. Similarly, in robots, cars, motorcycles, etc., emotions can be created based on various balances, such as posture and battery level. When these balances deviate from the ideal, it results in discomfort, and when they approach the ideal, it results in pleasure. The emotion map can be generated, for example, based on Dr. Mitsuyoshi's emotion map (Research on a system for analyzing brain physiological signals of speech emotion recognition and emotion, Tokushima University, doctoral dissertation: https://ci.nii.ac.jp/naid/500000375379). The left half of the emotion map contains emotions belonging to a region called "response," where sensation is dominant. The right half of the emotion map contains emotions belonging to a region called "situation," where situational awareness is dominant.
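The idea that deviation from an ideal balance produces discomfort while approaching it produces pleasure can be sketched numerically. The linear scoring, the `tolerance` parameter, and the function names below are assumptions chosen for illustration; the disclosure does not define a formula.

```python
from typing import List


def balance_score(current: float, ideal: float, tolerance: float) -> float:
    """Return +1.0 at the ideal value and -1.0 at or beyond the tolerance.

    The linear mapping is an assumption made for this sketch.
    """
    if tolerance <= 0:
        raise ValueError("tolerance must be positive")
    deviation = min(abs(current - ideal) / tolerance, 1.0)
    return 1.0 - 2.0 * deviation


def overall_feeling(scores: List[float]) -> str:
    """Average several balance scores into pleasure, discomfort, or neutral."""
    mean = sum(scores) / len(scores)
    if mean > 0:
        return "pleasure"
    if mean < 0:
        return "discomfort"
    return "neutral"
```

For a robot, the scores might come from, for example, battery level and posture tilt, each compared against its own ideal value.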
[0686] The emotion map defines two emotions that promote learning. One is the emotion around the middle of the negative "repentance" and "reflection" on the situation side. In other words, it is when the robot experiences negative emotions such as "I never want to feel this way again" or "I don't want to be scolded again." The other is the emotion around the positive "desire" on the reaction side. In other words, it is when the robot has positive feelings such as "I want more" or "I want to know more."
[0687] The emotion identification model 59 inputs user input into a pre-trained neural network, obtains emotion values representing each emotion shown in the emotion map 400, and determines the user's emotion. This neural network is pre-trained based on multiple training data sets, which are combinations of user input and emotion values representing each emotion shown in the emotion map 400. Furthermore, this neural network is trained so that emotions located close together have similar values, as shown in the emotion map 900 in Figure 10. Figure 10 shows an example where multiple emotions such as "reassured," "calm," and "confident" have similar emotion values.
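As a hedged illustration of the last step, once the neural network has produced an emotion value for each emotion, the user's emotion could be taken as the emotion with the highest value. The network itself is not modeled here; the example values, the `tolerance` parameter, and the function names are hypothetical.

```python
from typing import Dict, List


def dominant_emotion(emotion_values: Dict[str, float]) -> str:
    """Pick the emotion with the highest value (an assumed decision rule)."""
    if not emotion_values:
        raise ValueError("emotion_values must not be empty")
    return max(emotion_values, key=emotion_values.get)


def similar_emotions(
    emotion_values: Dict[str, float], reference: str, tolerance: float = 0.1
) -> List[str]:
    """List other emotions whose values lie within tolerance of the reference."""
    ref = emotion_values[reference]
    return sorted(
        name
        for name, value in emotion_values.items()
        if name != reference and abs(value - ref) <= tolerance
    )


# Hypothetical network output in which neighboring emotions have close values.
example = {"reassured": 0.82, "calm": 0.78, "confident": 0.75, "anxious": 0.10}
```

With these example values, `dominant_emotion(example)` selects "reassured", and `similar_emotions(example, "reassured")` lists "calm" and "confident", mirroring the case in which emotions located close together have similar values.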
[0688] The above description primarily focuses on the functions of the data processing device 12 in relation to this disclosure. However, the system related to this disclosure is not necessarily implemented on a server. The system related to this disclosure may be implemented as a general information processing system. This disclosure may be implemented, for example, as a software program that runs on a personal computer or as an application that runs on a smartphone. The method related to this disclosure may be provided to users in SaaS (Software as a Service) format.
[0689] In the above embodiment, an example was given in which the specific processing is performed by a single computer 22. However, the technology of this disclosure is not limited thereto, and the specific processing may be distributed across multiple computers, including the computer 22. For example, the data generation model 58 may be provided in a device external to the data processing device 12, and the external device may generate data according to the input data.
[0690] In the above embodiment, an example was given in which the specific processing program 56 is stored in the storage 32, but the technology of this disclosure is not limited thereto. For example, the specific processing program 56 may be stored in a portable, computer-readable, non-transitory storage medium such as a USB (Universal Serial Bus) memory. The specific processing program 56 stored in the non-transitory storage medium is installed in the computer 22 of the data processing device 12. The processor 28 executes the specific processing according to the specific processing program 56.
[0691] Alternatively, the specific processing program 56 may be stored in a storage device such as a server connected to the data processing device 12 via the network 54, and the specific processing program 56 may be downloaded and installed on the computer 22 in response to a request from the data processing device 12.
[0692] Furthermore, it is not necessary to store the entirety of the specific processing program 56 in a storage device such as a server connected to the data processing device 12 via the network 54, or to store the entirety of the specific processing program 56 in the storage 32; it is acceptable to store only a portion of the specific processing program 56.
[0693] The following types of processors can be used as hardware resources to perform the specific processing. One example is a CPU, which is a general-purpose processor that functions as a hardware resource performing the specific processing by executing software, i.e., a program. Other examples include dedicated electrical circuits, such as FPGAs (Field-Programmable Gate Arrays), PLDs (Programmable Logic Devices), and ASICs (Application Specific Integrated Circuits), which are processors with circuit configurations specifically designed to perform the specific processing. Each of these processors has built-in or connected memory, and each performs the specific processing by using that memory.
[0694] The hardware resource that performs a specific process may consist of one of these various processors, or it may consist of a combination of two or more processors of the same or different types (for example, a combination of multiple FPGAs, or a combination of a CPU and an FPGA). Alternatively, the hardware resource that performs a specific process may consist of a single processor.
[0695] Examples of configurations using a single processor include, firstly, a configuration in which one or more CPUs and software are combined to form a single processor, and this processor functions as a hardware resource that performs a specific process. Secondly, there is a configuration using a processor that realizes the functions of the entire system, including multiple hardware resources that perform a specific process, on a single IC chip, as exemplified by SoCs (System-on-a-chip). In this way, a specific process is realized using one or more of the above types of processors as hardware resources.
[0696] Furthermore, as the hardware structure of these various processors, more specifically, an electrical circuit that combines circuit elements such as semiconductor devices can be used. Also, the specific processing described above is merely an example. Therefore, it goes without saying that unnecessary steps may be deleted, new steps may be added, or the processing order may be rearranged, as long as this does not deviate from the main purpose.
[0697] The descriptions and illustrations presented above are detailed explanations of the technical aspects of this disclosure and are merely examples of those aspects. For example, the above descriptions of the structure, function, operation, and effect are examples of the structure, function, operation, and effect of the technical aspects of this disclosure. Therefore, it goes without saying that unnecessary parts may be deleted, new elements may be added, or elements may be replaced in the descriptions and illustrations presented above, as long as this does not deviate from the essence of the technical aspects of this disclosure. Furthermore, in order to avoid confusion and facilitate understanding of the technical aspects of this disclosure, explanations of common technical knowledge and the like that do not require special explanation to enable the implementation of the technical aspects of this disclosure have been omitted from the descriptions and illustrations presented above.
[0698] All documents, patent applications, and technical standards described herein are incorporated by reference to the same extent as if each individual document, patent application, and technical standard were specifically and individually noted to be incorporated by reference.
[0699] The following is further disclosed regarding the embodiments described above.
[0700] (Claim 1)
[0701] A camera for collecting information about the user's surroundings,
[0702] A processing device for receiving and analyzing video information transmitted from the camera,
[0703] A warning device that provides warning information to the user based on the information analyzed by the processing device,
[0704] A system that includes this.
[0705] (Claim 2)
[0706] The system according to claim 1, wherein the processing device has a function for recording and analyzing the user's past behavior history.
[0707] (Claim 3)
[0708] The system according to claim 1, wherein the warning device notifies the user of warning information by voice or vibration.
[0709] "Example 1"
[0710] (Claim 1)
[0711] An electronic device for collecting information about the user's surroundings,
[0712] Information processing means for receiving video information transmitted from the electronic device and analyzing it using a generative AI model,
[0713] A warning means that provides a warning signal to the user based on the results of analysis by the aforementioned information processing means,
[0714] A memory device for recording and analyzing past behavioral history,
[0715] A system that includes this.
[0716] (Claim 2)
[0717] The system according to claim 1, wherein the warning means notifies the user of a warning signal by sound or vibration.
[0718] (Claim 3)
[0719] The system according to claim 1, wherein the information processing means has a plurality of motion analysis functions and a function for detecting abnormal movements.
[0720] "Application Example 1"
[0721] (Claim 1)
[0722] Image acquisition means for monitoring and recording the surrounding situation,
[0723] Information processing means for receiving and analyzing video data transmitted from the image acquisition means,
[0724] A warning output means that provides warning information to the user based on the data analyzed by the information processing means,
[0725] A means for detecting abnormalities inside and outside the home in real time and notifying them using voice and data communication means,
[0726] A system that includes this.
[0727] (Claim 2)
[0728] The system according to claim 1, wherein the information processing means has a function to store and analyze past behavioral data and detect trend patterns.
[0729] (Claim 3)
[0730] The system according to claim 1, wherein the warning output means transmits warning information to the user by voice or vibration, and further notifies a remote location via a network through data communication.
[0731] "Example 2 of combining an emotion engine"
[0732] (Claim 1)
[0733] A camera for collecting information about the user's surroundings,
[0734] A processing device for receiving video information transmitted from the camera and recognizing objects and actions contained in the information,
[0735] A warning device that evaluates the user's emotional state and provides warning information based on the information analyzed by the processing device,
[0736] The aforementioned warning device provides means for notifying warning information using sound or vibration and prompting appropriate interaction according to the user's emotional state,
[0737] A means of regularly analyzing user behavior history and sentiment data to predict future risks and provide personalized warnings,
[0738] A system that includes this.
[0739] (Claim 2)
[0740] The system according to claim 1, wherein the processing device includes an emotion engine that analyzes the user's facial expressions and voice.
[0741] (Claim 3)
[0742] The system according to claim 1, wherein the warning device generates an audio alert that promotes relaxation according to the user's emotional state.
[0743] "Application example 2 when combining with an emotion engine"
[0744] (Claim 1)
[0745] A camera component for collecting information about the user's surroundings,
[0746] Processing means for receiving and analyzing video information transmitted from the camera component,
[0747] A notification means that provides warning information to the user based on the information analyzed by the processing means,
[0748] An evaluation means to identify the user's emotional state and support their psychological health,
[0749] A control means for operating a household device in accordance with the emotional state detected by the evaluation means,
[0750] A system that includes this.
[0751] (Claim 2)
[0752] The system according to claim 1, wherein the processing means has a function to record and analyze the user's past behavior history, and periodically analyzes the emotional data obtained by the evaluation means.
[0753] (Claim 3)
[0754] The system according to claim 1, wherein the notification means and control means notify the user of warning information and the operation of the in-home device by voice or vibration.
[Explanation of Symbols]
[0755]
10, 210, 310, 410 Data processing system
12 Data processing device
14 Smart device
214 Smart glasses
314 Headset-type terminal
414 Robot
Claims
1. A system comprising: image acquisition means for monitoring and recording the surrounding situation; information processing means for receiving and analyzing video data transmitted from the image acquisition means; warning output means for providing warning information to the user based on the data analyzed by the information processing means; and means for detecting abnormalities inside and outside the home in real time and notifying of them using voice and data communication means.
2. The system according to claim 1, wherein the information processing means has a function to store and analyze past behavioral data and detect trend patterns.
3. The system according to claim 1, wherein the warning output means transmits warning information to the user by voice or vibration, and further notifies a remote location via a network through data communication.