system

The system uses image acquisition and behavioral analysis to monitor children in real-time, predicting dangers and sending alerts, effectively preventing accidents.

JP2026096612A · Pending · Publication Date: 2026-06-15 · SOFTBANK GROUP CORP


Patent Information

Authority / Receiving Office: JP
Patent Type: Applications
Current Assignee / Owner: SOFTBANK GROUP CORP
Filing Date: 2024-12-03
Publication Date: 2026-06-15

AI Technical Summary

Technical Problem

Existing systems fail to provide real-time monitoring and immediate warnings for potential dangers to children in home or public environments, lacking the capability to predict and respond to accidents effectively.

Method used

A system utilizing image acquisition devices, facial recognition, and behavioral pattern analysis to identify individuals and predict dangerous behaviors, generating alerts to relevant parties in real-time.

Benefits of technology

Ensures the safety of children by promptly detecting and alerting users to potential dangers, allowing for swift action to prevent accidents.

✦ Generated by Eureka AI based on patent content.


Patent Text Reader

Abstract

To provide a system. [Solution] The system includes: an image acquisition means for collecting video data from multiple locations within an environment; a means for receiving the video data and identifying the face of a subject using a face recognition algorithm; a behavioral pattern analysis means for analyzing the behavior of the identified subject and predicting dangerous behavior; a means for generating an alert and notifying relevant parties when the dangerous behavior is detected; and a means for recording response information based on the alert and accumulating it as training data for improving the behavioral analysis algorithm.


Description

Technical Field

[0001] The technology of the present disclosure relates to a system.

Background Art

[0002] Patent Document 1 discloses a method for controlling a persona chatbot, which is performed by at least one processor, and includes steps of receiving a user utterance, adding the user utterance to a prompt including an instruction sentence related to an explanation of the chatbot character, encoding the prompt, and inputting the encoded prompt into a language model to generate a chatbot utterance in response to the user utterance.

Prior Art Documents

Patent Documents

[0003]

Patent Document 1

Summary of the Invention

Problems to be Solved by the Invention

[0004] Ensuring the safety of children is an important issue for parents, but it is not realistic for parents to keep a constant eye on them at home or in public places. It is also difficult to prevent accidents that occur in the brief moments when a parent's attention is elsewhere. To address this issue, there is a need for a system that monitors children's actions in real time, immediately detects dangers, and issues warnings.

Means for Solving the Problems

[0005] This invention involves installing multiple image acquisition devices in a home or public facility to collect video data of the environment. The received video data is processed, and a facial recognition algorithm is used to identify specific individuals. Furthermore, a behavioral pattern analysis means analyzes the individuals' behavior in real time, predicts dangerous behavior, and quickly generates warnings, notifying relevant parties. This enables appropriate action to be taken before an accident occurs, thereby ensuring the safety of children.

[0006] An "image acquisition device" is a hardware device used to continuously collect video data from a specific location within an environment.

[0007] "Video data" refers to a series of digital data containing visual information collected by an image acquisition device.

[0008] A "face recognition algorithm" is a computational method used to identify people in video data and authenticate specific individuals.

[0009] A "target individual" is an individual who is identified by a facial recognition algorithm and is subject to surveillance.

[0010] A "behavioral pattern analysis tool" is a software function that analyzes a subject's movements and predicts dangerous behaviors.

[0011] "Risk behavior" refers to actions that a person may take that could lead to accidents or injuries.

[0012] An "alert" is a warning message or notification generated when dangerous behavior is detected.

[0013] "Stakeholders" refer to individuals such as parents or facility managers who have the authority to receive alerts.

Brief Description of the Drawings

[0014]

[Figure 1] This is a conceptual diagram showing an example of the configuration of a data processing system according to the first embodiment.

[Figure 2] It is a conceptual diagram showing an example of the main functions of a data processing device and a smart device according to the first embodiment.

[Figure 3] It is a conceptual diagram showing an example of the configuration of a data processing system according to the second embodiment.

[Figure 4] It is a conceptual diagram showing an example of the main functions of a data processing device and smart glasses according to the second embodiment.

[Figure 5] It is a conceptual diagram showing an example of the configuration of a data processing system according to the third embodiment.

[Figure 6] It is a conceptual diagram showing an example of the main functions of a data processing device and a headset-type terminal according to the third embodiment.

[Figure 7] It is a conceptual diagram showing an example of the configuration of a data processing system according to the fourth embodiment.

[Figure 8] It is a conceptual diagram showing an example of the main functions of a data processing device and a robot according to the fourth embodiment.

[Figure 9] It shows an emotion map to which a plurality of emotions are mapped.

[Figure 10] It shows an emotion map to which a plurality of emotions are mapped.

[Figure 11] It is a sequence diagram showing the processing flow of the data processing system in Example 1.

[Figure 12] It is a sequence diagram showing the processing flow of the data processing system in Application Example 1.

[Figure 13] It is a sequence diagram showing the processing flow of the data processing system in Example 2 when an emotion engine is combined.

[Figure 14] It is a sequence diagram showing the processing flow of the data processing system in Application Example 2 when an emotion engine is combined.

Mode for Carrying Out the Invention

[0015] Hereinafter, an example of an embodiment of a system according to the technology of the present disclosure will be described with reference to the accompanying drawings.

[0016] First, the terms used in the following description will be explained.

[0017] In the following embodiments, a processor with a reference numeral (hereinafter simply referred to as "processor") may be a single arithmetic unit or a combination of multiple arithmetic units. Also, the processor may be a single type of arithmetic unit or a combination of multiple types of arithmetic units. Examples of arithmetic units include a CPU (Central Processing Unit), a GPU (Graphics Processing Unit), a GPGPU (General-Purpose computing on Graphics Processing Units), an APU (Accelerated Processing Unit), and the like.

[0018] In the following embodiments, a RAM (Random Access Memory) with a reference numeral is a memory in which information is temporarily stored and is used as a work memory by the processor.

[0019] In the following embodiments, a storage with a reference numeral is one or more non-volatile storage devices that store various programs, various parameters, and the like. Examples of non-volatile storage devices include flash memory (SSD (Solid State Drive)), magnetic disks (e.g., hard disks), or magnetic tapes.

[0020] In the following embodiments, a communication I / F (Interface) with a reference numeral is an interface including a communication processor, an antenna, and the like. The communication I / F manages communication between multiple computers. Examples of communication standards applied to the communication I / F include wireless communication standards including 5G (5th Generation Mobile Communication System), Wi-Fi (registered trademark), or Bluetooth (registered trademark).

[0021] In the following embodiments, "A and / or B" is synonymous with "at least one of A and B." That is, "A and / or B" means that it may be A alone, or B alone, or a combination of A and B. Furthermore, in this specification, the same concept as "A and / or B" applies when expressing three or more things linked by "and / or."

[0022] [First Embodiment]

[0023] Figure 1 shows an example of the configuration of the data processing system 10 according to the first embodiment.

[0024] As shown in Figure 1, the data processing system 10 includes a data processing device 12 and a smart device 14. An example of the data processing device 12 is a server.

[0025] The data processing device 12 comprises a computer 22, a database 24, and a communication interface 26. The computer 22 is an example of a "computer" related to the technology of this disclosure. The computer 22 comprises a processor 28, RAM 30, and storage 32. The processor 28, RAM 30, and storage 32 are connected to a bus 34. The database 24 and the communication interface 26 are also connected to the bus 34. The communication interface 26 is connected to a network 54. An example of the network 54 is a WAN (Wide Area Network) and / or a LAN (Local Area Network).

[0026] The smart device 14 comprises a computer 36, a reception device 38, an output device 40, a camera 42, and a communication interface 44. The computer 36 comprises a processor 46, RAM 48, and storage 50. The processor 46, RAM 48, and storage 50 are connected to a bus 52. The reception device 38, output device 40, and camera 42 are also connected to the bus 52.

[0027] The reception device 38 is equipped with a touch panel 38A and a microphone 38B, etc., and receives user input. The touch panel 38A receives user input by detecting contact with an object (e.g., a pen or finger). The microphone 38B receives user input by detecting the user's voice. The control unit 46A transmits data indicating the user input received by the touch panel 38A and microphone 38B to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the data indicating the user input.

[0028] The output device 40 includes a display 40A and a speaker 40B, and presents data to the user 20 by outputting the data in a form perceptible to the user 20 (e.g., audio and / or text). The display 40A displays visible information such as text and images according to instructions from the processor 46. The speaker 40B outputs audio according to instructions from the processor 46. The camera 42 is a small digital camera equipped with an optical system such as a lens, aperture, and shutter, and an image sensor such as a CMOS (Complementary Metal-Oxide-Semiconductor) image sensor or a CCD (Charge Coupled Device) image sensor.

[0029] Communication interface 44 is connected to network 54. Communication interfaces 44 and 26 are responsible for the exchange of various types of information between processor 46 and processor 28 via network 54.

[0030] Figure 2 shows an example of the main functions of the data processing device 12 and the smart device 14.

[0031] As shown in Figure 2, in the data processing device 12, specific processing is performed by the processor 28. A specific processing program 56 is stored in the storage 32. The specific processing program 56 is an example of a "program" related to the technology of this disclosure. The processor 28 reads the specific processing program 56 from the storage 32 and executes the read specific processing program 56 on the RAM 30. The specific processing is realized by the processor 28 operating as a specific processing unit 290 according to the specific processing program 56 executed on the RAM 30.

[0032] The storage 32 stores the data generation model 58 and the emotion identification model 59. The data generation model 58 and the emotion identification model 59 are used by the specific processing unit 290.

[0033] In the smart device 14, the processor 46 performs the reception output processing. The storage 50 stores the reception output program 60. The reception output program 60 is used in conjunction with a specific processing program 56 by the data processing system 10. The processor 46 reads the reception output program 60 from the storage 50 and executes the read reception output program 60 on the RAM 48. The reception output processing is realized by the processor 46 operating as a control unit 46A according to the reception output program 60 executed on the RAM 48.

[0034] Next, the specific processing performed by the specific processing unit 290 of the data processing device 12 will be described. In the following description, the data processing device 12 will be referred to as the "server" and the smart device 14 as the "terminal".

[0035] This invention relates to a system that uses an image acquisition device to monitor the safety of children in homes and public facilities. This system contributes to ensuring the safety of children without requiring parents or facility managers to physically keep an eye on them.

[0036] In this system, image acquisition devices installed in various locations collect video data of the environment in real time. Terminals transmit this video data to a server, which uses a facial recognition algorithm to identify specific individuals. Through facial recognition, the server identifies the child to be monitored and continuously analyzes the child's behavior using behavioral pattern analysis tools.

[0037] The server rapidly analyzes situations, especially those with a high probability of dangerous behavior, and generates an alert if danger is detected. These alerts are sent in real time to parents and facility managers via smartphones or dedicated devices. This notification allows users to take swift action and ensure the child's safety.

[0038] As a concrete example, consider a scenario where a device monitors a playground area in a park. The server detects that a child is climbing onto a tall piece of playground equipment and is in an unstable position. The server immediately identifies this behavior as dangerous and sends an alert to the user's device saying, "Your child is climbing onto a tall piece of playground equipment. Caution is advised." Upon receiving this notification, the user can rush to the scene and take the necessary action.

[0039] Thus, this invention aims to ensure the safety of children and alleviate parental anxiety by utilizing cameras and AI technology. This makes it possible to efficiently manage children's safety even in busy daily life.

[0040] The following describes the processing flow.

[0041] Step 1:

[0042] The terminal activates the image acquisition device and prepares to collect real-time video data from within the home or public facility. During this process, the terminal adjusts the camera's image quality and sets the required frame rate and resolution.

[0043] Step 2:

[0044] The terminal sends the acquired video data to the server. The server receives this data and preprocesses it for video analysis. Preprocessing removes unwanted noise so that the video data is cleaner and easier to analyze.

[0045] Step 3:

[0046] The server applies a facial recognition algorithm to the processed video data. This detects the faces of people in the video and matches them with known children in the database to identify specific individuals.

[0047] Step 4:

[0048] The server analyzes the identified individuals using behavioral pattern analysis tools. This analysis compares their actions with existing behavioral records, tracks their movements, and evaluates their behavioral status.

[0049] Step 5:

[0050] The server predicts risky behavior based on the results of behavioral analysis. It detects anomalies by comparing the observed behavior with a predefined list of risky behaviors, such as unstable movements at height or approaching specific risk areas.

[0051] Step 6:

[0052] If the server detects dangerous behavior, it will generate an alert. This alert will include the specific nature of the danger and recommended actions.

[0053] Step 7:

[0054] The server sends the generated alert to the user. Alerts are delivered in real time to the user's smartphone or dedicated device, so the user can check them immediately.

[0055] Step 8:

[0056] Users receive alerts and review their contents. By rushing to the scene as needed and taking action to confirm or improve safety, accidents can be prevented.

[0057] Step 9:

[0058] The server records user responses and actions taken in response to alerts. This data is used to improve future models and enhance the accuracy of AI algorithms.
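
Steps 5 and 6 above can be illustrated with a minimal Python sketch. The patent does not specify how the predefined list of risky behaviors is represented, so everything concrete here is an assumption made for illustration: the Observation and Alert fields, the thresholds HEIGHT_LIMIT_M and SWAY_LIMIT, the rectangular RISK_ZONES, and the function name check_risks are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Observation:
    """One tracked sample for an identified child (hypothetical fields)."""
    child_name: str
    x: float         # position in metres within the monitored area
    y: float
    height_m: float  # estimated height above the ground
    sway: float      # assumed instability score, 0.0 (stable) to 1.0


@dataclass
class Alert:
    """Alert content: the nature of the danger and a recommended action."""
    danger: str
    recommended_action: str


# Assumed thresholds and risk areas; a real deployment would configure these.
HEIGHT_LIMIT_M = 1.5
SWAY_LIMIT = 0.6
# Each zone is an axis-aligned rectangle: (x_min, y_min, x_max, y_max).
RISK_ZONES: Dict[str, Tuple[float, float, float, float]] = {
    "pond": (10.0, 0.0, 15.0, 5.0),
}


def in_zone(obs: Observation, zone: Tuple[float, float, float, float]) -> bool:
    x_min, y_min, x_max, y_max = zone
    return x_min <= obs.x <= x_max and y_min <= obs.y <= y_max


def check_risks(obs: Observation) -> List[Alert]:
    """Compare one observation against the predefined risky behaviors."""
    alerts: List[Alert] = []
    # Rule 1: unstable movement at height.
    if obs.height_m > HEIGHT_LIMIT_M and obs.sway > SWAY_LIMIT:
        alerts.append(Alert(
            danger=f"{obs.child_name} is unstable at a height of {obs.height_m:.1f} m.",
            recommended_action="Go to the child and help them down.",
        ))
    # Rule 2: entering a specific risk area.
    for zone_name, zone in RISK_ZONES.items():
        if in_zone(obs, zone):
            alerts.append(Alert(
                danger=f"{obs.child_name} has entered the risk area '{zone_name}'.",
                recommended_action="Guide the child away from the area.",
            ))
    return alerts
```

An observation that breaks no rule yields an empty list, so the server would generate nothing for it. Each returned Alert carries both the nature of the danger and a recommended action, as Step 6 describes.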

[0059] (Example 1)

[0060] Next, we will describe Example 1. In the following description, the data processing device 12 will be referred to as the "server," and the smart device 14 will be referred to as the "terminal."

[0061] In modern society, ensuring the safety of people in homes and public facilities is a crucial issue. However, relying solely on physical surveillance has its limitations and may not be sufficient in situations requiring a quick and accurate response. Furthermore, conventional systems lack sufficient capabilities to predict and notify of dangers, making it difficult to adapt to individual circumstances.

[0062] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 1 is realized by the following means.

[0063] In this invention, the server includes means for collecting visual information from multiple locations within the facility using image sensors, means for receiving the visual information and using an identification algorithm to identify the faces of individuals, and means for analyzing the behavior of the identified individuals and predicting potentially dangerous behaviors. This makes it possible to automatically and effectively monitor the safety of individuals within the facility and to quickly detect potential dangers.

[0064] An "image sensor" is a device used to collect visual information from the environment, and typically uses camera technology to acquire data in real time.

[0065] "Visual information" refers to records of the environment acquired in the form of images or videos, which allow us to confirm the situation of a specific place or object.

[0066] An "identification algorithm" is a computational method used to extract individual characteristics from collected visual information and identify specific individuals.

[0067] The term "individual" generally refers to a person or object that is identified by a system, and often specifically refers to a person who is the target of facial recognition.

[0068] "Motion analysis means" refers to functions that include technologies and processes for analyzing the motion patterns of identified individuals and detecting normal and abnormal behavior.

[0069] "Potentially dangerous behaviors" refer to actions that an individual may take that the system determines could potentially threaten safety.

[0070] A "warning" is a notification or alert issued when a potential crisis is detected, containing information to prompt relevant parties to take swift action.

[0071] "Training data" refers to a dataset that is accumulated to improve the accuracy and performance of a system and is used for learning and improvement.

[0072] "Portable electronic devices" are small electronic devices primarily intended for use while on the go, and include smartphones and tablets.

[0073] A "dedicated device" refers to equipment designed specifically for a particular function or application, and is not necessarily used for general purposes.

[0074] This system is designed to monitor the safety of specific individuals using multiple image sensors installed in homes and public facilities. The entire system consists of three main elements: terminals, servers, and users.

[0075] The terminals collect visual information of the surrounding environment in real time through image sensors placed within the facility. This includes commonly used IP cameras and high-resolution webcams. These devices periodically capture data and transmit the collected visual information to a server.

[0076] The server uses an identification algorithm to identify individuals based on the visual information it receives. This process often utilizes Dlib or other open-source AI frameworks and includes techniques for facial recognition. The server also has behavioral analysis capabilities, performing behavioral analysis using deep learning platforms such as TensorFlow® and PyTorch. If the analysis detects potentially dangerous behavior, the server immediately generates an alarm and adds it to the notification queue.
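
As a rough illustration of the identification step, the following Python sketch matches a face feature vector against registered individuals by nearest Euclidean distance. This is a common approach in face recognition, but the patent does not state which matching method is used. The function names, the toy three-dimensional vectors, and the 0.6 distance threshold are assumptions for illustration, and no real library API is called. A practical system would obtain the vectors from a face recognition model.

```python
import math
from typing import Dict, Optional, Sequence


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def identify(
    embedding: Sequence[float],
    known: Dict[str, Sequence[float]],
    threshold: float = 0.6,  # assumed cut-off; tune for the model in use
) -> Optional[str]:
    """Return the name of the closest registered individual, or None.

    `known` maps a registered name to a reference feature vector
    (hypothetical database contents).
    """
    best_name: Optional[str] = None
    best_dist = float("inf")
    for name, reference in known.items():
        dist = euclidean(embedding, reference)
        if dist < best_dist:
            best_name, best_dist = name, dist
    # Reject the match when even the closest reference is too far away.
    return best_name if best_dist <= threshold else None
```

Returning None for faces that are far from every registered vector keeps unregistered people from being mistaken for a monitored child.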

[0077] Users receive generated alerts via portable electronic devices or dedicated equipment. Utilizing push notification technology, important alerts are delivered to users in real time. This allows users to quickly understand the situation and take appropriate action.

[0078] As a concrete example, consider a scenario where a server monitors a playground area in a park. The server detects a child in an unstable position on the playground equipment, and if it determines the child's actions are dangerous, it issues an alert to the user's mobile device stating, "Your child is climbing on high playground equipment. Caution is advised." In this way, the system aims to enhance individual safety based on real-world scenarios.

[0079] Examples of prompts for a generative AI model include the following:

[0080] "Please generate a program that analyzes the position and movement of children in images and evaluates their safety."

[0081] "Please describe a system that monitors playground areas in parks and detects dangerous behaviors among children."

[0082] This system makes it possible to efficiently and effectively manage the safety of individuals within a specific facility.

[0083] The flow of the specific processing in Example 1 will be explained using Figure 11.

[0084] Step 1:

[0085] The terminal captures real-time visual information using image sensors installed in the facility. This visual information includes video data of the environment, which is input from the image sensors. This data is captured at a specified frame rate and resolution and sent to a server for subsequent processing.

[0086] Step 2:

[0087] The server receives visual information transmitted from the terminal. Next, it applies a face recognition algorithm to identify the faces of individuals present in the visual information. The input data consists of images and video frames; the algorithm extracts feature points and outputs the identified face information. Specifically, it utilizes frameworks such as Dlib to determine the position of faces in real time.

[0088] Step 3:

[0089] The server analyzes the behavior of identified individuals using motion analysis means. In this step, the server tracks the individual's movements and changes in position based on the facial information it has identified. The input is real-time motion data; motion analysis detects abnormal or potentially dangerous behaviors and outputs the result. Behavioral patterns are analyzed with a deep learning model built on an AI framework such as TensorFlow.

[0090] Step 4:

[0091] The server immediately generates an alarm if it detects potentially dangerous behavior based on the analysis results. The input is the result of the behavioral analysis; the alarm generation routine produces an urgent notification and outputs its content. Specifically, the alarm priority is set according to the detected level of danger, and the alarm is added to the notification queue.

[0092] Step 5:

[0093] Users receive alarms distributed from the server via portable electronic devices or dedicated equipment. The input is the alarm information from the server, which users rely on to act promptly. The output is the user's action, such as going to the scene and implementing the necessary countermeasures. Users are expected to review each alarm and decide how to respond based on the information the system provides.
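
The notification queue in Step 4 above, where alarm priority follows the detected level of danger, could be sketched in Python as a priority queue. The patent does not describe the queue's implementation. The class name NotificationQueue, the integer danger levels, and the first-in-first-out tie-breaking are assumptions for illustration.

```python
import heapq
import itertools
from typing import List, Tuple


class NotificationQueue:
    """Alarm queue that releases the highest danger level first.

    Alarms with the same danger level are released in insertion order.
    """

    def __init__(self) -> None:
        self._heap: List[Tuple[int, int, str]] = []
        self._counter = itertools.count()  # tie-breaker preserving order

    def push(self, danger_level: int, message: str) -> None:
        # heapq is a min-heap, so negate the level to pop the highest first.
        heapq.heappush(self._heap, (-danger_level, next(self._counter), message))

    def pop(self) -> Tuple[int, str]:
        """Remove and return (danger_level, message) for the most urgent alarm."""
        neg_level, _, message = heapq.heappop(self._heap)
        return -neg_level, message

    def __len__(self) -> int:
        return len(self._heap)
```

With this ordering, a high-danger alarm that arrives late is still delivered ahead of earlier low-danger ones.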

[0094] (Application Example 1)

[0095] Next, we will explain Application Example 1. In the following explanation, the data processing device 12 will be referred to as the "server," and the smart device 14 will be referred to as the "terminal."

[0096] Efficiently monitoring children's safety in both home and public environments remains a challenge in today's busy lifestyle. Traditional monitoring methods struggle to detect and address dangers in real time, placing a heavy burden on parents and facility managers. This makes it difficult to prevent accidental injuries among children.

[0097] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 1 is realized by the following means.

[0098] In this invention, the server includes means for collecting visual data from multiple points in the environment using an image acquisition device, means for using a facial recognition algorithm to identify individuals, and means for generating a warning and notifying relevant parties when dangerous behavior is detected. This enables relevant parties to efficiently monitor the child's behavior and respond quickly when danger is detected.

[0099] An "image acquisition device" is a device used to collect visual data from multiple points within an environment.

[0100] "Visual data" refers to information acquired in the form of images or videos.

[0101] A "face recognition algorithm" is a computational method for analyzing and identifying an individual's characteristics from received visual data.

[0102] "Individual" refers to a specific person who is being monitored.

[0103] A "motion analysis device" is a system that analyzes an individual's movements and predicts dangerous actions.

[0104] "Dangerous behavior" refers to actions that may threaten an individual's safety.

[0105] A "warning" is a notification that is generated when dangerous behavior is detected.

[0106] A "communication device" is a device or system used to transmit information to a terminal.

[0107] A "terminal" refers to a portable device or a dedicated receiving device, which is a device used to receive warnings.

[0108] "Learning information" refers to data that records response information based on warnings and is used to optimize algorithms.

[0109] To realize this invention, it is first necessary to install network-enabled image acquisition devices in homes and public facilities. These cameras collect visual data in real time from multiple points in the environment and transmit it to a cloud-based server. The server identifies specific individuals from the received visual data using facial recognition algorithms (e.g., OpenCV or AWS® Rekognition). The actions of the identified individuals are analyzed by a motion analysis device, and if a pre-defined dangerous action is detected, the server generates a warning.

[0110] This warning is transmitted via communication equipment to the user's mobile device or a dedicated receiver. Receiving this warning allows the user to respond quickly if danger is imminent. Furthermore, response information based on the warning is recorded on a server and stored as learning data used to optimize the motion analysis algorithm.

[0111] As a concrete example, consider a scenario where a camera installed in a park playground monitors children playing on a swing. This system detects when a child is about to fall off the swing and identifies the action as dangerous. At this point, the server generates a warning message saying, "Your child is making dangerous movements on the swing. Please be careful," and immediately sends it to the parent's smartphone.

[0112] An example of a prompt message is: "This is a child safety monitoring system. Identify children from specific camera footage and issue an alert if dangerous behavior is detected. The alert should include a description of the specific behavior and send a push notification to the smartphone."

[0113] The flow of a specific process in Application Example 1 will be explained using Figure 12.

[0114] Step 1:

[0115] The server receives visual data in real time from a network-enabled image acquisition device. The input is video data from the camera, and the server prepares this data to send to a face recognition algorithm.

[0116] Step 2:

[0117] The server executes a face recognition algorithm to identify a specific individual from video data. The input is the video data obtained in step 1, and the server analyzes the position and feature points of faces to obtain individual identification information as output. Technologies such as OpenCV and AWS Rekognition are utilized in this process.

[0118] Step 3:

[0119] The server analyzes the actions of the identified individual using a motion analysis device. The input is the individual identification information from step 2, and a motion analysis algorithm is applied to extract behavioral patterns, obtaining motion features as output.

[0120] Step 4:

[0121] The server predicts dangerous behavior from the obtained behavioral features. The input is the behavioral features from step 3, which are compared with pre-defined dangerous behavior patterns to evaluate the degree of danger, and the server generates a dangerous behavior detection result as output.

[0122] Step 5:

[0123] The server generates a warning when dangerous behavior is detected and sends it to the user's terminal via a communication device. The input is the detection result from step 4, which is used to construct the warning message and create notification information for the terminal as output. The user receives this information and can take action to avoid the danger.

[0124] Step 6:

[0125] The response information based on the warnings is recorded on the server and stored as learning information used to optimize the behavioral analysis algorithm. The input is user response information, which is stored in the database and output as feedback data that will be useful for future improvements to the algorithm.
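
Step 6 above can be sketched in Python as turning each recorded user response into a labeled example for later retraining. The patent does not define the record layout. The ResponseRecord fields, the "confirmed" and "dismissed" response vocabulary, and the JSON Lines output are assumptions for illustration.

```python
import json
from dataclasses import dataclass
from typing import Dict, Iterable, Union


@dataclass
class ResponseRecord:
    """A user's response to one warning (hypothetical layout)."""
    alert_id: str
    detected_behavior: str
    user_response: str  # assumed vocabulary: "confirmed" or "dismissed"


def to_training_example(record: ResponseRecord) -> Dict[str, Union[str, int]]:
    """Label the detection 1 if the user confirmed the danger, else 0."""
    label = 1 if record.user_response == "confirmed" else 0
    return {
        "alert_id": record.alert_id,
        "behavior": record.detected_behavior,
        "label": label,
    }


def serialize(records: Iterable[ResponseRecord]) -> str:
    """Render records as JSON Lines, one training example per line."""
    return "\n".join(
        json.dumps(to_training_example(r), sort_keys=True) for r in records
    )
```

Dismissed warnings are kept as negative examples, which gives the behavioral analysis algorithm evidence of false alarms as well as confirmed dangers.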

[0126] Furthermore, an emotion engine that estimates the user's emotions may be incorporated. That is, the specific processing unit 290 may use the emotion identification model 59 to estimate the user's emotions and perform the specific processing using the user's emotions.

[0127] This invention is a system that combines an image acquisition device and an emotion recognition engine, and is intended to monitor the safety and emotional state of children in homes or public facilities. This system allows parents or facility managers to remotely check on a child's safety and psychological state even when they are not physically nearby.

[0128] The terminal collects real-time video data from image acquisition devices installed in homes or facilities. This data is then sent to a server, which applies a facial recognition algorithm to the received video to identify specific children who are being monitored.

[0129] Next, the server analyzes the behavioral patterns of the recognized individuals and uses an emotion recognition engine to analyze their emotional state. The emotion recognition engine analyzes facial expressions and body movements from the video data to determine whether the individuals are emotionally agitated or calm.

[0130] This information is linked to the prediction of risky behavior, and if high-risk behavior is detected, an alert is quickly generated. The results of emotion recognition are also used to adjust the priority and urgency of alerts. For example, when emotions are heightened, a high-urgency alert is issued.
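
The way emotion recognition results adjust alert urgency could be sketched as follows. This is an illustration only: the score ranges, the thresholds, the three urgency levels, and the names `Alert` and `decide_alert` are assumptions made for the example, not values taken from the specification.

```python
from dataclasses import dataclass

# Hypothetical urgency levels, ordered from least to most urgent.
LEVELS = ("low", "medium", "high")


@dataclass(frozen=True)
class Alert:
    urgency: str
    message: str


def decide_alert(risk_score, agitation,
                 risk_threshold=0.5, agitation_threshold=0.7):
    """Return an Alert, or None when the behavior is not judged high-risk.

    risk_score and agitation are assumed to lie in [0, 1].
    """
    if risk_score < risk_threshold:
        return None
    # Base urgency from the behavior alone.
    level = 2 if risk_score >= 0.8 else 1
    agitated = agitation >= agitation_threshold
    # Heightened emotion raises the urgency by one level, capped at 'high'.
    if agitated:
        level = min(level + 1, len(LEVELS) - 1)
    state = "agitated" if agitated else "calm"
    return Alert(LEVELS[level],
                 f"Risky behavior detected; child appears {state}.")
```

With these assumed thresholds, `decide_alert(0.6, 0.2)` yields a medium-urgency alert, while the same behavior with an agitation score of 0.9 yields a high-urgency alert.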

[0131] The generated alerts are immediately sent to the user's mobile device or dedicated device. This allows the user to understand the detailed situation, including the emotional state of the person concerned, and take a quick and appropriate response.

[0132] For example, if a device is monitoring a playground in a park and the video footage shows a child becoming agitated on the play equipment, the server will perceive this as a dangerous situation and send an alert to the user indicating that the child is emotionally agitated. The user can then take action to go to the location and calm the child down.

[0133] Thus, by incorporating emotion recognition technology, this invention provides a system that achieves more comprehensive child safety management while also considering the maintenance of children's psychological health.

[0134] The following describes the processing flow.

[0135] Step 1:

[0136] The terminal activates the image acquisition device. This starts capturing video data in real time from the installed location. The collected video data is then ready to be sent to the server.

[0137] Step 2:

[0138] The server analyzes the received video data. First, it performs data preprocessing, such as noise reduction and brightness correction, to improve recognition accuracy.

[0139] Step 3:

[0140] The server uses a facial recognition algorithm to identify the face of a specific subject from the video data. The identified face is then compared against known faces registered in the database.

[0141] Step 4:

[0142] The server performs behavioral pattern analysis. It tracks the movements of identified individuals and determines whether their actions constitute dangerous behavior by comparing them with existing data.

[0143] Step 5:

[0144] The server activates an emotion recognition engine and evaluates the subject's emotional state based on their facial expressions and body movements. It determines whether the subject is emotionally agitated or calm, and passes that data to the next step.

[0145] Step 6:

[0146] The server integrates behavioral analysis results and emotion recognition results to determine the appropriate response if dangerous behavior is detected. In particular, if emotions are heightened, the urgency of the warning is increased, and an alert is generated immediately.

[0147] Step 7:

[0148] The server generates alerts and sends them to the user's mobile device or a dedicated device. The alerts include details of the behavior and emotional state, and are presented in a way that allows the user to respond quickly.

[0149] Step 8:

[0150] The user reviews the alert, understands its content, and takes appropriate action to ensure the safety of the person concerned. For example, this could involve temporarily monitoring the situation or physically going to the site to improve the situation.

[0151] Step 9:

[0152] The server records user responses and stores the corresponding actions as training data. This data will be used to improve the algorithm in the future.
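
The preprocessing named in Step 2 of this flow (noise reduction and brightness correction) could look like the following sketch. It assumes a grayscale frame represented as a list of rows of integers from 0 to 255; a 3x3 median filter and a simple gain correction are stand-ins chosen for illustration, and the function names are hypothetical. A production system would more likely use an image processing library.

```python
def correct_brightness(frame, target_mean=128.0):
    """Scale pixel values so the frame's mean brightness approaches target_mean."""
    pixels = [p for row in frame for p in row]
    mean = sum(pixels) / len(pixels)
    if mean == 0:
        return [row[:] for row in frame]  # all-black frame: nothing to scale
    gain = target_mean / mean
    # Clamp to the 8-bit range after scaling.
    return [[min(255, round(p * gain)) for p in row] for row in frame]


def denoise(frame):
    """3x3 median filter; border pixels use whichever neighbours exist."""
    height, width = len(frame), len(frame[0])
    out = []
    for y in range(height):
        row = []
        for x in range(width):
            window = sorted(
                frame[j][i]
                for j in range(max(0, y - 1), min(height, y + 2))
                for i in range(max(0, x - 1), min(width, x + 2))
            )
            row.append(window[len(window) // 2])
        out.append(row)
    return out


def preprocess(frame):
    """Denoise first so isolated outliers do not skew the brightness estimate."""
    return correct_brightness(denoise(frame))
```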

[0153] (Example 2)

[0154] Next, we will describe Example 2. In the following description, the data processing device 12 will be referred to as the "server" and the smart device 14 as the "terminal".

[0155] In recent years, the need for safety management in homes and public facilities has increased, but ensuring the safety of individuals requiring special care, such as young children, and appropriately monitoring their emotional states is not easy. Furthermore, conventional monitoring systems have difficulty detecting changes in emotions, posing a challenge in predicting dangerous situations in advance and responding quickly.

[0156] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 2 is realized by the following means.

[0157] In this invention, the server includes means for collecting visual information from multiple locations in the environment, means for using a face recognition algorithm to identify the face of a target individual, and means for using an emotion evaluation engine to analyze the emotional state of the identified target individual. This makes it possible to identify and analyze the face and emotional state of a target individual with high accuracy and predict potential dangers.

[0158] An "image acquisition device" is a device used to collect visual information within an environment, and is capable of continuously acquiring information from multiple locations at high resolution.

[0159] "Visual information" refers to video and image data of the environment collected by image acquisition devices, and is used to identify people and other objects.

[0160] A "face recognition algorithm" is a computational method or program for detecting and identifying the face of a specific individual from visual information, and recognizes individuals based on specific feature points.

[0161] An "emotion evaluation engine" is a technological element that analyzes the emotional state of an identified individual from its facial expressions and movements, and identifies the type of emotion, such as excitement or calmness.

[0162] "Behavioral analysis methods" refer to algorithms and systems used to analyze the movements and past behavioral patterns of a target individual and predict abnormal or dangerous behaviors.

[0163] A "warning" is a notification generated when the level of danger increases based on emotional state and behavioral analysis, and it serves as a signal to quickly convey information to relevant parties.

[0164] A "portable device" is an electronic device intended to be carried by the user at all times and capable of receiving and transmitting information.

[0165] In this invention, the terminal first uses an image acquisition device to collect visual information from the environment. This image acquisition device is equipped with a high-resolution camera, is installed in homes and public facilities, and has the function of continuously recording video day and night. The collected visual information is efficiently transmitted to a server using compression technology.

[0166] Next, the server processes the received visual information. The technology used here is a face recognition algorithm, specifically utilizing libraries such as OpenCV and Dlib. This algorithm allows the server to accurately detect and identify the face of a specific individual. Based on the results, an emotion evaluation engine is activated to analyze the target individual's emotional state in real time. This engine evaluates emotions by referring to subtle changes in facial expressions and body movements, and in practice, it often utilizes APIs from general cloud service providers.
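
Once a face recognition library has reduced a detected face to a fixed-length numeric descriptor, identifying the individual amounts to a nearest-neighbour lookup against registered faces. The sketch below assumes such descriptors are already available; the `identify` function, the registry layout, and the 0.6 distance threshold (a value commonly quoted for Dlib's 128-dimensional face descriptors, which would need tuning in practice) are assumptions for illustration.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two descriptors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def identify(descriptor, registry, max_distance=0.6):
    """Return the name of the closest registered face, or None.

    registry maps a name to a reference descriptor. None is returned when
    even the closest reference is farther away than max_distance.
    """
    best_name, best_dist = None, float("inf")
    for name, reference in registry.items():
        dist = euclidean(descriptor, reference)
        if dist < best_dist:
            best_name, best_dist = name, dist
    return best_name if best_dist <= max_distance else None
```

Returning `None` for faces that match no registered child lets the later behavioral analysis skip individuals who are not monitoring targets.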

[0167] In addition, the behavioral analysis system evaluates the behavior of the target individual based on past data and accumulated training data. This analysis makes it possible to predict potential dangerous behaviors in advance and take countermeasures quickly. The generated warnings are notified to the user's mobile device in real time, allowing the user to quickly understand the situation and respond appropriately.
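
One simple way to evaluate current behavior against past data is to measure how far a current observation deviates from the historical distribution. The following sketch uses a z-score on a single numeric feature (for example, movement speed); the feature choice, the threshold of 3.0, and the function names are assumptions, and the specification does not state which statistical method is used.

```python
from statistics import mean, stdev


def anomaly_score(history, current):
    """Return how many standard deviations `current` lies from the past mean."""
    if len(history) < 2:
        return 0.0  # not enough past data to judge
    spread = stdev(history)
    if spread == 0:
        # All past values identical: any different value is maximally unusual.
        return 0.0 if current == history[0] else float("inf")
    return abs(current - mean(history)) / spread


def is_potentially_dangerous(history, current, threshold=3.0):
    """Flag the observation when it deviates strongly from past behavior."""
    return anomaly_score(history, current) >= threshold
```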

[0168] As a concrete example, suppose a terminal installed in a park is monitoring children near playground equipment, and the video captures a child appearing extremely excited on the equipment. The server analyzes this footage, and if it detects heightened emotions, it immediately sends a high-level warning to the user. An example of a prompt in this case might be "Detect a child who is excited on the playground equipment and generate a warning."

[0169] This allows users to grasp the specific situation on-site and take quick and appropriate action. This system further improves safety management by utilizing emotion recognition technology and highly accurate data analysis.

[0170] The flow of the specific processing in Example 2 will be explained using Figure 13.

[0171] Step 1:

[0172] The terminal collects visual information from the environment using an image acquisition device. Specifically, it uses a high-resolution camera to capture continuous video data, which is then used as input. At this stage, the video data is sent to the server in a compressed format. The output is video data in a format that the server can process.

[0173] Step 2:

[0174] The server decodes the received compressed video data and applies a face recognition algorithm. The input is the decoded video frames, and the server uses image processing techniques to identify faces within each frame. The output of this operation is data showing the position of each face in each frame and its corresponding feature points.

[0175] Step 3:

[0176] The server activates an emotion evaluation engine based on the results of face recognition. This engine analyzes the input facial feature data and evaluates the emotional state based on changes and dynamics in facial expressions. The output of this process is an emotion score indicating the degree of excitement or calmness.

[0177] Step 4:

[0178] The server uses the emotion scores to perform behavioral analysis. Using past data and the currently obtained emotion scores as input, the algorithm detects anomalous patterns. The output of this process is the detection result for high-risk behavioral patterns.

[0179] Step 5:

[0180] The server generates warnings based on the results of behavioral analysis. The input is the risk assessment result, and the server uses this to generate warning messages and determine their priority. The warning messages are output, and the server is ready to notify the user.

[0181] Step 6:

[0182] The server sends the generated warning to the user's mobile device. The input consists of the warning message and the user's connection information, and the output is sent via a communication method such as Firebase Cloud Messaging. At this stage, the user receives information in real time, allowing them to quickly understand the situation.
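
For Step 6, the request body for a push notification could be assembled as in the sketch below. It only builds the JSON body and sends nothing. The field names follow the Firebase Cloud Messaging HTTP v1 message format as the editor understands it and should be checked against the Firebase documentation; the function name, the `urgency` data key, and the mapping from urgency to Android delivery priority are assumptions.

```python
import json


def build_fcm_message(device_token, title, body, urgency, details=None):
    """Build the JSON body for one FCM HTTP v1 send request (nothing is sent)."""
    data = {"urgency": urgency}
    # FCM data payloads carry string values only, so everything is stringified.
    data.update({key: str(value) for key, value in (details or {}).items()})
    return {
        "message": {
            "token": device_token,
            "notification": {"title": title, "body": body},
            "data": data,
            # Assumed mapping: only high-urgency warnings request
            # high-priority delivery on Android.
            "android": {"priority": "high" if urgency == "high" else "normal"},
        }
    }


if __name__ == "__main__":
    message = build_fcm_message(
        "DEVICE_TOKEN_PLACEHOLDER",
        "Warning",
        "A child appears excited on the playground equipment.",
        "high",
        {"emotion_score": 0.9},
    )
    print(json.dumps(message, indent=2))
```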

[0183] (Application Example 2)

[0184] Next, we will explain Application Example 2. In the following explanation, the data processing device 12 will be referred to as the "server" and the smart device 14 as the "terminal".

[0185] In modern homes and public facilities, there is a need for systems that can efficiently monitor the safety and psychological state of individuals remotely. However, conventional monitoring systems are limited to detecting physically dangerous behaviors and have difficulty adjusting for changes in psychological state and the resulting urgency. Therefore, in situations where a more comprehensive and rapid response is required, it is difficult to provide appropriate information to relevant parties.

[0186] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 2 is realized by the following means.

[0187] In this invention, the server includes means for adjusting and generating the urgency of alerts based on the psychological state, means for determining the psychological state of the subject using emotion recognition technology, and means for recording response information based on the alerts and storing it as learning data for improving the behavioral analysis algorithm. This makes it possible to adjust alerts to reflect the psychological state of the subject and provide rapid response measures.

[0188] An "image acquisition device" is hardware used to collect video data from multiple locations within an environment.

[0189] A "face recognition algorithm" is a series of computational methods used to identify a person's face from video data.

[0190] A "behavioral pattern analysis means" is a processing means for analyzing the behavior of identified individuals and predicting risky behaviors.

[0191] "Emotion recognition technology" is a technology that analyzes facial expressions and body movements from video data in order to determine the psychological state of a subject.

[0192] An "alert" is a warning message generated based on detected risky behavior or psychological state, and notified to the relevant parties.

[0193] "Response measures tailored to the urgency of the situation" refers to specific measures and action guidelines provided according to the urgency of the alert.

[0194] "Training data" refers to data used to record response information based on alerts and to improve behavioral analysis algorithms.

[0195] In this invention, the entire system operates through the coordinated operation of various electronic devices. The system uses image acquisition devices installed in homes and public facilities to collect video data from multiple locations within the environment. A terminal receives this video data and transmits it to a server.

[0196] The server transfers video footage acquired by devices such as Raspberry Pi and Jetson Nano to a cloud server for image recognition and emotion recognition. The cloud server processes the acquired data using pre-trained face recognition algorithms and emotion recognition models based on TensorFlow and PyTorch. This allows for the identification of the subject's face and analysis of their behavior.

[0197] Furthermore, by applying emotion recognition technology and analyzing facial expressions and body movements obtained from the video, the psychological state of the subject is determined. This information is used to adjust the urgency of the alert, and alerts are generated according to the level of urgency.

[0198] Alerts are quickly sent to the user's mobile device or dedicated device. From this notification, the user can quickly grasp the details of the situation and go to the scene if necessary. For example, if a home robot detects unusual behavior or emotional changes around a child, the user will receive a message saying, "This child is showing a specific reaction and needs to be investigated."

[0199] Thus, by combining image acquisition technology and emotion recognition technology, this system enables more effective safety management and maintenance of psychological health.

[0200] A concrete example of a prompt to be input into the generative AI model might be: "Observe in real time what emotional state a child is in at home, and if any unusual behavior or emotion is detected, tell me what kind of notification to send to the parents based on that."

[0201] The flow of a specific process in Application Example 2 will be explained using Figure 14.

[0202] Step 1:

[0203] The terminal collects video data in real time from image acquisition devices installed within the environment. The terminal receives this video data as input and prepares it for transmission to the server. Specifically, it uses a camera device to capture surveillance video within a specified area. It also compresses the data as needed to improve transmission efficiency.

[0204] Step 2:

[0205] The server receives video data transmitted from the terminal. The server takes this data as input and applies a face recognition algorithm to identify the subject's face. Specifically, it detects the position of the face in each video frame and extracts the feature quantities of the identified face. Then, it identifies who the person is by comparing the feature quantities with a database.

[0206] Step 3:

[0207] The server analyzes the behavior of identified individuals and predicts risky behaviors. Using the analyzed behavioral data as input, it performs data calculations with behavioral pattern analysis tools and outputs behavioral anomalies and predicted risks. Specifically, it compares current behavioral data with past behavioral data to identify and analyze deviations from norms.

[0208] Step 4:

[0209] The server applies emotion recognition technology using facial information from video data to determine the subject's psychological state. In this step, the data after facial recognition is further analyzed, and the psychological state is inferred based on an emotion model. Emotions are quantified from facial expressions and subtle body movements, and these values are used to identify psychologically heightened or calm states.

[0210] Step 5:

[0211] The server considers both risky behavior and psychological state to adjust and generate alert urgency. Using risk level and psychological data as input, the alert generation algorithm adjusts the urgency and content of the alert, outputting information tailored to the user. Specifically, if emotions are heightened, the alert priority is increased, and a warning including a message urging immediate action is prepared.

[0212] Step 6:

[0213] The alert generated on the server is sent to the user's mobile device or dedicated device. The user receives this notification and, based on the information it contains, can go to the site and take action. This step uses push notification technology so that the alert content is communicated quickly.
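
The quantification in Step 4 of this flow could be sketched as follows. Because per-frame emotion values tend to fluctuate, the example smooths them with an exponential moving average before deciding between a heightened and a calm state. The smoothing method, the 0.7 threshold, and the function names are assumptions for illustration; the specification does not say how the values are aggregated.

```python
def smooth_scores(scores, alpha=0.3):
    """Exponential moving average over per-frame arousal scores in [0, 1]."""
    smoothed, current = [], None
    for score in scores:
        # The first score seeds the average; later scores are blended in.
        current = score if current is None else alpha * score + (1 - alpha) * current
        smoothed.append(current)
    return smoothed


def classify_state(scores, heightened_at=0.7, alpha=0.3):
    """Return 'heightened' or 'calm' based on the latest smoothed score."""
    if not scores:
        return "calm"  # no observations: assume nothing unusual
    latest = smooth_scores(scores, alpha)[-1]
    return "heightened" if latest >= heightened_at else "calm"
```

Under these assumptions a single high-scoring frame after a calm period does not flip the state, whereas a sustained run of high scores does, which reduces spurious high-urgency alerts.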

[0214] The specific processing unit 290 transmits the result of the specific processing to the smart device 14. In the smart device 14, the control unit 46A causes the output device 40 to output the result of the specific processing. The microphone 38B acquires audio indicating user input for the result of the specific processing. The control unit 46A transmits the audio data indicating user input acquired by the microphone 38B to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the audio data.

[0215] The data generation model 58 is a so-called generative AI (Artificial Intelligence). Examples of the data generation model 58 include generative AI such as ChatGPT (registered trademark) (Internet search <URL: https://openai.com/blog/chatgpt>) and Gemini (registered trademark) (Internet search <URL: https://gemini.google.com/?hl=ja>). The data generation model 58 is obtained by performing deep learning on a neural network. The data generation model 58 receives prompts containing instructions, together with inference data such as audio data representing speech, text data representing text, and image data representing images. The data generation model 58 performs inference on the input inference data according to the instructions indicated by the prompts, and outputs the inference results in data formats such as audio data and text data. Here, inference refers to, for example, analysis, classification, prediction, and/or summarization.

[0216] In the above embodiment, an example was given in which specific processing is performed by the data processing device 12, but the technology of this disclosure is not limited thereto, and the specific processing may also be performed by the smart device 14.

[0217] [Second Embodiment]

[0218] Figure 3 shows an example of the configuration of the data processing system 210 according to the second embodiment.

[0219] As shown in Figure 3, the data processing system 210 includes a data processing device 12 and smart glasses 214. An example of the data processing device 12 is a server.

[0220] The data processing device 12 comprises a computer 22, a database 24, and a communication interface 26. The computer 22 is an example of a "computer" related to the technology of this disclosure. The computer 22 comprises a processor 28, RAM 30, and storage 32. The processor 28, RAM 30, and storage 32 are connected to a bus 34. The database 24 and the communication interface 26 are also connected to the bus 34. The communication interface 26 is connected to a network 54. An example of the network 54 is a WAN (Wide Area Network) and / or a LAN (Local Area Network).

[0221] The smart glasses 214 include a computer 36, a microphone 238, a speaker 240, a camera 42, and a communication interface 44. The computer 36 includes a processor 46, RAM 48, and storage 50. The processor 46, RAM 48, and storage 50 are connected to a bus 52. The microphone 238, speaker 240, and camera 42 are also connected to the bus 52.

[0222] The microphone 238 picks up speech from the user 20 and thereby accepts instructions from the user 20. The microphone 238 converts the captured speech into audio data and outputs it to the processor 46. The speaker 240 outputs audio according to the instructions from the processor 46.

[0223] Camera 42 is a small digital camera equipped with an optical system including a lens, aperture, and shutter, and an image sensor such as a CMOS (Complementary Metal-Oxide-Semiconductor) image sensor or a CCD (Charge Coupled Device) image sensor, and captures images of the area around the user 20 (for example, an imaging range defined by a field of view equivalent to the width of a typical healthy person's field of vision).

[0224] Communication interface 44 is connected to network 54. Communication interfaces 44 and 26 are responsible for the exchange of various information between processor 46 and processor 28 via network 54. The exchange of various information between processor 46 and processor 28 using communication interfaces 44 and 26 is performed in a secure manner.

[0225] Figure 4 shows an example of the main functions of the data processing device 12 and the smart glasses 214. As shown in Figure 4, the data processing device 12 performs specific processing using the processor 28. The storage 32 stores the specific processing program 56.

[0226] The specific processing program 56 is an example of a "program" relating to the technology of this disclosure. The processor 28 reads the specific processing program 56 from the storage 32 and executes the read specific processing program 56 on the RAM 30. The specific processing is realized by the processor 28 operating as a specific processing unit 290 in accordance with the specific processing program 56 executed on the RAM 30.

[0227] The storage 32 stores the data generation model 58 and the emotion identification model 59. The data generation model 58 and the emotion identification model 59 are used by the specific processing unit 290.

[0228] In the smart glasses 214, the processor 46 performs the reception output processing. The storage 50 stores the reception output program 60. The processor 46 reads the reception output program 60 from the storage 50 and executes the read reception output program 60 on the RAM 48. The reception output processing is realized by the processor 46 operating as a control unit 46A according to the reception output program 60 executed on the RAM 48.

[0229] Next, the specific processing performed by the specific processing unit 290 of the data processing device 12 will be described. In the following description, the data processing device 12 will be referred to as the "server" and the smart glasses 214 will be referred to as the "terminal".

[0230] This invention relates to a system that uses an image acquisition device to monitor the safety of children in homes and public facilities. This system contributes to ensuring the safety of children without requiring parents or facility managers to physically keep an eye on them.

[0231] In this system, image acquisition devices installed in various locations collect video data of the environment in real time. Terminals transmit this video data to a server, which uses a facial recognition algorithm to identify specific individuals. Through facial recognition, the server identifies the child to be monitored and continuously analyzes the child's behavior using behavioral pattern analysis tools.

[0232] The server rapidly analyzes situations, especially those with a high probability of dangerous behavior, and generates an alert if danger is detected. These alerts are sent in real time to parents and facility managers via smartphones or dedicated devices. This notification allows users to take swift action and ensure the child's safety.

[0233] As a concrete example, consider a scenario where a device monitors a playground area in a park. The server detects that a child is climbing onto a tall piece of playground equipment and is in an unstable position. The server immediately identifies this behavior as dangerous and sends an alert to the user's device saying, "Your child is climbing onto a tall piece of playground equipment. Caution is advised." Upon receiving this notification, the user can rush to the scene and take the necessary action.

[0234] Thus, this invention aims to ensure the safety of children and alleviate parental anxiety by utilizing cameras and AI technology. This makes it possible to efficiently manage children's safety even in busy daily life.

[0235] The following describes the processing flow.

[0236] Step 1:

[0237] The terminal activates the image acquisition device and prepares to collect real-time video data from within the home or public facility. During this process, the terminal adjusts the camera's image quality and sets the required frame rate and resolution.

[0238] Step 2:

[0239] The terminal sends the acquired video data to the server. The server receives this data and preprocesses it for video analysis. The preprocessing removes unwanted noise so that the video data is clean enough for recognition.

[0240] Step 3:

[0241] The server applies a facial recognition algorithm to the processed video data. This detects the faces of people in the video and matches them with known children in the database to identify specific individuals.

[0242] Step 4:

[0243] The server analyzes the identified individuals using behavioral pattern analysis tools. This analysis compares their actions with existing behavioral records, tracks their movements, and evaluates their behavioral status.

[0244] Step 5:

[0245] The server predicts risky behavior based on the results of behavioral analysis. It detects anomalies by comparing the analyzed behavior with a predefined list of risky behaviors, such as unstable movements at height or approaching specific risk areas.

[0246] Step 6:

[0247] If the server detects dangerous behavior, it will generate an alert. This alert will include the specific nature of the danger and recommended actions.

[0248] Step 7:

[0249] The server sends the generated alert to the user. Alerts are delivered in real time to the user's smartphone or dedicated device, allowing for immediate confirmation.

[0250] Step 8:

[0251] Users receive alerts and review their contents. By rushing to the scene as needed and taking action to confirm or improve safety, accidents can be prevented.

[0252] Step 9:

[0253] The server records user responses and actions taken in response to alerts. This data is used to improve future models and enhance the accuracy of AI algorithms.
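
The comparison against a predefined list of risky behaviors in Step 5 of this flow could be expressed as simple rules. The sketch below covers the two cases the text names, unstable movement at height and approach to a risk area. It assumes that tracking already supplies a position, a height above ground, and a sway measure for the child; the `Zone` class, the `find_risks` function, and all thresholds are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Zone:
    """Axis-aligned rectangular risk area in ground coordinates (assumed)."""
    name: str
    x_min: float
    y_min: float
    x_max: float
    y_max: float

    def contains(self, x, y):
        return self.x_min <= x <= self.x_max and self.y_min <= y <= self.y_max


def find_risks(x, y, height_m, sway, zones, max_height_m=1.5, max_sway=0.5):
    """Return the predefined risky behaviors matched by one observation."""
    risks = []
    # Rule 1: the child is high up and moving unsteadily.
    if height_m > max_height_m and sway > max_sway:
        risks.append("unstable movement at height")
    # Rule 2: the child's position lies inside a registered risk area.
    for zone in zones:
        if zone.contains(x, y):
            risks.append(f"approaching risk area: {zone.name}")
    return risks
```

An empty result means no predefined pattern matched, so no alert would be generated in Step 6.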

[0254] (Example 1)

[0255] Next, we will describe Example 1. In the following description, the data processing device 12 will be referred to as the "server" and the smart glasses 214 will be referred to as the "terminal".

[0256] In modern society, ensuring the safety of people in homes and public facilities is a crucial issue. However, relying solely on physical surveillance has its limitations and may not be sufficient in situations requiring a quick and accurate response. Furthermore, conventional systems lack sufficient capabilities to predict and notify of dangers, making it difficult to adapt to individual circumstances.

[0257] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 1 is realized by the following means.

[0258] In this invention, the server includes means for collecting visual information from multiple locations within the facility using image sensors, means for receiving the visual information and using an identification algorithm to identify the faces of individuals, and means for analyzing the behavior of the identified individuals and predicting potentially dangerous behaviors. This makes it possible to automatically and effectively monitor the safety of individuals within the facility and to quickly detect potential dangers.

[0259] An "image sensor" is a device used to collect visual information from the environment, and typically uses camera technology to acquire data in real time.

[0260] "Visual information" refers to records of the environment acquired in the form of images or videos, which allow us to confirm the situation of a specific place or object.

[0261] An "identification algorithm" is a computational method used to extract individual characteristics from collected visual information and identify specific individuals.

[0262] The term "individual" generally refers to a person or object that is identified by a system, and often specifically refers to a person who is the target of facial recognition.

[0263] "Motion analysis means" refers to functions that include technologies and processes for analyzing the motion patterns of identified individuals and detecting normal and abnormal behavior.

[0264] "Potentially dangerous behaviors" refer to actions that an individual may take that the system determines could potentially threaten safety.

[0265] A "warning" is a notification or alert issued when a potential crisis is detected, containing information to prompt relevant parties to take swift action.

[0266] "Training data" refers to a dataset that is accumulated to improve the accuracy and performance of a system and is used for learning and improvement.

[0267] "Portable electronic devices" are small electronic devices primarily intended for use while on the go, and include smartphones and tablets.

[0268] A "dedicated device" refers to equipment designed specifically for a particular function or application, and is not necessarily used for general purposes.

[0269] This system is designed to monitor the safety of specific individuals using multiple image sensors installed in homes and public facilities. The entire system consists of three main elements: terminals, servers, and users.

[0270] The terminals collect visual information of the surrounding environment in real time through image sensors placed within the facility. This includes commonly used IP cameras and high-resolution webcams. These devices periodically capture data and transmit the collected visual information to a server.

[0271] The server uses an identification algorithm to identify individuals based on the visual information it receives. This process often employs Dlib or open-source AI frameworks and includes techniques for facial recognition. The server also has behavioral analysis capabilities, performing behavioral analysis using deep learning platforms such as TensorFlow and PyTorch. Based on the analyzed data, if potentially dangerous behavior is detected, the server immediately generates an alarm and adds it to the notification queue.
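
The notification queue mentioned above could be realized as a priority queue so that the most dangerous alarm is delivered first. The following is a minimal sketch using Python's `heapq`; the class name, the numeric danger scale, and the tie-breaking by arrival order are assumptions, since the specification does not describe the queue's internals.

```python
import heapq
import itertools


class NotificationQueue:
    """Alarms come out highest danger first; ties keep arrival order."""

    def __init__(self):
        self._heap = []
        self._counter = itertools.count()  # arrival order for tie-breaking

    def add(self, danger, message):
        # heapq is a min-heap, so the danger level is negated.
        heapq.heappush(self._heap, (-danger, next(self._counter), message))

    def pop(self):
        """Remove and return the most urgent message, or None if empty."""
        if not self._heap:
            return None
        return heapq.heappop(self._heap)[2]

    def __len__(self):
        return len(self._heap)
```

A sender loop would call `pop()` repeatedly and hand each message to the push notification service until the queue is empty.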

[0272] Users receive generated alerts via portable electronic devices or dedicated equipment. Utilizing push notification technology, important alerts are delivered to users in real time. This allows users to quickly understand the situation and take appropriate action.

[0273] As a concrete example, consider a scenario where a server monitors a playground area in a park. The server detects a child in an unstable position on the playground equipment and, if it determines that the child's actions are dangerous, issues an alert to the user's mobile device stating, "Your child is climbing on high playground equipment. Caution is advised." In this way, the system aims to enhance individual safety based on real-world scenarios.

[0274] Examples of prompts for a generative AI model include the following:

[0275] "Please generate a program that analyzes the position and movement of children in images and evaluates their safety."

[0276] "Please describe a system that monitors playground areas in parks and detects dangerous behaviors among children."

[0277] This system enables efficient and effective safety management of individuals within a specific facility.

[0278] The flow of the specific process in Example 1 will be described using FIG. 11.

[0279] Step 1:

[0280] The terminal captures real-time visual information using an image sensor installed in the facility. This visual information includes video data within the environment and is input from the image sensor. This data is captured at a specified frame rate and resolution and is transmitted to the server for subsequent processing.

[0281] Step 2:

[0282] The server receives the visual information transmitted from the terminal. Next, a face recognition algorithm is applied to identify the faces of individuals present in the visual information. The input data is an image or video frame, and the algorithm is used to extract feature points and output the identified face information. Specifically, a framework such as Dlib is utilized to determine the position of the face in real time.

[0283] Step 3:

[0284] The server analyzes the actions of the identified individuals using motion analysis means. In this step, based on the face information identified by the server, the movements and position changes of the individuals are tracked. Here, the input data is real-time motion data, and through motion analysis, abnormal behaviors and potentially dangerous behaviors are detected, and the information is output. An AI framework such as TensorFlow is used to analyze the behavior patterns with a deep learning model.

[0285] Step 4:

[0286] When the server detects potential dangerous behavior from the analysis results, it immediately generates an alarm. Here, the input data is the result of behavior analysis, and through the alarm generation routine, an urgent notification is generated and its content is output. As a specific operation, according to the detected degree of danger, the priority of the alarm is set and added to the notification queue.
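
As a non-limiting illustration, the prioritized notification queue of Step 4 might be sketched as follows. The danger score range, the thresholds in `priority_for`, and the names `Alarm` and `NotificationQueue` are assumptions introduced here for illustration and are not defined in this disclosure.

```python
import heapq
import itertools
from dataclasses import dataclass, field


def priority_for(danger_score):
    """Map a danger score in [0, 1] to a priority (lower = more urgent).

    The thresholds are illustrative assumptions only.
    """
    if danger_score >= 0.8:
        return 0  # urgent
    if danger_score >= 0.5:
        return 1  # elevated
    return 2      # informational


@dataclass(order=True)
class Alarm:
    priority: int
    seq: int                             # tie-breaker: FIFO within one priority
    message: str = field(compare=False)  # not used for ordering


class NotificationQueue:
    """Min-heap of alarms; the most urgent alarm is popped first."""

    def __init__(self):
        self._heap = []
        self._counter = itertools.count()

    def add(self, message, danger_score):
        alarm = Alarm(priority_for(danger_score), next(self._counter), message)
        heapq.heappush(self._heap, alarm)

    def pop(self):
        return heapq.heappop(self._heap)

    def __len__(self):
        return len(self._heap)
```

A heap keeps insertion and removal at logarithmic cost, and the sequence counter preserves arrival order among alarms of equal priority.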

[0287] Step 5:

[0288] The user receives the alarm distributed from the server on a mobile electronic device or a dedicated device. The input data is the alarm information from the server, and the user makes a quick response based on this. The output includes the action of the user going to the scene and taking necessary countermeasures. The user is required to confirm the alarm and make an appropriate judgment based on the information provided by the system.

[0289] (Application Example 1)

[0290] Next, Application Example 1 will be described. In the following description, the data processing device 12 is referred to as the "server", and the smart glasses 214 are referred to as the "terminal".

[0291] Efficiently monitoring the safety of children in homes and public environments remains a challenge in today's busy lives. With conventional monitoring methods, it is difficult to detect and respond to dangers in real time, which places a heavy burden on parents and facility managers. As a result, accidental injuries to children are difficult to prevent.

[0292] The specific processing by the specific processing unit 290 of the data processing device 12 in Application Example 1 is realized by the following means.

[0293] In this invention, the server includes means for collecting visual data from multiple locations in the environment by an image acquisition device, means for using a face recognition algorithm to identify individuals, and means for generating a warning and notifying relevant persons when detecting dangerous behavior. Thereby, relevant persons can efficiently monitor the actions of children and respond quickly when a danger is detected.

[0294] An "image acquisition device" is a device used to collect visual data from multiple points within an environment.

[0295] "Visual data" refers to information acquired in the form of images or videos.

[0296] A "face recognition algorithm" is a computational method for analyzing and identifying an individual's characteristics from received visual data.

[0297] "Individual" refers to a specific person who is being monitored.

[0298] A "motion analysis device" is a system that analyzes an individual's movements and predicts dangerous actions.

[0299] "Dangerous behavior" refers to actions that may threaten an individual's safety.

[0300] A "warning" is a notification that is generated when dangerous behavior is detected.

[0301] A "communication device" is a device or system used to transmit information to a terminal.

[0302] A "terminal" refers to a portable device or a dedicated receiving device, which is a device used to receive warnings.

[0303] "Learning information" refers to data that records response information based on warnings and is used to optimize algorithms.

[0304] To implement this invention, network-compatible image acquisition devices (for example, cameras) are first installed in homes and public facilities. These devices collect visual data in real time from multiple locations within the environment and transmit it to a cloud-based server. The server uses a face recognition algorithm (implemented with, e.g., OpenCV or AWS Rekognition) to identify specific individuals from the received visual data. The actions of the identified individuals are analyzed by the motion analysis device, and if a preset dangerous behavior is detected, the server generates a warning.

[0305] This warning is transmitted to the user's mobile terminal or dedicated receiving device using the communication device. By receiving this warning, the user can respond quickly when danger is imminent. Furthermore, the response information based on the warning is recorded on the server and accumulated as learning information used to optimize the behavioral analysis algorithm.

[0306] As a specific example, consider a situation where a camera installed in a playground in a park is monitoring children playing on a swing. This system detects that a child is about to fall off the swing and identifies the action as dangerous. At this time, the server generates a warning saying, "The child is performing a dangerous movement on the swing. Please be careful," and immediately sends it to the parent's smartphone.

[0307] An example of the prompt text is: "This is a child safety monitoring system. Recognize children from specific camera footage and send an alert when a dangerous behavior is detected. Describe the specific behavior in the alert content and send a push notification to the smartphone."

[0308] The flow of the specific process in Application Example 1 will be described using FIG. 12.

[0309] Step 1:

[0310] The server receives visual data in real time from a network-enabled image acquisition device. The input is video data from the camera, and the server prepares this data to send to a face recognition algorithm.

[0311] Step 2:

[0312] The server executes a face recognition algorithm to identify a specific individual from video data. The input is the video data obtained in step 1, and the server analyzes the position and feature points of faces to obtain individual identification information as output. Technologies such as OpenCV and AWS Rekognition are utilized in this process.

[0313] Step 3:

[0314] The server analyzes the actions of the identified individual using a motion analysis device. The input is the individual identification information from step 2, and a motion analysis algorithm is applied to extract behavioral patterns, obtaining motion features as output.

[0315] Step 4:

[0316] The server predicts dangerous behavior from the obtained behavioral features. The input is the behavioral features from step 3, which are compared with pre-defined dangerous behavior patterns to evaluate the degree of danger, and the server generates a dangerous behavior detection result as output.
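
As a non-limiting illustration, the comparison against pre-defined dangerous behavior patterns in Step 4 might be sketched as simple threshold matching. The feature names (`height_m`, `tilt_deg`, `speed_mps`), the patterns, and the severity values are hypothetical and are not specified in this disclosure.

```python
from dataclasses import dataclass


@dataclass
class DangerPattern:
    name: str
    min_values: dict   # feature name -> minimum value that must be reached
    severity: float    # degree of danger in [0, 1] when the pattern matches


# Illustrative patterns only (assumed values).
PATTERNS = [
    DangerPattern("climbing_high", {"height_m": 1.5}, 0.6),
    DangerPattern("falling", {"tilt_deg": 45.0, "speed_mps": 2.0}, 0.9),
]


def evaluate_danger(features, patterns=PATTERNS):
    """Return a detection result for one set of behavioral features."""
    matched = [
        p for p in patterns
        if all(features.get(k, 0.0) >= v for k, v in p.min_values.items())
    ]
    # The degree of danger is taken as the highest severity among matches.
    degree = max((p.severity for p in matched), default=0.0)
    return {
        "dangerous": bool(matched),
        "degree": degree,
        "patterns": [p.name for p in matched],
    }
```

For example, `evaluate_danger({"height_m": 2.0})` matches only the `climbing_high` pattern. A learned classifier could replace the threshold rules without changing the shape of the detection result.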

[0317] Step 5:

[0318] The server generates a warning when dangerous behavior is detected and sends it to the user's terminal via a communication device. The input is the detection result from step 4, which is used to construct the warning message and create notification information for the terminal as output. The user receives this information and can take action to avoid the crisis.

[0319] Step 6:

[0320] The response information based on the warnings is recorded on the server and stored as learning information used to optimize the behavioral analysis algorithm. The input is user response information, which is stored in the database and output as feedback data that will be useful for future improvements to the algorithm.
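
As a non-limiting illustration, the accumulation of response information in Step 6 might be sketched as appending one JSON record per warning to a store. The record fields and the function names are assumptions for illustration; an in-memory buffer stands in for the database.

```python
import io
import json
from datetime import datetime, timezone


def record_response(store, warning_id, detected_behavior, user_action,
                    was_real_danger):
    """Append one response record (JSON Lines) to a file-like store."""
    entry = {
        "warning_id": warning_id,
        "detected_behavior": detected_behavior,
        "user_action": user_action,
        "label": bool(was_real_danger),  # feedback usable as a training label
        "recorded_at": datetime.now(timezone.utc).isoformat(),
    }
    store.write(json.dumps(entry, ensure_ascii=False) + "\n")
    return entry


def load_training_examples(store):
    """Read back (behavior, label) pairs for later algorithm tuning."""
    store.seek(0)
    return [
        (e["detected_behavior"], e["label"])
        for e in (json.loads(line) for line in store if line.strip())
    ]


# Usage with an in-memory store standing in for the database.
buf = io.StringIO()
record_response(buf, "w-1", "about_to_fall_from_swing", "went_to_scene", True)
record_response(buf, "w-2", "running", "dismissed", False)
examples = load_training_examples(buf)
```

Storing whether the user confirmed or dismissed each warning gives the labels needed to reduce false alarms over time.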

[0321] Furthermore, an emotion engine that estimates the user's emotions may be incorporated. That is, the specific processing unit 290 may use the emotion identification model 59 to estimate the user's emotions and perform the specific processing using the user's emotions.

[0322] This invention is a system that combines an image acquisition device and an emotion recognition engine, and is intended to monitor the safety and emotional state of children in homes or public facilities. This system allows parents or facility managers to remotely check on a child's safety and psychological state even when they are not physically nearby.

[0323] The terminal collects real-time video data from image acquisition devices installed in homes or facilities. This data is then sent to a server, which applies a facial recognition algorithm to the received video to identify specific children who are being monitored.

[0324] Next, the server analyzes the behavioral patterns of the recognized individuals and uses an emotion recognition engine to analyze their emotional state. The emotion recognition engine analyzes facial expressions and body movements from the video data to determine whether the individuals are emotionally agitated or calm.

[0325] This information is linked to the prediction of risky behavior, and if high-risk behavior is detected, an alert is quickly generated. The results of emotion recognition are also used to adjust the priority and urgency of alerts. For example, when emotions are heightened, a high-urgency alert is issued.

[0326] The generated alerts are immediately sent to the user's mobile device or dedicated device. This allows the user to understand the detailed situation, including the emotional state of the person concerned, and take a quick and appropriate response.

[0327] For example, if a device is monitoring a playground in a park and the video footage shows a child becoming agitated on the play equipment, the server will perceive this as a dangerous situation and send an alert to the user indicating that the child is emotionally agitated. The user can then take action to go to the location and calm the child down.

[0328] Thus, by incorporating emotion recognition technology, this invention provides a system that achieves more comprehensive child safety management while also considering the maintenance of children's psychological health.

[0329] The following describes the processing flow.

[0330] Step 1:

[0331] The terminal activates the image acquisition device. This starts capturing video data in real time from the installed location. The collected video data is then ready to be sent to the server.

[0332] Step 2:

[0333] The server analyzes the received video data. First, it performs data preprocessing, such as noise reduction and brightness correction, to improve recognition accuracy.
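
As a non-limiting illustration, the preprocessing of Step 2 might be sketched on a grayscale frame represented as a list of rows of 0 to 255 pixel values. The choice of a 3x3 median filter for noise reduction and a mean-based gain for brightness correction is an assumption; this disclosure does not specify the techniques.

```python
from statistics import median


def median_filter3(img):
    """3x3 median filter; border pixels reuse the nearest valid pixel."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            window = [
                img[min(max(y + dy, 0), h - 1)][min(max(x + dx, 0), w - 1)]
                for dy in (-1, 0, 1)
                for dx in (-1, 0, 1)
            ]
            out[y][x] = int(median(window))
    return out


def correct_brightness(img, target_mean=128.0):
    """Scale pixel values so the frame mean approaches target_mean."""
    pixels = [p for row in img for p in row]
    mean = sum(pixels) / len(pixels)
    gain = target_mean / mean if mean > 0 else 1.0
    return [[min(255, max(0, round(p * gain))) for p in row] for row in img]


def preprocess(img):
    """Noise reduction followed by brightness correction."""
    return correct_brightness(median_filter3(img))
```

A median filter removes isolated bright or dark pixels while preserving edges better than averaging, which is useful before face detection.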

[0334] Step 3:

[0335] The server uses a facial recognition algorithm to identify the face of a specific subject from the video data. The identified face is then compared against known faces registered in the database.

[0336] Step 4:

[0337] The server performs behavioral pattern analysis. It tracks the movements of identified individuals and determines whether their actions constitute dangerous behavior by comparing them with existing data.

[0338] Step 5:

[0339] The server activates an emotion recognition engine and evaluates the subject's emotional state based on their facial expressions and body movements. It determines whether the subject is emotionally agitated or calm, and passes that data to the next step.

[0340] Step 6:

[0341] The server integrates behavioral analysis results and emotion recognition results to determine the appropriate response if dangerous behavior is detected. In particular, if emotions are heightened, the urgency of the warning is increased, and an alert is generated immediately.

[0342] Step 7:

[0343] The server generates alerts and sends them to the user's mobile device or a dedicated device. The alerts include details of the behavior and emotional state, and are presented in a way that allows the user to respond quickly.

[0344] Step 8:

[0345] The user reviews the alert, understands its content, and takes appropriate action to ensure the safety of the person concerned. For example, this could involve temporarily monitoring the situation or physically going to the site to improve the situation.

[0346] Step 9:

[0347] The server records user responses and stores the corresponding actions as training data. This data will be used to improve the algorithm in the future.

[0348] (Example 2)

[0349] Next, we will describe Example 2. In the following description, the data processing device 12 will be referred to as the "server" and the smart glasses 214 will be referred to as the "terminal".

[0350] In recent years, the need for safety management in homes and public facilities has increased, but ensuring the safety of individuals requiring special care, such as young children, and appropriately monitoring their emotional states is not easy. Furthermore, conventional monitoring systems have difficulty detecting changes in emotions, posing a challenge in predicting dangerous situations in advance and responding quickly.

[0351] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 2 is realized by the following means.

[0352] In this invention, the server includes means for collecting visual information from multiple locations in the environment, means for using a face recognition algorithm to identify the face of a target individual, and means for using an emotion evaluation engine to analyze the emotional state of the identified target individual. This makes it possible to identify and analyze the face and emotional state of a target individual with high accuracy and predict potential dangers.

[0353] An "image acquisition device" is a device used to collect visual information within an environment, and is capable of continuously acquiring information from multiple locations at high resolution.

[0354] "Visual information" refers to video and image data of the environment collected by image acquisition devices, and is used to identify people and other objects.

[0355] A "face recognition algorithm" is a computational method or program for detecting and identifying the face of a specific individual from visual information, and recognizes individuals based on specific feature points.

[0356] An "emotion evaluation engine" is a technological element that analyzes the emotional state of an identified individual from its facial expressions and movements, and identifies the type of emotion, such as excitement or calmness.

[0357] "Behavioral analysis methods" refer to algorithms and systems used to analyze the movements and past behavioral patterns of a target individual and predict abnormal or dangerous behaviors.

[0358] A "warning" is a notification generated when the level of danger increases based on emotional state and behavioral analysis, and it serves as a signal to quickly convey information to relevant parties.

[0359] A "portable device" is an electronic device intended to be carried by the user at all times and capable of receiving and transmitting information.

[0360] In this invention, the terminal first uses an image acquisition device to collect visual information from the environment. This image acquisition device is equipped with a high-resolution camera, is installed in homes and public facilities, and has the function of continuously recording video day and night. The collected visual information is efficiently transmitted to a server using compression technology.

[0361] Next, the server processes the received visual information. The technology used here is a face recognition algorithm, specifically utilizing libraries such as OpenCV and Dlib. This algorithm allows the server to accurately detect and identify the face of a specific individual. Based on the results, an emotion evaluation engine is activated to analyze the target individual's emotional state in real time. This engine evaluates emotions by referring to subtle changes in facial expressions and body movements, and in practice, it often utilizes APIs from general cloud service providers.

[0362] In addition, the behavioral analysis system evaluates the behavior of the target individual based on past data and accumulated training data. This analysis makes it possible to predict potential dangerous behaviors in advance and take countermeasures quickly. The generated warnings are notified to the user's mobile device in real time, allowing the user to quickly understand the situation and respond appropriately.

[0363] As a concrete example, suppose a terminal installed in a park is monitoring children near playground equipment, and the video captures a child appearing extremely excited on the equipment. The server analyzes this footage, and if it detects heightened emotions, it immediately sends a high-urgency warning to the user. In this case, an example of a prompt might be "Detect a child who is excited on the playground equipment and generate a warning."

[0364] This allows users to grasp the specific situation on-site and take quick and appropriate action. This system further improves safety management by utilizing emotion recognition technology and highly accurate data analysis.

[0365] The flow of the specific processing in Example 2 will be explained using Figure 13.

[0366] Step 1:

[0367] The terminal collects visual information from the environment using an image acquisition device. Specifically, it uses a high-resolution camera to capture continuous video data, which is then used as input. At this stage, the video data is sent to the server in a compressed format. The output is video data in a format that the server can process.

[0368] Step 2:

[0369] The server decodes the received compressed video data and applies a face recognition algorithm. The input is the decoded video frames, and the server uses image processing techniques to identify faces within each frame. The output of this operation is data showing the position of each face in each frame and its corresponding feature points.

[0370] Step 3:

[0371] The server activates an emotion evaluation engine based on the results of face recognition. This engine analyzes the input facial feature data and evaluates the emotional state based on changes and dynamics in facial expressions. The output of this process is an emotion score indicating the degree of excitement or calmness.

[0372] Step 4:

[0373] The server uses the emotion scores to perform behavioral analysis. Using past data and the currently obtained emotion scores as input, the algorithm recognizes anomalies and abnormal patterns. The output of this process is the detection result of high-risk behavioral patterns.

[0374] Step 5:

[0375] The server generates warnings based on the results of behavioral analysis. The input is the risk assessment result, and the server uses this to generate warning messages and determine their priority. The warning messages are output, and the server is ready to notify the user.

[0376] Step 6:

[0377] The server sends the generated warning to the user's mobile device. The input consists of the warning message and the user's connection information, and the output is sent via a communication method such as Firebase Cloud Messaging. At this stage, the user receives information in real time, allowing them to quickly understand the situation.
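
As a non-limiting illustration, the warning message of Step 6 might be assembled as a request body before being handed to a push service. The layout below follows the `message` object of the Firebase Cloud Messaging HTTP v1 API as understood here (`token`, `notification`, `data`), which should be checked against the official reference. The function name, the `priority` data key, and the token value are hypothetical. The sketch only builds the body and does not send it.

```python
import json


def build_fcm_message(device_token, title, body, priority):
    """Build a push-notification request body for one warning.

    device_token is the user's connection information (hypothetical value
    in the usage below). Values under "data" are kept as strings.
    """
    return {
        "message": {
            "token": device_token,
            "notification": {"title": title, "body": body},
            "data": {"priority": str(priority)},  # custom key (assumption)
        }
    }


payload = build_fcm_message(
    "example-device-token",
    "Safety warning",
    "A child appears highly excited on the playground equipment.",
    "high",
)
request_body = json.dumps(payload)
```

Carrying the priority in the `data` map lets the receiving application decide how prominently to present the warning.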

[0378] (Application Example 2)

[0379] Next, we will explain application example 2. In the following explanation, the data processing device 12 will be referred to as the "server" and the smart glasses 214 as the "terminal".

[0380] In modern homes and public facilities, there is a need for systems that can efficiently monitor the safety and psychological state of individuals remotely. However, conventional monitoring systems are limited to detecting physically dangerous behaviors and have difficulty adjusting for changes in psychological state and the resulting urgency. Therefore, in situations where a more comprehensive and rapid response is required, it is difficult to provide appropriate information to relevant parties.

[0381] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 2 is realized by the following means.

[0382] In this invention, the server includes means for adjusting and generating the urgency of alerts based on the psychological state, means for determining the psychological state of the subject using emotion recognition technology, and means for recording response information based on the alerts and storing it as learning data for improving the behavioral analysis algorithm. This makes it possible to adjust alerts to reflect the psychological state of the subject and provide rapid response measures.

[0383] An "image acquisition device" is hardware used to collect video data from multiple locations within an environment.

[0384] A "face recognition algorithm" is a series of computational methods used to identify a person's face from video data.

[0385] A "behavioral pattern analysis means" is a processing means for analyzing the behavior of identified individuals and predicting risky behaviors.

[0386] "Emotion recognition technology" is a technology that analyzes facial expressions and body movements from video data in order to determine the psychological state of a subject.

[0387] An "alert" is a warning message generated based on detected risky behavior or psychological state, and notified to the relevant parties.

[0388] "Response measures tailored to the urgency of the situation" refers to specific measures and action guidelines provided according to the urgency of the alert.

[0389] "Training data" refers to data used to record response information based on alerts and to improve behavioral analysis algorithms.

[0390] In this invention, the entire system operates through the coordinated operation of various electronic devices. The system uses image acquisition devices installed in homes and public facilities to collect video data from multiple locations within the environment. A terminal receives this video data and transmits it to a server.

[0391] The server transfers video footage acquired by devices such as Raspberry Pi and Jetson Nano to a cloud server for image recognition and emotion recognition. The cloud server processes the acquired data using pre-trained face recognition algorithms and emotion recognition models based on TensorFlow and PyTorch. This allows for the identification of the subject's face and analysis of their behavior.

[0392] Furthermore, by applying emotion recognition technology and analyzing facial expressions and body movements obtained from the video, the psychological state of the subject is determined. This information is used to adjust the urgency of the alert, and alerts are generated according to the level of urgency.

[0393] Alerts are quickly sent to the user's mobile device or dedicated device. From this notification, the user can quickly grasp the details of the situation and go to the scene if necessary. For example, if a home robot detects unusual behavior or emotional changes around a child, the user will receive a message saying, "This child is showing a specific reaction and needs to be investigated."

[0394] Thus, by combining image acquisition technology and emotion recognition technology, this system enables more effective safety management and maintenance of psychological health.

[0395] A concrete example of a prompt to be input into the generative AI model might be: "Observe in real time what emotional state a child is in at home, and if any unusual behavior or emotion is detected, tell me what kind of notification to send to the parents based on that."

[0396] The flow of a specific process in Application Example 2 will be explained using Figure 14.

[0397] Step 1:

[0398] The terminal collects video data in real time from image acquisition devices installed within the environment. The terminal receives this video data as input and prepares it for transmission to the server. Specifically, it uses a camera device to capture surveillance video within a specified area. It also compresses the data as needed to improve transmission efficiency.

[0399] Step 2:

[0400] The server receives video data transmitted from the terminal. The server takes this data as input and applies a face recognition algorithm to identify the subject's face. Specifically, it detects the position of the face in each video frame and extracts the feature quantities of the identified face. Then, it identifies who the person is by comparing the feature quantities with a database.
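
As a non-limiting illustration, the comparison of feature quantities with a database in Step 2 might be sketched as a nearest-neighbor search over registered feature vectors. The Euclidean distance metric, the threshold value, and the three-element vectors are assumptions for illustration. In practice the vectors would come from a face recognition library and have many more dimensions.

```python
import math


def identify(feature, database, threshold=0.6):
    """Return (name, distance) of the closest registered face.

    name is None when no registered face lies within the threshold,
    that is, when the person is treated as unknown.
    """
    best_name, best_dist = None, math.inf
    for name, reference in database.items():
        d = math.dist(feature, reference)  # Euclidean distance
        if d < best_dist:
            best_name, best_dist = name, d
    if best_dist <= threshold:
        return best_name, best_dist
    return None, best_dist


# Hypothetical registered feature vectors.
registered = {
    "child_a": [0.0, 0.0, 0.0],
    "child_b": [1.0, 1.0, 1.0],
}
match = identify([0.1, 0.0, 0.0], registered)
```

The threshold trades false matches against missed matches and would need tuning for the feature extractor actually used.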

[0401] Step 3:

[0402] The server analyzes the behavior of identified individuals and predicts risky behaviors. Using the analyzed behavioral data as input, it performs data calculations with behavioral pattern analysis tools and outputs behavioral anomalies and predicted risks. Specifically, it compares current behavioral data with past behavioral data to identify and analyze deviations from norms.
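
As a non-limiting illustration, the comparison of current behavioral data with past behavioral data in Step 3 might be sketched as a z-score test on a single scalar measure such as movement speed. The measure, the function names, and the threshold of 3.0 are assumptions; this disclosure does not specify the statistical method.

```python
from statistics import mean, pstdev


def deviation_score(history, current):
    """Number of standard deviations by which current departs from history."""
    mu = mean(history)
    sigma = pstdev(history)
    if sigma == 0:
        # No variation in the past data: any difference is a deviation.
        return 0.0 if current == mu else float("inf")
    return abs(current - mu) / sigma


def is_anomalous(history, current, z_threshold=3.0):
    """Flag the current value as a deviation from the norm."""
    return deviation_score(history, current) > z_threshold


# Hypothetical past movement speeds in metres per second.
past_speeds = [1.0, 1.2, 0.8, 1.1, 0.9]
```

With these past values, a current speed of 1.1 stays within the norm while a speed of 3.0 is flagged as a deviation.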

[0403] Step 4:

[0404] The server applies emotion recognition technology using facial information from video data to determine the subject's psychological state. In this step, the data after facial recognition is further analyzed, and the psychological state is inferred based on an emotion model. Emotions are quantified from facial expressions and subtle body movements, and these values are used to identify psychologically heightened or calm states.

[0405] Step 5:

[0406] The server considers both risky behavior and psychological state to adjust and generate alert urgency. Using risk level and psychological data as input, the alert generation algorithm adjusts the urgency and content of the alert, outputting information tailored to the user. Specifically, if emotions are heightened, the alert priority is increased, and a warning including a message urging immediate action is prepared.
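
As a non-limiting illustration, the urgency adjustment of Step 5 might be sketched as raising the risk level when the emotion value indicates a heightened state. The value ranges, the constants `HEIGHTENED` and `EMOTION_BOOST`, the priority thresholds, and the message wording are all assumptions introduced for illustration.

```python
HEIGHTENED = 0.7      # assumed emotion value at or above which emotions count as heightened
EMOTION_BOOST = 0.3   # assumed urgency increase applied in that case


def build_alert(risk_level, arousal, behavior):
    """Combine risk level and emotional arousal (both in [0, 1]) into an alert."""
    heightened = arousal >= HEIGHTENED
    urgency = min(1.0, risk_level + (EMOTION_BOOST if heightened else 0.0))

    if urgency >= 0.8:
        priority, advice = "high", "Immediate action is recommended."
    elif urgency >= 0.4:
        priority, advice = "medium", "Please check on the child."
    else:
        priority, advice = "low", "No immediate action is required."

    state = "agitated" if heightened else "calm"
    return {
        "priority": priority,
        "urgency": urgency,
        "message": (
            f"Detected behavior: {behavior}. "
            f"Emotional state: {state}. {advice}"
        ),
    }
```

With these assumed constants, the same risk level of 0.6 yields a medium-priority alert for a calm child and a high-priority alert urging immediate action for an agitated one.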

[0407] Step 6:

[0408] The alert prepared on the server is sent to the user's mobile device or dedicated device. The user receives this notification and, based on the information it contains, can go to the site and take action. This step uses push notification technology so that the alert content is communicated quickly.

[0409] The specific processing unit 290 transmits the result of the specific processing to the smart glasses 214. In the smart glasses 214, the control unit 46A causes the speaker 240 to output the result of the specific processing. The microphone 238 acquires audio indicating user input for the result of the specific processing. The control unit 46A transmits the audio data indicating user input acquired by the microphone 238 to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the audio data.

[0410] The data generation model 58 is a so-called generative AI (Artificial Intelligence). Examples of the data generation model 58 include generative AI such as ChatGPT (Internet search <URL: https://openai.com/blog/chatgpt>) and Gemini (Internet search <URL: https://gemini.google.com/?hl=ja>). The data generation model 58 is obtained by performing deep learning on a neural network. The data generation model 58 receives a prompt containing instructions, together with inference data such as audio data representing speech, text data representing text, and image data representing images. The data generation model 58 performs inference on the input inference data according to the instructions indicated by the prompt, and outputs the inference results in data formats such as audio data and text data. Here, inference refers to, for example, analysis, classification, prediction, and/or summarization.

[0411] In the above embodiment, an example was given in which specific processing is performed by the data processing device 12, but the technology of this disclosure is not limited thereto, and the specific processing may also be performed by the smart glasses 214.

[0412] [Third Embodiment]

[0413] Figure 5 shows an example of the configuration of the data processing system 310 according to the third embodiment.

[0414] As shown in Figure 5, the data processing system 310 includes a data processing device 12 and a headset terminal 314. An example of the data processing device 12 is a server.

[0415] The data processing device 12 comprises a computer 22, a database 24, and a communication interface 26. The computer 22 is an example of a "computer" related to the technology of this disclosure. The computer 22 comprises a processor 28, RAM 30, and storage 32. The processor 28, RAM 30, and storage 32 are connected to a bus 34. The database 24 and the communication interface 26 are also connected to the bus 34. The communication interface 26 is connected to a network 54. An example of the network 54 is a WAN (Wide Area Network) and/or a LAN (Local Area Network).

[0416] The headset terminal 314 includes a computer 36, a microphone 238, a speaker 240, a camera 42, a communication interface 44, and a display 343. The computer 36 includes a processor 46, RAM 48, and storage 50. The processor 46, RAM 48, and storage 50 are connected to a bus 52. The microphone 238, speaker 240, camera 42, and display 343 are also connected to the bus 52.

[0417] The microphone 238 receives voice signals from the user 20 and receives instructions from the user 20. The microphone 238 captures the voice signals from the user 20, converts the captured voice into audio data, and outputs it to the processor 46. The speaker 240 outputs audio according to the instructions from the processor 46.

[0418] Camera 42 is a small digital camera equipped with an optical system including a lens, aperture, and shutter, and an image sensor such as a CMOS (Complementary Metal-Oxide-Semiconductor) image sensor or a CCD (Charge Coupled Device) image sensor, and captures images of the area around the user 20 (for example, an imaging range defined by a field of view equivalent to the width of a typical healthy person's field of vision).

[0419] Communication interface 44 is connected to network 54. Communication interfaces 44 and 26 are responsible for the exchange of various information between processor 46 and processor 28 via network 54. The exchange of various information between processor 46 and processor 28 using communication interfaces 44 and 26 is performed in a secure manner.

[0420] Figure 6 shows an example of the main functions of the data processing device 12 and the headset terminal 314. As shown in Figure 6, the data processing device 12 performs specific processing using the processor 28. The storage 32 stores the specific processing program 56.

[0421] The specific processing program 56 is an example of a "program" relating to the technology of this disclosure. The processor 28 reads the specific processing program 56 from the storage 32 and executes the read specific processing program 56 on the RAM 30. The specific processing is realized by the processor 28 operating as a specific processing unit 290 in accordance with the specific processing program 56 executed on the RAM 30.

[0422] The storage 32 stores the data generation model 58 and the emotion identification model 59. The data generation model 58 and the emotion identification model 59 are used by the specific processing unit 290.

[0423] In the headset terminal 314, the processor 46 performs the reception output processing. The storage 50 stores the reception output program 60. The processor 46 reads the reception output program 60 from the storage 50 and executes the read reception output program 60 on the RAM 48. The reception output processing is realized by the processor 46 operating as a control unit 46A according to the reception output program 60 executed on the RAM 48.

[0424] Next, the specific processing performed by the specific processing unit 290 of the data processing device 12 will be described. In the following description, the data processing device 12 will be referred to as the "server" and the headset terminal 314 will be referred to as the "terminal".

[0425] This invention relates to a system that uses an image acquisition device to monitor the safety of children in homes and public facilities. This system contributes to ensuring the safety of children without requiring parents or facility managers to physically keep an eye on them.

[0426] In this system, image acquisition devices installed in various locations collect video data of the environment in real time. Terminals transmit this video data to a server, which uses a facial recognition algorithm to identify specific individuals. Through facial recognition, the server identifies the child to be monitored and continuously analyzes the child's behavior using behavioral pattern analysis tools.
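
The identification described above can be pictured as a nearest-neighbor match between a face feature vector and the vectors registered for known children. The following is only a minimal sketch under stated assumptions: the `ENROLLED` table, the `identify` helper, the three-dimensional vectors, and the 0.6 distance threshold are all hypothetical, and a practical system would obtain much longer feature vectors from a face recognition library.

```python
import math
from typing import Optional

# Hypothetical enrolled feature vectors (child ID -> vector).
# Real face embeddings are much longer; 3 dimensions keep the sketch readable.
ENROLLED = {
    "child_a": [0.10, 0.80, 0.30],
    "child_b": [0.90, 0.20, 0.50],
}

def identify(embedding: list[float], threshold: float = 0.6) -> Optional[str]:
    """Return the enrolled ID closest to `embedding`, or None if nothing is within `threshold`."""
    best_id, best_dist = None, float("inf")
    for child_id, known in ENROLLED.items():
        dist = math.dist(embedding, known)  # Euclidean distance between vectors
        if dist < best_dist:
            best_id, best_dist = child_id, dist
    return best_id if best_dist <= threshold else None
```

Returning `None` for faces that are far from every enrolled vector keeps people who are not monitoring targets out of the later behavior analysis.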

[0427] The server rapidly analyzes situations, especially those with a high probability of dangerous behavior, and generates an alert if danger is detected. These alerts are sent in real time to parents and facility managers via smartphones or dedicated devices. This notification allows users to take swift action and ensure the child's safety.

[0428] As a concrete example, consider a scenario where a device monitors a playground area in a park. The server detects that a child is climbing onto a tall piece of playground equipment and is in an unstable position. The server immediately identifies this behavior as dangerous and sends an alert to the user's device saying, "Your child is climbing onto a tall piece of playground equipment. Caution is advised." Upon receiving this notification, the user can rush to the scene and take the necessary action.

[0429] Thus, this invention aims to ensure the safety of children and alleviate parental anxiety by utilizing cameras and AI technology. This makes it possible to efficiently manage children's safety even in busy daily life.

[0430] The following describes the processing flow.

[0431] Step 1:

[0432] The device activates the image acquisition device and prepares to collect real-time video data from within the home or public facility. During this process, the device adjusts the camera's image quality and sets the required frame rate and resolution.

[0433] Step 2:

[0434] The terminal sends the acquired video data to the server. The server receives this data and preprocesses it for video processing. Preprocessing removes unwanted noise so that the video data is suitable for the recognition steps that follow.

[0435] Step 3:

[0436] The server applies a facial recognition algorithm to the processed video data. This detects the faces of people in the video and matches them with known children in the database to identify specific individuals.

[0437] Step 4:

[0438] The server analyzes the identified individuals using behavioral pattern analysis tools. This analysis compares their actions with existing behavioral records, tracks their movements, and evaluates their behavioral status.

[0439] Step 5:

[0440] The server predicts risky behavior based on the results of the behavioral analysis. It detects anomalies by comparing the analyzed behavior with a predefined list of risky behaviors, such as unstable movements at height or approaching specific risk areas.

[0441] Step 6:

[0442] If the server detects dangerous behavior, it will generate an alert. This alert will include the specific nature of the danger and recommended actions.

[0443] Step 7:

[0444] The server sends the generated alert to the user. The alert is sent in real time to the user's smartphone or dedicated device, allowing for immediate confirmation.

[0445] Step 8:

[0446] Users receive alerts and review their contents. By going to the scene as needed and taking action to confirm or improve safety, users can prevent accidents.

[0447] Step 9:

[0448] The server records user responses and actions taken in response to alerts. This data is used to improve future models and enhance the accuracy of AI algorithms.
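
Steps 5 and 6 of the flow above, comparing analyzed behavior with a predefined list of risky behaviors and producing an alert that states the danger and a recommended action, might look roughly like the sketch below. The `Observation` and `Alert` types, the `RISK_ZONES` set, and the 1.5 m height limit are hypothetical values chosen for illustration and are not taken from the disclosure.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Observation:
    child_id: str
    height_m: float   # estimated height above the ground
    unstable: bool    # posture judged unstable by the behavior analysis
    zone: str         # name of the area the child is currently in

@dataclass
class Alert:
    child_id: str
    danger: str
    recommended_action: str

RISK_ZONES = {"pond", "road", "stairs"}  # hypothetical risk areas
HEIGHT_LIMIT_M = 1.5                     # hypothetical height threshold

def check_risk(obs: Observation) -> Optional[Alert]:
    """Return an Alert if the observation matches a predefined risky behavior, else None."""
    if obs.unstable and obs.height_m >= HEIGHT_LIMIT_M:
        return Alert(obs.child_id, "unstable posture at height",
                     "go to the child and help them down")
    if obs.zone in RISK_ZONES:
        return Alert(obs.child_id, f"approaching risk area: {obs.zone}",
                     "guide the child away from the area")
    return None
```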

[0449] (Example 1)

[0450] Next, we will describe Example 1. In the following description, the data processing device 12 will be referred to as the "server," and the headset-type terminal 314 will be referred to as the "terminal."

[0451] In modern society, ensuring the safety of people in homes and public facilities is a crucial issue. However, relying solely on physical surveillance has its limitations and may not be sufficient in situations requiring a quick and accurate response. Furthermore, conventional systems lack sufficient capabilities to predict and notify of dangers, making it difficult to adapt to individual circumstances.

[0452] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 1 is realized by the following means.

[0453] In this invention, the server includes means for collecting visual information from multiple locations within the facility using image sensors, means for receiving the visual information and using an identification algorithm to identify the faces of individuals, and means for analyzing the behavior of the identified individuals and predicting potentially dangerous behaviors. This makes it possible to automatically and effectively monitor the safety of individuals within the facility and to quickly detect potential dangers.

[0454] An "image sensor" is a device used to collect visual information from the environment, and typically uses camera technology to acquire data in real time.

[0455] "Visual information" refers to records of the environment acquired in the form of images or videos, which allow us to confirm the situation of a specific place or object.

[0456] An "identification algorithm" is a computational method used to extract individual characteristics from collected visual information and identify specific individuals.

[0457] The term "individual" generally refers to a person or object that is identified by a system, and often specifically refers to a person who is the target of facial recognition.

[0458] "Motion analysis means" refers to functions that include technologies and processes for analyzing the motion patterns of identified individuals and detecting normal and abnormal behavior.

[0459] "Potentially dangerous behaviors" refer to actions that an individual may take that the system determines could potentially threaten safety.

[0460] A "warning" is a notification or alert issued when a potential crisis is detected, containing information to prompt relevant parties to take swift action.

[0461] "Training data" refers to a dataset that is accumulated to improve the accuracy and performance of a system and is used for learning and improvement.

[0462] "Portable electronic devices" are small electronic devices primarily intended for use while on the go, and include smartphones and tablets.

[0463] A "dedicated device" refers to equipment designed specifically for a particular function or application, and is not necessarily used for general purposes.

[0464] This system is designed to monitor the safety of specific individuals using multiple image sensors installed in homes and public facilities. The entire system consists of three main elements: terminals, servers, and users.

[0465] The terminals collect visual information of the surrounding environment in real time through image sensors placed within the facility. This includes commonly used IP cameras and high-resolution webcams. These devices periodically capture data and transmit the collected visual information to a server.

[0466] The server uses an identification algorithm to identify individuals based on the visual information it receives. This process often employs Dlib or open-source AI frameworks and includes techniques for facial recognition. The server also has behavioral analysis capabilities, performing behavioral analysis using deep learning platforms such as TensorFlow and PyTorch. Based on the analyzed data, if potentially dangerous behavior is detected, the server immediately generates an alarm and adds it to the notification queue.

[0467] Users receive generated alerts via portable electronic devices or dedicated equipment. Utilizing push notification technology, important alerts are delivered to users in real time. This allows users to quickly understand the situation and take appropriate action.

[0468] As a concrete example, consider a scenario where a server monitors a playground area in a park. The server detects a child in an unstable position on the playground equipment, and if it determines the child's actions are dangerous, it issues an alert to the user's mobile device stating, "Your child is climbing on high playground equipment. Caution is advised." In this way, the system aims to enhance individual safety based on real-world scenarios.

[0469] Examples of prompts for a generative AI model include the following:

[0470] "Please generate a program that analyzes the position and movement of children in images and evaluates their safety."

[0471] "Please describe a system that monitors playground areas in parks and detects dangerous behaviors among children."

[0472] This system makes it possible to efficiently and effectively manage the safety of individuals within a specific facility.

[0473] The flow of the specific processing in Example 1 will be explained using Figure 11.

[0474] Step 1:

[0475] The terminal captures real-time visual information using image sensors installed in the facility. This visual information includes video data of the environment, which is input from the image sensors. This data is captured at a specified frame rate and resolution and sent to a server for subsequent processing.

[0476] Step 2:

[0477] The server receives visual information transmitted from the terminal. Next, it applies a face recognition algorithm to identify the faces of individuals present in the visual information. The input data consists of images and video frames; the algorithm extracts feature points and outputs the identified face information. Specifically, it utilizes a library such as Dlib to determine the position of faces in real time.

[0478] Step 3:

[0479] The server analyzes the behavior of identified individuals using motion analysis tools. In this step, the server tracks each individual's movements and changes in position based on the facial information it has identified. The input is real-time motion data, and the output is information on any abnormal or potentially dangerous behavior detected through motion analysis. Behavioral patterns are analyzed with a deep learning model built on an AI framework such as TensorFlow.

[0480] Step 4:

[0481] The server immediately generates an alarm if it detects potentially dangerous behavior based on the analysis results. The input is the result of the behavioral analysis, and the output is the content of an urgent notification produced by the alarm generation routine. Specifically, the alarm priority is set according to the detected level of danger, and the alarm is added to the notification queue.

[0482] Step 5:

[0483] Users receive alarms distributed from the server via portable electronic devices or dedicated equipment. The input is alarm information from the server, which users use to take prompt action. The output is the action taken by the user, such as going to the scene and implementing necessary countermeasures. Users are required to review the alarms and make appropriate decisions based on the information provided by the system.
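
Step 4 above sets the alarm priority according to the detected level of danger and adds the alarm to a notification queue. One way to picture such a queue is a heap ordered by danger level, as in the sketch below. The `NotificationQueue` class and the integer danger levels are hypothetical names introduced for illustration.

```python
import heapq
import itertools

class NotificationQueue:
    """Queue that releases the alarm with the highest danger level first."""

    def __init__(self) -> None:
        self._heap: list[tuple[int, int, str]] = []
        self._counter = itertools.count()  # tie-breaker that preserves arrival order

    def push(self, danger_level: int, message: str) -> None:
        # heapq is a min-heap, so the level is negated to pop the most dangerous first.
        heapq.heappush(self._heap, (-danger_level, next(self._counter), message))

    def pop(self) -> str:
        return heapq.heappop(self._heap)[2]

    def __len__(self) -> int:
        return len(self._heap)
```

Alarms with the same danger level leave the queue in the order they arrived, so an earlier warning is not delayed by a later one of equal priority.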

[0484] (Application Example 1)

[0485] Next, we will explain Application Example 1. In the following explanation, the data processing device 12 will be referred to as the "server," and the headset-type terminal 314 will be referred to as the "terminal."

[0486] Efficiently monitoring children's safety in both home and public environments remains a challenge in today's busy lifestyle. Traditional monitoring methods struggle to detect and address dangers in real time, placing a heavy burden on parents and facility managers. This makes it difficult to prevent accidental injuries among children.

[0487] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 1 is realized by the following means.

[0488] In this invention, the server includes means for collecting visual data from multiple points in the environment using an image acquisition device, means for using a facial recognition algorithm to identify individuals, and means for generating a warning and notifying relevant parties when dangerous behavior is detected. This enables relevant parties to efficiently monitor the child's behavior and respond quickly when danger is detected.

[0489] An "image acquisition device" is a device used to collect visual data from multiple points within an environment.

[0490] "Visual data" refers to information acquired in the form of images or videos.

[0491] A "face recognition algorithm" is a computational method for analyzing and identifying an individual's characteristics from received visual data.

[0492] "Individual" refers to a specific person who is being monitored.

[0493] A "motion analysis device" is a system that analyzes an individual's movements and predicts dangerous actions.

[0494] "Dangerous behavior" refers to actions that may threaten an individual's safety.

[0495] A "warning" is a notification that is generated when dangerous behavior is detected.

[0496] A "communication device" is a device or system used to transmit information to a terminal.

[0497] A "terminal" refers to a portable device or a dedicated receiving device, which is a device used to receive warnings.

[0498] "Learning information" refers to data that records response information based on warnings and is used to optimize algorithms.

[0499] To realize this invention, it is first necessary to install network-enabled image acquisition devices in homes and public facilities. These devices collect visual data in real time from multiple points in the environment and transmit it to a cloud-based server. The server identifies specific individuals from the received visual data using a facial recognition algorithm (for example, one implemented with OpenCV or Amazon Rekognition). The actions of the identified individuals are analyzed by a motion analysis device, and if a pre-defined dangerous action is detected, the server generates a warning.

[0500] This warning is transmitted via communication equipment to the user's mobile device or a dedicated receiver. Receiving this warning allows the user to respond quickly if danger is imminent. Furthermore, response information based on the warning is recorded on a server and stored as learning data used to optimize the motion analysis algorithm.

[0501] As a concrete example, consider a scenario where a camera installed in a park playground monitors children playing on a swing. This system detects when a child is about to fall off the swing and identifies the action as dangerous. At this point, the server generates a warning message saying, "Your child is making dangerous movements on the swing. Please be careful," and immediately sends it to the parent's smartphone.

[0502] An example of a prompt message is: "This is a child safety monitoring system. Identify children from specific camera footage and issue an alert if dangerous behavior is detected. The alert should include a description of the specific behavior and send a push notification to the smartphone."

[0503] The flow of a specific process in Application Example 1 will be explained using Figure 12.

[0504] Step 1:

[0505] The server receives visual data in real time from a network-enabled image acquisition device. The input is video data from the camera, and the server prepares this data to send to a face recognition algorithm.

[0506] Step 2:

[0507] The server executes a face recognition algorithm to identify a specific individual from video data. The input is the video data obtained in step 1, and the server analyzes the position and feature points of faces to obtain individual identification information as output. Technologies such as OpenCV and Amazon Rekognition are utilized in this process.

[0508] Step 3:

[0509] The server analyzes the actions of the identified individual using a motion analysis device. The input is the individual identification information from step 2, and a motion analysis algorithm is applied to extract behavioral patterns, obtaining motion features as output.

[0510] Step 4:

[0511] The server predicts dangerous behavior from the obtained behavioral features. The input is the behavioral features from step 3, which are compared with pre-defined dangerous behavior patterns to evaluate the degree of danger, and the server generates a dangerous behavior detection result as output.

[0512] Step 5:

[0513] The server generates a warning when dangerous behavior is detected and sends it to the user's terminal via a communication device. The input is the detection result from step 4, which is used to construct the warning message and create notification information for the terminal as output. The user receives this information and can take action to avoid the crisis.

[0514] Step 6:

[0515] The response information based on the warnings is recorded on the server and stored as learning information used to optimize the behavioral analysis algorithm. The input is user response information, which is stored in the database and output as feedback data that will be useful for future improvements to the algorithm.
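
The recording described in step 6 can be sketched as appending one record per warning and later reading the records back as labeled examples. This is a minimal sketch under assumptions: the `ResponseRecord` fields, the helper names, and the use of an in-memory list of JSON lines in place of a database are all hypothetical.

```python
import json
from dataclasses import dataclass, asdict

@dataclass
class ResponseRecord:
    alert_id: str
    detected_behavior: str
    user_action: str       # for example "went_to_scene" or "dismissed"
    was_real_danger: bool  # the user's confirmation, usable as a training label

def append_record(log: list[str], record: ResponseRecord) -> None:
    """Serialize one record as a JSON line and append it to the log."""
    log.append(json.dumps(asdict(record), sort_keys=True))

def load_labels(log: list[str]) -> list[tuple[str, bool]]:
    """Recover (behavior, label) pairs for later optimization of the analysis algorithm."""
    rows = [json.loads(line) for line in log]
    return [(r["detected_behavior"], r["was_real_danger"]) for r in rows]
```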

[0516] Furthermore, an emotion engine that estimates the user's emotions may be incorporated. That is, the specific processing unit 290 may use the emotion identification model 59 to estimate the user's emotions and perform the specific processing using the user's emotions.

[0517] This invention is a system that combines an image acquisition device and an emotion recognition engine, and is intended to monitor the safety and emotional state of children in homes or public facilities. This system allows parents or facility managers to remotely check on a child's safety and psychological state even when they are not physically nearby.

[0518] The terminal collects real-time video data from image acquisition devices installed in homes or facilities. This data is then sent to a server, which applies a facial recognition algorithm to the received video to identify specific children who are being monitored.

[0519] Next, the server analyzes the behavioral patterns of the recognized individuals and uses an emotion recognition engine to analyze their emotional state. The emotion recognition engine analyzes facial expressions and body movements from the video data to determine whether the individuals are emotionally agitated or calm.

[0520] This information is linked to the prediction of risky behavior, and if high-risk behavior is detected, an alert is quickly generated. The results of emotion recognition are also used to adjust the priority and urgency of alerts. For example, when emotions are heightened, a high-urgency alert is issued.
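
The adjustment described above, where heightened emotion raises the urgency of an alert for risky behavior, could be expressed as a small decision function. The sketch below assumes that both the risk score and the emotional arousal score are normalized to the range 0 to 1; the `alert_urgency` name and both thresholds are hypothetical.

```python
def alert_urgency(risk_score: float, arousal: float,
                  risk_threshold: float = 0.5,
                  arousal_threshold: float = 0.7) -> str:
    """Map a behavior risk score and an arousal score (both in [0, 1]) to an urgency label."""
    if risk_score < risk_threshold:
        return "none"  # no risky behavior detected, so no alert
    # Risky behavior detected: heightened emotion raises the urgency.
    return "high" if arousal >= arousal_threshold else "normal"
```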

[0521] The generated alerts are immediately sent to the user's mobile device or dedicated device. This allows the user to understand the detailed situation, including the emotional state of the person concerned, and take a quick and appropriate response.

[0522] For example, if a device is monitoring a playground in a park and the video footage shows a child becoming agitated on the play equipment, the server will perceive this as a dangerous situation and send an alert to the user indicating that the child is emotionally agitated. The user can then take action to go to the location and calm the child down.

[0523] Thus, by incorporating emotion recognition technology, this invention provides a system that achieves more comprehensive child safety management while also considering the maintenance of children's psychological health.

[0524] The following describes the processing flow.

[0525] Step 1:

[0526] The terminal activates the image acquisition device. This starts capturing video data in real time from the installed location. The collected video data is then ready to be sent to the server.

[0527] Step 2:

[0528] The server analyzes the received video data. First, it performs data preprocessing, such as noise reduction and brightness correction, to improve recognition accuracy.

[0529] Step 3:

[0530] The server uses a facial recognition algorithm to identify the face of a specific subject from the video data. The identified face is then compared against known faces registered in the database.

[0531] Step 4:

[0532] The server performs behavioral pattern analysis. It tracks the movements of identified individuals and determines whether their actions constitute dangerous behavior by comparing them with existing data.

[0533] Step 5:

[0534] The server activates an emotion recognition engine and evaluates the subject's emotional state based on their facial expressions and body movements. It determines whether the subject is emotionally agitated or calm, and passes that data to the next step.

[0535] Step 6:

[0536] The server integrates behavioral analysis results and emotion recognition results to determine the appropriate response if dangerous behavior is detected. In particular, if emotions are heightened, the urgency of the warning is increased, and an alert is generated immediately.

[0537] Step 7:

[0538] The server generates alerts and sends them to the user's mobile device or a dedicated device. The alerts include details of the behavior and emotional state, and are presented in a way that allows the user to respond quickly.

[0539] Step 8:

[0540] The user reviews the alert, understands its content, and takes appropriate action to ensure the safety of the person concerned. For example, this could involve temporarily monitoring the situation or physically going to the site to improve the situation.

[0541] Step 9:

[0542] The server records user responses and stores the corresponding actions as training data. This data will be used to improve the algorithm in the future.
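
Step 2 of this flow mentions noise reduction and brightness correction. As an illustration only, the sketch below applies a 3x3 median filter and a brightness gain to a grayscale frame represented as nested lists. The `Frame` alias and both function names are hypothetical, and a practical system would more likely rely on an image processing library than on pure Python.

```python
from statistics import median

Frame = list[list[int]]  # grayscale pixel values in the range 0..255

def adjust_brightness(frame: Frame, gain: float) -> Frame:
    """Scale every pixel by `gain`, clamping the result to 0..255."""
    return [[max(0, min(255, round(p * gain))) for p in row] for row in frame]

def median_filter(frame: Frame) -> Frame:
    """Replace each pixel with the median of its 3x3 neighborhood (edges use the pixels available)."""
    h, w = len(frame), len(frame[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            window = [frame[j][i]
                      for j in range(max(0, y - 1), min(h, y + 2))
                      for i in range(max(0, x - 1), min(w, x + 2))]
            row.append(int(median(window)))
        out.append(row)
    return out
```

A median filter is used here because it removes isolated outlier pixels while leaving uniform regions unchanged.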

[0543] (Example 2)

[0544] Next, we will describe Example 2. In the following description, the data processing device 12 will be referred to as the "server," and the headset-type terminal 314 will be referred to as the "terminal."

[0545] In recent years, the need for safety management in homes and public facilities has increased, but ensuring the safety of individuals requiring special care, such as young children, and appropriately monitoring their emotional states is not easy. Furthermore, conventional monitoring systems have difficulty detecting changes in emotions, posing a challenge in predicting dangerous situations in advance and responding quickly.

[0546] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 2 is realized by the following means.

[0547] In this invention, the server includes means for collecting visual information from multiple locations in the environment, means for using a face recognition algorithm to identify the face of a target individual, and means for using an emotion evaluation engine to analyze the emotional state of the identified target individual. This makes it possible to identify and analyze the face and emotional state of a target individual with high accuracy and predict potential dangers.

[0548] An "image acquisition device" is a device used to collect visual information within an environment, and is capable of continuously acquiring information from multiple locations at high resolution.

[0549] "Visual information" refers to video and image data of the environment collected by image acquisition devices, and is used to identify people and other objects.

[0550] A "face recognition algorithm" is a computational method or program for detecting and identifying the face of a specific individual from visual information, and recognizes individuals based on specific feature points.

[0551] An "emotion evaluation engine" is a technological element that analyzes the emotional state of an identified individual from their facial expressions and movements, and identifies the type of emotion, such as excitement or calmness.

[0552] "Behavioral analysis methods" refer to algorithms and systems used to analyze the movements and past behavioral patterns of a target individual and predict abnormal or dangerous behaviors.

[0553] A "warning" is a notification generated when the level of danger increases based on emotional state and behavioral analysis, and it serves as a signal to quickly convey information to relevant parties.

[0554] A "portable device" is an electronic device intended to be carried by the user at all times and capable of receiving and transmitting information.

[0555] In this invention, the terminal first uses an image acquisition device to collect visual information from the environment. This image acquisition device is equipped with a high-resolution camera, is installed in homes and public facilities, and has the function of continuously recording video day and night. The collected visual information is efficiently transmitted to a server using compression technology.
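
The compress-then-transmit idea above can be illustrated with a lossless round trip. In the sketch below, `zlib` stands in for whatever compression technology is actually used (a deployed system would more likely use a video codec), and the `pack_frame` and `unpack_frame` helpers are hypothetical names.

```python
import zlib

def pack_frame(raw: bytes, level: int = 6) -> bytes:
    """Compress raw frame bytes on the terminal before transmission."""
    return zlib.compress(raw, level)

def unpack_frame(payload: bytes) -> bytes:
    """Restore the original frame bytes on the server."""
    return zlib.decompress(payload)
```

For example, a frame consisting of 10,000 identical bytes compresses to far fewer bytes, and `unpack_frame(pack_frame(raw))` returns the original data unchanged.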

[0556] Next, the server processes the received visual information. The technology used here is a face recognition algorithm, specifically utilizing libraries such as OpenCV and Dlib. This algorithm allows the server to accurately detect and identify the face of a specific individual. Based on the results, an emotion evaluation engine is activated to analyze the target individual's emotional state in real time. This engine evaluates emotions by referring to subtle changes in facial expressions and body movements, and in practice, it often utilizes APIs from general cloud service providers.

[0557] In addition, the behavioral analysis system evaluates the behavior of the target individual based on past data and accumulated training data. This analysis makes it possible to predict potential dangerous behaviors in advance and take countermeasures quickly. The generated warnings are notified to the user's mobile device in real time, allowing the user to quickly understand the situation and respond appropriately.

[0558] As a concrete example, suppose a terminal installed in a park is monitoring children near playground equipment, and the video captures a child appearing extremely excited on the equipment. The server analyzes this footage, and if it detects heightened emotions, it immediately sends an advanced warning to the user. In this case, an example of a prompt message might be "Detect a child who is excited on the playground equipment and generate a warning."

[0559] This allows users to grasp the specific situation on-site and take quick and appropriate action. This system further improves safety management by utilizing emotion recognition technology and highly accurate data analysis.

[0560] The flow of the specific processing in Example 2 will be explained using Figure 13.

[0561] Step 1:

[0562] The terminal collects visual information from the environment using an image acquisition device. Specifically, it uses a high-resolution camera to capture continuous video data, which is then used as input. At this stage, the video data is sent to the server in a compressed format. The output is video data in a format that the server can process.

[0563] Step 2:

[0564] The server decodes the received compressed video data and applies a face recognition algorithm. The input is the decoded video frames, and the server uses image processing techniques to identify faces within each frame. The output of this operation is data showing the position of each face in each frame and its corresponding feature points.

[0565] Step 3:

[0566] The server activates an emotion evaluation engine based on the results of face recognition. This engine analyzes the input facial feature data and evaluates the emotional state based on changes and dynamics in facial expressions. The output of this process is an emotion score indicating the degree of excitement or calmness.

[0567] Step 4:

[0568] The server uses emotion scores to perform behavioral analysis. Using past data and the currently obtained emotion results as input, the algorithm recognizes anomalies and abnormal patterns. The output of this process is the detection result of high-risk behavioral patterns.

[0569] Step 5:

[0570] The server generates warnings based on the results of behavioral analysis. The input is the risk assessment result, and the server uses this to generate warning messages and determine their priority. The warning messages are output, and the server is ready to notify the user.

[0571] Step 6:

[0572] The server sends the generated warning to the user's mobile device. The input consists of the warning message and the user's connection information, and the output is sent via a communication method such as Firebase Cloud Messaging. At this stage, the user receives information in real time, allowing them to quickly understand the situation.
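
Step 4 of this flow compares the current emotion score with past data to recognize anomalies. One simple way to sketch that comparison is a z-score test against the subject's history, shown below. The `is_anomalous` helper and the threshold of 2.0 standard deviations are hypothetical, and the disclosure does not state which statistical or learned method is used.

```python
from statistics import mean, pstdev

def is_anomalous(history: list[float], current: float,
                 z_threshold: float = 2.0) -> bool:
    """Flag `current` if it lies more than `z_threshold` standard deviations above the past mean."""
    if len(history) < 2:
        return False  # not enough past data to judge
    mu, sigma = mean(history), pstdev(history)
    if sigma == 0:
        return current > mu  # history is constant, so any increase stands out
    return (current - mu) / sigma > z_threshold
```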

[0573] (Application Example 2)

[0574] Next, we will explain Application Example 2. In the following explanation, the data processing device 12 will be referred to as the "server," and the headset-type terminal 314 will be referred to as the "terminal."

[0575] In modern homes and public facilities, there is a need for systems that can efficiently monitor the safety and psychological state of individuals remotely. However, conventional monitoring systems are limited to detecting physically dangerous behaviors and have difficulty adjusting for changes in psychological state and the resulting urgency. Therefore, in situations where a more comprehensive and rapid response is required, it is difficult to provide appropriate information to relevant parties.

[0576] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 2 is realized by the following means.

[0577] In this invention, the server includes means for adjusting and generating the urgency of alerts based on the psychological state, means for determining the psychological state of the subject using emotion recognition technology, and means for recording response information based on the alerts and storing it as learning data for improving the behavioral analysis algorithm. This makes it possible to adjust alerts to reflect the psychological state of the subject and provide rapid response measures.

[0578] An "image acquisition device" is hardware used to collect video data from multiple locations within an environment.

[0579] A "face recognition algorithm" is a series of computational methods used to identify a person's face from video data.

[0580] A "behavioral pattern analysis means" is a processing means for analyzing the behavior of identified individuals and predicting risky behaviors.

[0581] "Emotion recognition technology" is a technology that analyzes facial expressions and body movements from video data in order to determine the psychological state of a subject.

[0582] An "alert" is a warning message generated based on detected risky behavior or psychological state, and notified to the relevant parties.

[0583] "Response measures tailored to the urgency of the situation" refers to specific measures and action guidelines provided according to the urgency of the alert.

[0584] "Training data" refers to data that records response information based on alerts and is used to improve behavioral analysis algorithms.

[0585] In this invention, the entire system operates through the coordinated operation of various electronic devices. The system uses image acquisition devices installed in homes and public facilities to collect video data from multiple locations within the environment. A terminal receives this video data and transmits it to a server.

[0586] The server transfers video footage acquired by devices such as Raspberry Pi and Jetson Nano to a cloud server for image recognition and emotion recognition. The cloud server processes the acquired data using pre-trained face recognition algorithms and emotion recognition models based on TensorFlow and PyTorch. This allows for the identification of the subject's face and analysis of their behavior.

[0587] Furthermore, by applying emotion recognition technology and analyzing facial expressions and body movements obtained from the video, the psychological state of the subject is determined. This information is used to adjust the urgency of the alert, and alerts are generated according to the level of urgency.
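
The urgency adjustment described above can be illustrated with a minimal sketch. The three-level scale, the numeric thresholds, and the names `Assessment` and `adjust_urgency` are assumptions made for illustration; the source does not specify how risk or psychological state is scored.

```python
from dataclasses import dataclass

# Hypothetical urgency scale; the source does not define one.
LEVELS = ("low", "medium", "high")


@dataclass
class Assessment:
    risk: float     # predicted behavioral risk in [0, 1] (assumed scale)
    arousal: float  # emotional arousal in [0, 1]; higher means more agitated (assumed)


def adjust_urgency(a: Assessment, arousal_threshold: float = 0.7) -> str:
    """Map risk to a base level, then raise it one step if emotions are heightened."""
    if a.risk >= 0.66:
        base = 2
    elif a.risk >= 0.33:
        base = 1
    else:
        base = 0
    # Heightened emotion raises the urgency, capped at the top level.
    if a.arousal >= arousal_threshold:
        base = min(base + 1, len(LEVELS) - 1)
    return LEVELS[base]
```

In this sketch a moderate risk score yields a "medium" alert when the subject is calm and a "high" alert when the subject is agitated, which mirrors the idea of raising priority when emotions are heightened.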

[0588] Alerts are quickly sent to the user's mobile device or dedicated device. From this notification, the user can quickly grasp the details of the situation and go to the scene if necessary. For example, if a home robot detects unusual behavior or emotional changes around a child, the user will receive a message saying, "This child is showing a specific reaction and needs to be investigated."

[0589] Thus, by combining image acquisition technology and emotion recognition technology, this system enables more effective safety management and maintenance of psychological health.

[0590] A concrete example of a prompt to be input into the generating AI model might be: "Observe in real time what emotional state a child is in at home, and if any unusual behavior or emotion is detected, tell me what kind of notification to send to the parents based on that."

[0591] The flow of a specific process in Application Example 2 will be explained using Figure 14.

[0592] Step 1:

[0593] The terminal collects video data in real time from image acquisition devices installed within the environment. The terminal receives this video data as input and prepares it for transmission to the server. Specifically, it uses a camera device to capture surveillance video within a specified area. It also compresses the data as needed to improve transmission efficiency.

[0594] Step 2:

[0595] The server receives video data transmitted from the terminal. The server takes this data as input and applies a face recognition algorithm to identify the subject's face. Specifically, it detects the position of the face in each video frame and extracts the feature quantities of the identified face. Then, it identifies who the person is by comparing the feature quantities with a database.

[0596] Step 3:

[0597] The server analyzes the behavior of identified individuals and predicts risky behaviors. Using the analyzed behavioral data as input, it performs data calculations with behavioral pattern analysis tools and outputs behavioral anomalies and predicted risks. Specifically, it compares current behavioral data with past behavioral data to identify and analyze deviations from norms.

[0598] Step 4:

[0599] The server applies emotion recognition technology to facial information from the video data to determine the subject's psychological state. In this step, the data obtained after facial recognition is analyzed further, and the psychological state is inferred based on an emotion model. Emotions are quantified from facial expressions and subtle body movements, and these values are used to determine whether the subject is in a psychologically heightened or calm state.

[0600] Step 5:

[0601] The server considers both risky behavior and psychological state to adjust and generate alert urgency. Using risk level and psychological data as input, the alert generation algorithm adjusts the urgency and content of the alert, outputting information tailored to the user. Specifically, if emotions are heightened, the alert priority is increased, and a warning including a message urging immediate action is prepared.

[0602] Step 6:

[0603] The alert generated on the server is sent to the user's mobile device or dedicated device. The user receives this notification and, based on the information it contains, can go to the site and take action. This step utilizes push notification technology to ensure that the alert content is communicated quickly.
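
Step 3 above compares current behavioral data with past behavioral data to find deviations from the norm. One possible way to express such a comparison is a z-score against the subject's own history, sketched below. The choice of a z-score, the scalar "movement speed" style of measurement, and the limit of 3.0 are assumptions for illustration, not details given in the source.

```python
from statistics import mean, stdev
from typing import Sequence


def deviation_score(history: Sequence[float], current: float) -> float:
    """Return how many standard deviations `current` lies from the past mean."""
    if len(history) < 2:
        return 0.0  # not enough past data to define a norm
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        # All past values identical: any difference is treated as unbounded deviation.
        return 0.0 if current == mu else float("inf")
    return abs(current - mu) / sigma


def is_anomalous(history: Sequence[float], current: float, z_limit: float = 3.0) -> bool:
    """Flag the current measurement if it deviates strongly from past behavior."""
    return deviation_score(history, current) > z_limit
```

A real behavioral model would likely use richer features than a single number; the sketch only shows the "compare with past data" idea.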

[0604] The specific processing unit 290 transmits the result of the specific processing to the headset terminal 314. In the headset terminal 314, the control unit 46A causes the speaker 240 and display 343 to output the result of the specific processing. The microphone 238 acquires audio indicating user input for the result of the specific processing. The control unit 46A transmits the audio data indicating user input acquired by the microphone 238 to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the audio data.

[0605] Data generation model 58 is a type of so-called generative AI (Artificial Intelligence). Examples of data generation model 58 include generative AI such as ChatGPT (Internet search <URL: https://openai.com/blog/chatgpt>) and Gemini (Internet search <URL: https://gemini.google.com/?hl=ja>). The data generation model 58 is obtained by performing deep learning on a neural network. The data generation model 58 is input with prompts containing instructions, and with inference data such as audio data representing speech, text data representing text, and image data representing images. The data generation model 58 infers from the input inference data according to the instructions indicated by the prompts, and outputs the inference results in data formats such as audio data and text data. Here, inference refers to, for example, analysis, classification, prediction, and/or summarization.

[0606] In the above embodiment, an example was given in which specific processing is performed by the data processing device 12, but the technology of this disclosure is not limited thereto, and specific processing may also be performed by the headset terminal 314.

[0607] [Fourth Embodiment]

[0608] Figure 7 shows an example of the configuration of the data processing system 410 according to the fourth embodiment.

[0609] As shown in Figure 7, the data processing system 410 includes a data processing device 12 and a robot 414. An example of the data processing device 12 is a server.

[0610] The data processing device 12 comprises a computer 22, a database 24, and a communication interface 26. The computer 22 is an example of a "computer" related to the technology of this disclosure. The computer 22 comprises a processor 28, RAM 30, and storage 32. The processor 28, RAM 30, and storage 32 are connected to a bus 34. The database 24 and the communication interface 26 are also connected to the bus 34. The communication interface 26 is connected to a network 54. An example of the network 54 is a WAN (Wide Area Network) and / or a LAN (Local Area Network).

[0611] The robot 414 includes a computer 36, a microphone 238, a speaker 240, a camera 42, a communication interface 44, and a controlled object 443. The computer 36 includes a processor 46, RAM 48, and storage 50. The processor 46, RAM 48, and storage 50 are connected to a bus 52. The microphone 238, speaker 240, camera 42, and controlled object 443 are also connected to the bus 52.

[0612] The microphone 238 receives instructions from the user 20 by receiving the voice uttered by the user 20. The microphone 238 captures the voice of the user 20, converts the captured voice into audio data, and outputs it to the processor 46. The speaker 240 outputs audio according to the instructions from the processor 46.

[0613] Camera 42 is a small digital camera equipped with an optical system including a lens, aperture, and shutter, and an image sensor such as a CMOS (Complementary Metal-Oxide-Semiconductor) image sensor or a CCD (Charge Coupled Device) image sensor, and captures images of the area around the user 20 (for example, an imaging range defined by a field of view equivalent to the width of a typical healthy person's field of vision).

[0614] Communication interface 44 is connected to network 54. Communication interfaces 44 and 26 are responsible for the exchange of various information between processor 46 and processor 28 via network 54. The exchange of various information between processor 46 and processor 28 using communication interfaces 44 and 26 is performed in a secure manner.

[0615] The controlled object 443 includes a display device, LEDs in the eyes, and motors that drive the arms, hands, and feet. The posture and gestures of the robot 414 are controlled by controlling the motors of the arms, hands, and feet. Some of the robot 414's emotions can be expressed by controlling these motors. Furthermore, the robot 414's facial expressions can also be expressed by controlling the illumination state of the LEDs in its eyes.

[0616] Figure 8 shows an example of the main functions of the data processing device 12 and the robot 414. As shown in Figure 8, the data processing device 12 performs specific processing using the processor 28. The storage 32 stores the specific processing program 56.

[0617] The specific processing program 56 is an example of a "program" relating to the technology of this disclosure. The processor 28 reads the specific processing program 56 from the storage 32 and executes the read specific processing program 56 on the RAM 30. The specific processing is realized by the processor 28 operating as a specific processing unit 290 in accordance with the specific processing program 56 executed on the RAM 30.

[0618] The storage 32 stores the data generation model 58 and the emotion identification model 59. The data generation model 58 and the emotion identification model 59 are used by the specific processing unit 290.

[0619] In robot 414, the processor 46 performs the reception output processing. The storage 50 stores the reception output program 60. The processor 46 reads the reception output program 60 from the storage 50 and executes the read reception output program 60 on the RAM 48. The reception output processing is realized by the processor 46 operating as a control unit 46A according to the reception output program 60 executed on the RAM 48.

[0620] Next, the specific processing performed by the specific processing unit 290 of the data processing device 12 will be described. In the following description, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".

[0621] This invention relates to a system that uses an image acquisition device to monitor the safety of children in homes and public facilities. This system contributes to ensuring the safety of children without requiring parents or facility managers to physically keep an eye on them.

[0622] In this system, image acquisition devices installed in various locations collect video data of the environment in real time. Terminals transmit this video data to a server, which uses a facial recognition algorithm to identify specific individuals. Through facial recognition, the server identifies the child to be monitored and continuously analyzes the child's behavior using behavioral pattern analysis tools.
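
Identification of a registered child can be pictured as comparing a feature vector extracted from a detected face with reference vectors stored in the database. The sketch below uses cosine similarity with a fixed threshold; the similarity measure, the threshold of 0.8, the vector format, and the function names are all assumptions for illustration and do not represent the API of any particular face recognition library.

```python
import math
from typing import Dict, Optional, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def identify(
    features: Sequence[float],
    database: Dict[str, Sequence[float]],
    threshold: float = 0.8,
) -> Optional[str]:
    """Return the best-matching registered name, or None if nothing reaches the threshold."""
    best_name: Optional[str] = None
    best_score = threshold
    for name, reference in database.items():
        score = cosine_similarity(features, reference)
        if score >= best_score:
            best_name, best_score = name, score
    return best_name
```

Returning `None` for faces below the threshold lets later steps skip people who are not registered as monitoring targets.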

[0623] The server rapidly analyzes situations, especially those with a high probability of dangerous behavior, and generates an alert if danger is detected. These alerts are sent in real time to parents and facility managers via smartphones or dedicated devices. This notification allows users to take swift action and ensure the child's safety.

[0624] As a concrete example, consider a scenario where a device monitors a playground area in a park. The server detects that a child is climbing onto a tall piece of playground equipment and is in an unstable position. The server immediately identifies this behavior as dangerous and sends an alert to the user's device saying, "Your child is climbing onto a tall piece of playground equipment. Caution is advised." Upon receiving this notification, the user can rush to the scene and take the necessary action.

[0625] Thus, this invention aims to ensure the safety of children and alleviate parental anxiety by utilizing cameras and AI technology. This makes it possible to efficiently manage children's safety even in busy daily life.

[0626] The following describes the processing flow.

[0627] Step 1:

[0628] The device activates the image acquisition device and prepares to collect real-time video data from within the home or public facility. During this process, the device adjusts the camera's image quality and sets the required frame rate and resolution.

[0629] Step 2:

[0630] The terminal sends the acquired video data to the server. The server receives this data and performs preprocessing for video processing. Preprocessing removes unwanted noise and creates visually accurate video data.

[0631] Step 3:

[0632] The server applies a facial recognition algorithm to the processed video data. This detects the faces of people in the video and matches them with known children in the database to identify specific individuals.

[0633] Step 4:

[0634] The server analyzes the identified individuals using behavioral pattern analysis tools. This analysis compares their actions with existing behavioral records, tracks their movements, and evaluates their behavioral status.

[0635] Step 5:

[0636] The server predicts risky behavior based on the results of behavioral analysis. It detects anomalies by comparing the analyzed behavior with a predefined list of risky behaviors, such as unstable movements at height or approaching specific risk areas.

[0637] Step 6:

[0638] If the server detects dangerous behavior, it will generate an alert. This alert will include the specific nature of the danger and recommended actions.

[0639] Step 7:

[0640] The server sends the generated alerts to the user. These alerts are sent in real time to the user's smartphone or dedicated device, allowing for immediate confirmation.

[0641] Step 8:

[0642] Users receive alerts and review their contents. By rushing to the scene as needed and taking action to confirm or improve safety, accidents can be prevented.

[0643] Step 9:

[0644] The server records user responses and actions taken in response to alerts. This data is used to improve future models and enhance the accuracy of AI algorithms.
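
Steps 5 and 6 above check behavior against a predefined list of risky behaviors and then build an alert containing the nature of the danger and a recommended action. A minimal rule-based sketch follows. The observation fields, the 1.5 m height limit, the sway score, the rectangular risk zones, and the wording of the recommended action are hypothetical; the source names only the two example behaviors (unstable movement at height, approaching a risk area).

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Observation:
    x: float         # position in metres in a site map frame (assumed)
    y: float
    height_m: float  # height above ground in metres (assumed to be estimated upstream)
    sway: float      # unsteadiness score in [0, 1] (hypothetical)


@dataclass
class RiskZone:
    name: str
    x_min: float
    x_max: float
    y_min: float
    y_max: float
    margin_m: float = 1.0  # how close counts as "approaching" (assumed)


def detect_risks(
    obs: Observation,
    zones: List[RiskZone],
    max_safe_height_m: float = 1.5,
    sway_limit: float = 0.6,
) -> List[str]:
    """Return descriptions of every predefined risky behavior the observation matches."""
    risks: List[str] = []
    if obs.height_m > max_safe_height_m and obs.sway > sway_limit:
        risks.append("unstable movement at height")
    for z in zones:
        inside_x = z.x_min - z.margin_m <= obs.x <= z.x_max + z.margin_m
        inside_y = z.y_min - z.margin_m <= obs.y <= z.y_max + z.margin_m
        if inside_x and inside_y:
            risks.append(f"approaching risk area: {z.name}")
    return risks


def build_alert(risks: List[str]) -> Optional[Dict[str, str]]:
    """Build an alert with the nature of the danger and a recommended action."""
    if not risks:
        return None
    return {
        "danger": "; ".join(risks),
        "recommended_action": "Check on the child in person.",  # placeholder text
    }
```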

[0645] (Example 1)

[0646] Next, we will describe Example 1. In the following description, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".

[0647] In modern society, ensuring the safety of people in homes and public facilities is a crucial issue. However, relying solely on physical surveillance has its limitations and may not be sufficient in situations requiring a quick and accurate response. Furthermore, conventional systems lack sufficient capabilities to predict and notify of dangers, making it difficult to adapt to individual circumstances.

[0648] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 1 is realized by the following means.

[0649] In this invention, the server includes means for collecting visual information from multiple locations within the facility using image sensors, means for receiving the visual information and using an identification algorithm to identify the faces of individuals, and means for analyzing the behavior of the identified individuals and predicting potentially dangerous behaviors. This makes it possible to automatically and effectively monitor the safety of individuals within the facility and to quickly detect potential dangers.

[0650] An "image sensor" is a device used to collect visual information from the environment, and typically uses camera technology to acquire data in real time.

[0651] "Visual information" refers to records of the environment acquired in the form of images or videos, which allow us to confirm the situation of a specific place or object.

[0652] An "identification algorithm" is a computational method used to extract individual characteristics from collected visual information and identify specific individuals.

[0653] The term "individual" generally refers to a person or object that is identified by a system, and often specifically refers to a person who is the target of facial recognition.

[0654] "Motion analysis means" refers to functions that include technologies and processes for analyzing the motion patterns of identified individuals and detecting normal and abnormal behavior.

[0655] "Potentially dangerous behaviors" refer to actions that an individual may take that the system determines could potentially threaten safety.

[0656] A "warning" is a notification or alert issued when a potential crisis is detected, containing information to prompt relevant parties to take swift action.

[0657] "Training data" refers to a dataset that is accumulated to improve the accuracy and performance of a system and is used for learning and improvement.

[0658] "Portable electronic devices" are small electronic devices primarily intended for use while on the go, and include smartphones and tablets.

[0659] A "dedicated device" refers to equipment designed specifically for a particular function or application, and is not necessarily used for general purposes.

[0660] This system is designed to monitor the safety of specific individuals using multiple image sensors installed in homes and public facilities. The entire system consists of three main elements: terminals, servers, and users.

[0661] The terminals collect visual information of the surrounding environment in real time through image sensors placed within the facility. This includes commonly used IP cameras and high-resolution webcams. These devices periodically capture data and transmit the collected visual information to a server.

[0662] The server uses an identification algorithm to identify individuals based on the visual information it receives. This process often employs Dlib or open-source AI frameworks and includes techniques for facial recognition. The server also has behavioral analysis capabilities, performing behavioral analysis using deep learning platforms such as TensorFlow and PyTorch. Based on the analyzed data, if potentially dangerous behavior is detected, the server immediately generates an alarm and adds it to the notification queue.

[0663] Users receive generated alerts via portable electronic devices or dedicated equipment. Utilizing push notification technology, important alerts are delivered to users in real time. This allows users to quickly understand the situation and take appropriate action.

[0664] As a concrete example, consider a scenario where a server monitors a playground area in a park. The server detects a child in an unstable position on the playground equipment, and if it determines that the child's actions are dangerous, it issues an alert to the user's mobile device stating, "Your child is climbing on tall playground equipment. Caution is advised." In this way, the system aims to enhance individual safety based on real-world scenarios.

[0665] Examples of prompts for a generative AI model include the following:

[0666] "Please generate a program that analyzes the position and movement of children in images and evaluates their safety."

[0667] "Please describe a system that monitors playground areas in parks and detects dangerous behaviors among children."

[0668] This system makes it possible to efficiently and effectively manage the safety of individuals within a specific facility.

[0669] The flow of the specific processing in Example 1 will be explained using Figure 11.

[0670] Step 1:

[0671] The terminal captures real-time visual information using image sensors installed in the facility. This visual information includes video data of the environment, which is input from the image sensors. This data is captured at a specified frame rate and resolution and sent to a server for subsequent processing.

[0672] Step 2:

[0673] The server receives visual information transmitted from the terminal. Next, it applies a face recognition algorithm to identify the faces of individuals present in the visual information. The input data consists of images and video frames; the algorithm extracts feature points and outputs the identified face information. Specifically, it utilizes frameworks such as Dlib to determine the position of faces in real time.

[0674] Step 3:

[0675] The server analyzes the behavior of identified individuals using motion analysis tools. In this step, the server tracks the individual's movements and changes in position based on the facial information it has identified. Here, the input data is real-time motion data; abnormal or potentially dangerous behaviors are detected through motion analysis, and the detection results are output. Behavioral patterns are analyzed with a deep learning model using an AI framework such as TensorFlow.

[0676] Step 4:

[0677] The server immediately generates an alarm if it detects potentially dangerous behavior based on the analysis results. Here, the input data is the result of the behavioral analysis; an urgent notification is generated through the alarm generation routine, and its content is output. Specifically, the alarm priority is set according to the detected level of danger, and the alarm is added to the notification queue.

[0678] Step 5:

[0679] Users receive alarms distributed from the server via portable electronic devices or dedicated equipment. Input data consists of alarm information from the server, which users use to take prompt action. Output includes actions taken by the user to go to the scene and implement necessary countermeasures. Users are required to review the alarms and make appropriate decisions based on the information provided by the system.
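
Step 4 above sets an alarm priority according to the detected level of danger and adds the alarm to a notification queue. One way to sketch such a queue is with Python's standard `heapq` module, as below. The class name `NotificationQueue`, the integer danger levels, and the rule that equal-priority alarms are delivered in arrival order are assumptions for illustration.

```python
import heapq
import itertools
from typing import List, Tuple


class NotificationQueue:
    """Priority queue that releases the most dangerous alarm first."""

    def __init__(self) -> None:
        self._heap: List[Tuple[int, int, str]] = []
        self._counter = itertools.count()  # tie-breaker that preserves arrival order

    def push(self, danger_level: int, message: str) -> None:
        # heapq is a min-heap, so the danger level is negated to pop the highest first.
        heapq.heappush(self._heap, (-danger_level, next(self._counter), message))

    def pop(self) -> Tuple[int, str]:
        neg_level, _, message = heapq.heappop(self._heap)
        return -neg_level, message

    def __len__(self) -> int:
        return len(self._heap)
```

A delivery worker could repeatedly call `pop()` and hand each message to a push notification service, so that urgent alarms are never stuck behind low-priority ones.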

[0680] (Application Example 1)

[0681] Next, we will explain Application Example 1. In the following explanation, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".

[0682] Efficiently monitoring children's safety in both home and public environments remains a challenge in today's busy lifestyle. Traditional monitoring methods struggle to detect and address dangers in real time, placing a heavy burden on parents and facility managers. This makes it difficult to prevent accidental injuries among children.

[0683] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 1 is realized by the following means.

[0684] In this invention, the server includes means for collecting visual data from multiple points in the environment using an image acquisition device, means for using a facial recognition algorithm to identify individuals, and means for generating a warning and notifying relevant parties when dangerous behavior is detected. This enables relevant parties to efficiently monitor the child's behavior and respond quickly when danger is detected.

[0685] An "image acquisition device" is a device used to collect visual data from multiple points within an environment.

[0686] "Visual data" refers to information acquired in the form of images or videos.

[0687] A "face recognition algorithm" is a computational method for analyzing and identifying an individual's characteristics from received visual data.

[0688] "Individual" refers to a specific person who is being monitored.

[0689] A "motion analysis device" is a system that analyzes an individual's movements and predicts dangerous actions.

[0690] "Dangerous behavior" refers to actions that may threaten an individual's safety.

[0691] A "warning" is a notification that is generated when dangerous behavior is detected.

[0692] A "communication device" is a device or system used to transmit information to a terminal.

[0693] A "terminal" refers to a portable device or a dedicated receiving device, which is a device used to receive warnings.

[0694] "Learning information" refers to data that records response information based on warnings and is used to optimize algorithms.

[0695] To realize this invention, it is first necessary to install network-enabled image acquisition devices in homes and public facilities. These cameras collect visual data in real time from multiple points in the environment and transmit it to a cloud-based server. The server identifies specific individuals from the received visual data using a facial recognition algorithm (e.g., OpenCV or AWS Rekognition). The actions of the identified individuals are analyzed by a motion analysis device, and if a pre-defined dangerous action is detected, the server generates a warning.

[0696] This warning is transmitted via communication equipment to the user's mobile device or a dedicated receiver. Receiving this warning allows the user to respond quickly if danger is imminent. Furthermore, response information based on the warning is recorded on a server and stored as learning data used to optimize the motion analysis algorithm.

[0697] As a concrete example, consider a scenario where a camera installed in a park playground monitors children playing on a swing. This system detects when a child is about to fall off the swing and identifies the action as dangerous. At this point, the server generates a warning message saying, "Your child is making dangerous movements on the swing. Please be careful," and immediately sends it to the parent's smartphone.

[0698] An example of a prompt message is: "This is a child safety monitoring system. Identify children from specific camera footage and issue an alert if dangerous behavior is detected. The alert should include a description of the specific behavior and send a push notification to the smartphone."

[0699] The flow of a specific process in Application Example 1 will be explained using Figure 12.

[0700] Step 1:

[0701] The server receives visual data in real time from a network-enabled image acquisition device. The input is video data from the camera, and the server prepares this data to send to a face recognition algorithm.

[0702] Step 2:

[0703] The server executes a face recognition algorithm to identify a specific individual from video data. The input is the video data obtained in step 1, and the server analyzes the position and feature points of faces to obtain individual identification information as output. Technologies such as OpenCV and AWS Rekognition are utilized in this process.

[0704] Step 3:

[0705] The server analyzes the actions of the identified individual using a motion analysis device. The input is the individual identification information from step 2, and a motion analysis algorithm is applied to extract behavioral patterns, obtaining motion features as output.

[0706] Step 4:

[0707] The server predicts dangerous behavior from the obtained behavioral features. The input is the behavioral features from step 3, which are compared with pre-defined dangerous behavior patterns to evaluate the degree of danger, and the server generates a dangerous behavior detection result as output.

[0708] Step 5:

[0709] The server generates a warning when dangerous behavior is detected and sends it to the user's terminal via a communication device. The input is the detection result from step 4, which is used to construct the warning message and create notification information for the terminal as output. The user receives this information and can take action to avoid the crisis.

[0710] Step 6:

[0711] The response information based on the warnings is recorded on the server and stored as learning information used to optimize the behavioral analysis algorithm. The input is user response information, which is stored in the database and output as feedback data that will be useful for future improvements to the algorithm.
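
Step 6 above stores user response information as learning information for later optimization. The sketch below shows one possible record shape and a conversion into labeled examples. The field names, the action strings, and the use of the user's confirmation as a binary label are hypothetical; the source does not describe the stored format.

```python
import json
from dataclasses import asdict, dataclass
from typing import Dict, List, Union


@dataclass
class ResponseRecord:
    alert_id: str
    detected_behavior: str
    user_action: str           # e.g. "went_to_site" or "dismissed" (hypothetical values)
    confirmed_dangerous: bool  # whether the user confirmed the danger was real


def to_training_examples(records: List[ResponseRecord]) -> List[Dict[str, Union[str, int]]]:
    """Turn response records into (behavior, label) pairs for later retraining."""
    return [
        {"behavior": r.detected_behavior, "label": int(r.confirmed_dangerous)}
        for r in records
    ]


def serialize(records: List[ResponseRecord]) -> str:
    """Serialize records as JSON Lines, one record per line, for storage in a database or file."""
    return "\n".join(json.dumps(asdict(r), sort_keys=True) for r in records)
```

Dismissed alerts become negative examples in this sketch, which is one way feedback could reduce false alarms over time.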

[0712] Furthermore, an emotion engine that estimates the user's emotions may be incorporated. That is, the specific processing unit 290 may use the emotion identification model 59 to estimate the user's emotions and perform specific processing using the user's emotions.

[0713] This invention is a system that combines an image acquisition device and an emotion recognition engine, and is intended to monitor the safety and emotional state of children in homes or public facilities. This system allows parents or facility managers to remotely check on a child's safety and psychological state even when they are not physically nearby.

[0714] The terminal collects real-time video data from image acquisition devices installed in homes or facilities. This data is then sent to a server, which applies a facial recognition algorithm to the received video to identify specific children who are being monitored.

[0715] Next, the server analyzes the behavioral patterns of the recognized individuals and uses an emotion recognition engine to analyze their emotional state. The emotion recognition engine analyzes facial expressions and body movements from the video data to determine whether the individuals are emotionally agitated or calm.

[0716] This information is linked to the prediction of risky behavior, and if high-risk behavior is detected, an alert is quickly generated. The results of emotion recognition are also used to adjust the priority and urgency of alerts. For example, when emotions are heightened, a high-urgency alert is issued.

[0717] The generated alerts are immediately sent to the user's mobile device or dedicated device. This allows the user to understand the detailed situation, including the emotional state of the person concerned, and take a quick and appropriate response.

[0718] For example, if a device is monitoring a playground in a park and the video footage shows a child becoming agitated on the play equipment, the server will perceive this as a dangerous situation and send an alert to the user indicating that the child is emotionally agitated. The user can then take action to go to the location and calm the child down.

[0719] Thus, by incorporating emotion recognition technology, this invention provides a system that achieves more comprehensive child safety management while also considering the maintenance of children's psychological health.

[0720] The following describes the processing flow.

[0721] Step 1:

[0722] The terminal activates the image acquisition device. This starts capturing video data in real time from the installed location. The collected video data is then ready to be sent to the server.

[0723] Step 2:

[0724] The server analyzes the received video data. First, it performs data preprocessing, such as noise reduction and brightness correction, to improve recognition accuracy.

[0725] Step 3:

[0726] The server uses a facial recognition algorithm to identify the face of a specific subject from the video data. The identified face is then compared against known faces registered in the database.

[0727] Step 4:

[0728] The server performs behavioral pattern analysis. It tracks the movements of identified individuals and determines whether their actions constitute dangerous behavior by comparing them with existing data.

[0729] Step 5:

[0730] The server activates an emotion recognition engine and evaluates the subject's emotional state based on their facial expressions and body movements. It determines whether the subject is emotionally agitated or calm, and passes that data to the next step.

[0731] Step 6:

[0732] The server integrates behavioral analysis results and emotion recognition results to determine the appropriate response if dangerous behavior is detected. In particular, if emotions are heightened, the urgency of the warning is increased, and an alert is generated immediately.

[0733] Step 7:

[0734] The server generates alerts and sends them to the user's mobile device or a dedicated device. The alerts include details of the behavior and emotional state, and are presented in a way that allows the user to respond quickly.

[0735] Step 8:

[0736] The user reviews the alert, understands its content, and takes appropriate action to ensure the safety of the person concerned. For example, this could involve temporarily monitoring the situation or physically going to the site to improve the situation.

[0737] Step 9:

[0738] The server records user responses and stores the corresponding actions as training data. This data will be used to improve the algorithm in the future.
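Step 9 above can be illustrated with a minimal sketch of how a user response might be logged and later turned into labeled training pairs. The record fields (`alert_id`, `user_action`, `was_dangerous`) and the JSON-lines layout are assumptions made for illustration; the specification does not define a storage format.

```python
import json
from dataclasses import dataclass, asdict
from typing import List


@dataclass
class ResponseRecord:
    alert_id: str        # hypothetical identifier of the alert
    behavior: str        # behavior label reported in the alert
    emotion: str         # emotional state reported in the alert
    user_action: str     # what the user did, e.g. "went_to_site"
    was_dangerous: bool  # user's confirmation, used as the training label


def append_record(log: List[str], record: ResponseRecord) -> None:
    """Serialize one response as a JSON line and append it to an in-memory log."""
    log.append(json.dumps(asdict(record), sort_keys=True))


def load_training_pairs(log: List[str]):
    """Turn logged responses into (features, label) pairs for later retraining."""
    pairs = []
    for line in log:
        row = json.loads(line)
        features = {"behavior": row["behavior"], "emotion": row["emotion"]}
        pairs.append((features, row["was_dangerous"]))
    return pairs
```

In practice the log would be persisted on the server rather than held in a list; the list keeps the sketch self-contained.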

[0739] (Example 2)

[0740] Next, we will describe Example 2. In the following description, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".

[0741] In recent years, the need for safety management in homes and public facilities has increased, but ensuring the safety of individuals requiring special care, such as young children, and appropriately monitoring their emotional states is not easy. Furthermore, conventional monitoring systems have difficulty detecting changes in emotions, posing a challenge in predicting dangerous situations in advance and responding quickly.

[0742] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 2 is realized by the following means.

[0743] In this invention, the server includes means for collecting visual information from multiple locations in the environment, means for using a face recognition algorithm to identify the face of a target individual, and means for using an emotion evaluation engine to analyze the emotional state of the identified target individual. This makes it possible to identify and analyze the face and emotional state of a target individual with high accuracy and predict potential dangers.

[0744] An "image acquisition device" is a device used to collect visual information within an environment, and is capable of continuously acquiring information from multiple locations at high resolution.

[0745] "Visual information" refers to video and image data of the environment collected by image acquisition devices, and is used to identify people and other objects.

[0746] A "face recognition algorithm" is a computational method or program for detecting and identifying the face of a specific individual from visual information, and recognizes individuals based on specific feature points.

[0747] An "emotion evaluation engine" is a technological element that analyzes the emotional state of an identified individual from its facial expressions and movements, and identifies the type of emotion, such as excitement or calmness.

[0748] "Behavioral analysis methods" refer to algorithms and systems used to analyze the movements and past behavioral patterns of a target individual and predict abnormal or dangerous behaviors.

[0749] A "warning" is a notification generated when the level of danger increases based on emotional state and behavioral analysis, and it serves as a signal to quickly convey information to relevant parties.

[0750] A "portable device" is an electronic device intended to be carried by the user at all times and capable of receiving and transmitting information.

[0751] In this invention, the terminal first uses an image acquisition device to collect visual information from the environment. This image acquisition device is equipped with a high-resolution camera, is installed in homes and public facilities, and has the function of continuously recording video day and night. The collected visual information is efficiently transmitted to a server using compression technology.

[0752] Next, the server processes the received visual information. The technology used here is a face recognition algorithm, specifically utilizing libraries such as OpenCV and Dlib. This algorithm allows the server to accurately detect and identify the face of a specific individual. Based on the results, an emotion evaluation engine is activated to analyze the target individual's emotional state in real time. This engine evaluates emotions by referring to subtle changes in facial expressions and body movements, and in practice, it often utilizes APIs from general cloud service providers.

[0753] In addition, the behavioral analysis system evaluates the behavior of the target individual based on past data and accumulated training data. This analysis makes it possible to predict potential dangerous behaviors in advance and take countermeasures quickly. The generated warnings are notified to the user's mobile device in real time, allowing the user to quickly understand the situation and respond appropriately.

[0754] As a concrete example, suppose a terminal installed in a park is monitoring children near playground equipment, and the video captures a child appearing extremely excited on the equipment. The server analyzes this footage, and if it detects heightened emotions, it immediately sends an advanced warning to the user. In this case, an example of a prompt message might be "Detect a child who is excited on the playground equipment and generate a warning."

[0755] This allows users to grasp the specific situation on-site and take quick and appropriate action. This system further improves safety management by utilizing emotion recognition technology and highly accurate data analysis.

[0756] The flow of the specific processing in Example 2 will be explained using Figure 13.

[0757] Step 1:

[0758] The terminal collects visual information from the environment using an image acquisition device. Specifically, it uses a high-resolution camera to capture continuous video data, which is then used as input. At this stage, the video data is sent to the server in a compressed format. The output is video data in a format that the server can process.

[0759] Step 2:

[0760] The server decodes the received compressed video data and applies a face recognition algorithm. The input is the decoded video frames, and the server uses image processing techniques to identify faces within each frame. The output of this operation is data showing the position of each face in each frame and its corresponding feature points.

[0761] Step 3:

[0762] The server activates an emotion evaluation engine based on the results of face recognition. This engine analyzes the input facial feature data and evaluates the emotional state based on changes and dynamics in facial expressions. The output of this process is an emotion score indicating the degree of excitement or calmness.

[0763] Step 4:

[0764] The server uses the emotion scores to perform behavioral analysis. Using past data and the currently obtained emotion scores as input, the algorithm recognizes anomalies and abnormal patterns. The output of this process is the detection result for high-risk behavioral patterns.

[0765] Step 5:

[0766] The server generates warnings based on the results of behavioral analysis. The input is the risk assessment result, and the server uses this to generate warning messages and determine their priority. The warning messages are output, and the server is ready to notify the user.

[0767] Step 6:

[0768] The server sends the generated warning to the user's mobile device. The input consists of the warning message and the user's connection information, and the output is sent via a communication method such as Firebase Cloud Messaging. At this stage, the user receives information in real time, allowing them to quickly understand the situation.
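Steps 4 and 5 above can be sketched as follows, assuming emotion scores on a hypothetical 0.0 (calm) to 1.0 (excited) scale. The z-score test, the `z_threshold` of 2.0, and the 0.8 cut-off for high priority are illustrative assumptions, not values from the specification. Delivery of the resulting warning (Step 6) is left out.

```python
from statistics import mean, pstdev


def is_anomalous(history, current, z_threshold=2.0):
    """Flag the current emotion score if it deviates strongly from past scores.

    history: past emotion scores for the same subject (assumed 0.0 to 1.0).
    z_threshold: assumed cut-off; the source does not specify one.
    """
    if len(history) < 2:
        return False  # not enough past data to judge
    mu = mean(history)
    sigma = pstdev(history)
    if sigma == 0:
        return current != mu
    return abs(current - mu) / sigma > z_threshold


def build_warning(subject_id, history, current):
    """Return a warning dict for Step 5, or None when nothing unusual is found."""
    if not is_anomalous(history, current):
        return None
    priority = "high" if current >= 0.8 else "normal"
    return {
        "subject": subject_id,
        "priority": priority,
        "message": f"Unusual excitement detected (score {current:.2f})",
    }
```

Comparing against the subject's own history, rather than a fixed threshold alone, reflects the description's use of "past data" as an input to the analysis.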

[0769] (Application Example 2)

[0770] Next, we will explain application example 2. In the following explanation, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".

[0771] In modern homes and public facilities, there is a need for systems that can efficiently monitor the safety and psychological state of individuals remotely. However, conventional monitoring systems are limited to detecting physically dangerous behaviors and have difficulty adjusting for changes in psychological state and the resulting urgency. Therefore, in situations where a more comprehensive and rapid response is required, it is difficult to provide appropriate information to relevant parties.

[0772] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 2 is realized by the following means.

[0773] In this invention, the server includes means for adjusting and generating the urgency of alerts based on the psychological state, means for determining the psychological state of the subject using emotion recognition technology, and means for recording response information based on the alerts and storing it as learning data for improving the behavioral analysis algorithm. This makes it possible to adjust alerts to reflect the psychological state of the subject and provide rapid response measures.

[0774] An "image acquisition device" is hardware used to collect video data from multiple locations within an environment.

[0775] A "face recognition algorithm" is a series of computational methods used to identify a person's face from video data.

[0776] A "behavioral pattern analysis means" is a processing means for analyzing the behavior of identified individuals and predicting risky behaviors.

[0777] "Emotion recognition technology" is a technology that analyzes facial expressions and body movements from video data in order to determine the psychological state of a subject.

[0778] An "alert" is a warning message generated based on detected risky behavior or psychological state, and notified to the relevant parties.

[0779] "Response measures tailored to the urgency of the situation" refers to specific measures and action guidelines provided according to the urgency of the alert.

[0780] "Training data" refers to data used to record response information based on alerts and to improve behavioral analysis algorithms.

[0781] In this invention, the entire system operates through the coordinated operation of various electronic devices. The system uses image acquisition devices installed in homes and public facilities to collect video data from multiple locations within the environment. A terminal receives this video data and transmits it to a server.

[0782] The server transfers video footage acquired by devices such as Raspberry Pi and Jetson Nano to a cloud server for image recognition and emotion recognition. The cloud server processes the acquired data using pre-trained face recognition algorithms and emotion recognition models based on TensorFlow and PyTorch. This allows for the identification of the subject's face and analysis of their behavior.

[0783] Furthermore, by applying emotion recognition technology and analyzing facial expressions and body movements obtained from the video, the psychological state of the subject is determined. This information is used to adjust the urgency of the alert, and alerts are generated according to the level of urgency.

[0784] Alerts are quickly sent to the user's mobile device or dedicated device. From this notification, the user can quickly grasp the details of the situation and go to the scene if necessary. For example, if a home robot detects unusual behavior or emotional changes around a child, the user will receive a message saying, "This child is showing a specific reaction and needs to be investigated."

[0785] Thus, by combining image acquisition technology and emotion recognition technology, this system enables more effective safety management and maintenance of psychological health.

[0786] A concrete example of a prompt to be input into the generative AI model might be: "Observe in real time what emotional state a child is in at home, and if any unusual behavior or emotion is detected, tell me what kind of notification to send to the parents based on that."
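As a rough illustration of how recognition results might be merged into such a prompt before it is passed to the generative AI model, the sketch below fills a text template. The template wording, `PROMPT_TEMPLATE`, and `build_prompt` are hypothetical; no call to any model API is shown.

```python
# Hypothetical template loosely modeled on the example prompt in the text.
PROMPT_TEMPLATE = (
    "A child at {location} appears {emotion} while {behavior}. "
    "If this is unusual or risky, tell me what kind of notification "
    "to send to the parents."
)


def build_prompt(location, emotion, behavior):
    """Fill the template with the latest recognition results."""
    return PROMPT_TEMPLATE.format(
        location=location, emotion=emotion, behavior=behavior
    )
```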

[0787] The flow of a specific process in Application Example 2 will be explained using Figure 14.

[0788] Step 1:

[0789] The terminal collects video data in real time from image acquisition devices installed within the environment. The terminal receives this video data as input and prepares it for transmission to the server. Specifically, it uses a camera device to capture surveillance video within a specified area. It also compresses the data as needed to improve transmission efficiency.

[0790] Step 2:

[0791] The server receives video data transmitted from the terminal. The server takes this data as input and applies a face recognition algorithm to identify the subject's face. Specifically, it detects the position of the face in each video frame and extracts the feature quantities of the identified face. Then, it identifies who the person is by comparing the feature quantities with a database.
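The database comparison in this step can be sketched as a nearest-neighbor search over feature vectors. The vectors, the Euclidean metric, and the `max_distance` threshold of 0.6 are assumptions for illustration; the specification does not state which feature representation or metric is used.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def identify(features, database, max_distance=0.6):
    """Return the name of the closest registered face, or None if no
    registered face lies within max_distance (assumed threshold)."""
    best_name, best_dist = None, float("inf")
    for name, registered in database.items():
        d = euclidean(features, registered)
        if d < best_dist:
            best_name, best_dist = name, d
    return best_name if best_dist <= max_distance else None
```

Returning `None` for distant vectors lets the caller treat the person as unregistered instead of forcing a match.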

[0792] Step 3:

[0793] The server analyzes the behavior of identified individuals and predicts risky behaviors. Using the analyzed behavioral data as input, it performs data calculations with behavioral pattern analysis tools and outputs behavioral anomalies and predicted risks. Specifically, it compares current behavioral data with past behavioral data to identify and analyze deviations from norms.

[0794] Step 4:

[0795] The server applies emotion recognition technology using facial information from video data to determine the subject's psychological state. In this step, the data after facial recognition is further analyzed, and the psychological state is inferred based on an emotion model. Emotions are quantified from facial expressions and subtle body movements, and these values are used to identify psychologically heightened or calm states.

[0796] Step 5:

[0797] The server considers both risky behavior and psychological state to adjust and generate alert urgency. Using risk level and psychological data as input, the alert generation algorithm adjusts the urgency and content of the alert, outputting information tailored to the user. Specifically, if emotions are heightened, the alert priority is increased, and a warning including a message urging immediate action is prepared.

[0798] Step 6:

[0799] The adjusted alert generated on the server is sent to the user's mobile device or dedicated device. The user receives this notification and, based on the information it contains, can go to the site and take action. This step uses push notification technology so that the alert content is communicated quickly.
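The urgency adjustment described in Step 5 can be sketched as follows. The three-level `Urgency` scale, the 0.0 to 1.0 emotion score, and the `heightened_threshold` of 0.7 are assumptions for illustration; the specification only states that heightened emotion raises the alert priority and adds a message urging immediate action.

```python
from enum import IntEnum


class Urgency(IntEnum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3


def adjust_urgency(risk_level, emotion_score, heightened_threshold=0.7):
    """Raise the behavior-based urgency by one level when emotion is heightened.

    risk_level: Urgency derived from behavioral analysis.
    emotion_score: assumed 0.0 (calm) to 1.0 (heightened).
    """
    if emotion_score >= heightened_threshold and risk_level < Urgency.HIGH:
        return Urgency(risk_level + 1)
    return Urgency(risk_level)


def build_alert(behavior, risk_level, emotion_score):
    """Compose the alert content, urging immediate action at the highest level."""
    urgency = adjust_urgency(risk_level, emotion_score)
    message = f"Detected: {behavior}."
    if urgency is Urgency.HIGH:
        message += " Immediate action is recommended."
    return {"urgency": urgency.name, "message": message}
```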

[0800] The specific processing unit 290 transmits the result of the specific processing to the robot 414. In the robot 414, the control unit 46A causes the speaker 240 and the controlled object 443 to output the result of the specific processing. The microphone 238 acquires audio indicating user input for the result of the specific processing. The control unit 46A transmits the audio data indicating user input acquired by the microphone 238 to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the audio data.

[0801] Data generation model 58 is a type of so-called generative AI (Artificial Intelligence). Examples of the data generation model 58 include generative AI such as ChatGPT (Internet search <URL: https://openai.com/blog/chatgpt>) and Gemini (Internet search <URL: https://gemini.google.com/?hl=ja>). The data generation model 58 is obtained by performing deep learning on a neural network. The data generation model 58 is input with prompts containing instructions, and with inference data such as audio data representing speech, text data representing text, and image data representing images. The data generation model 58 infers from the input inference data according to the instructions indicated by the prompts, and outputs the inference results in data formats such as audio data and text data. Here, inference refers to, for example, analysis, classification, prediction, and/or summarization.

[0802] In the above embodiment, an example was given in which specific processing is performed by the data processing device 12, but the technology of this disclosure is not limited thereto, and the specific processing may also be performed by the robot 414.

[0803] Furthermore, the emotion identification model 59, acting as an emotion engine, may determine the user's emotion according to a specific mapping, namely an emotion map (see Figure 9). Similarly, the emotion identification model 59 may also determine the robot's emotion, and the specific processing unit 290 may perform the specific processing using the robot's emotion.

[0804] Figure 9 shows an emotion map 400 in which multiple emotions are mapped. In the emotion map 400, emotions are arranged in concentric circles radiating from the center. The closer to the center of the concentric circles, the more primitive the emotions are located. Further out of the concentric circles, emotions representing states and actions arising from mental states are located. Emotion is a concept that includes feelings and mental states. On the left side of the concentric circles, emotions that are generally generated from reactions occurring in the brain are located. On the right side of the concentric circles, emotions that are generally induced by situational judgment are located. Above and below the concentric circles, emotions that are generally generated from reactions occurring in the brain and induced by situational judgment are located. In addition, the emotion of "pleasure" is located on the upper side of the concentric circles, and the emotion of "displeasure" is located on the lower side. Thus, in the emotion map 400, multiple emotions are mapped based on the structure in which emotions arise, and emotions that are likely to occur simultaneously are mapped close together.

[0805] These emotions are distributed at the 3 o'clock position on the emotion map 400, and usually fluctuate between feelings of security and anxiety. In the right half of the emotion map 400, situational awareness takes precedence over internal feelings, resulting in a calm impression.

[0806] The inside of the emotion map 400 represents inner thoughts, while the outside represents actions. Therefore, the further toward the outside of the emotion map 400 an emotion is located, the more visible (expressed in actions) it becomes.

[0807] Here, human emotions are based on various balances, such as posture and blood sugar levels. When these balances deviate from the ideal, it results in discomfort, and when they approach the ideal, it results in pleasure. Similarly, in robots, cars, motorcycles, etc., emotions can be created based on various balances, such as posture and battery level. When these balances deviate from the ideal, it results in discomfort, and when they approach the ideal, it results in pleasure. The emotion map can be generated, for example, based on Dr. Mitsuyoshi's emotion map (Research on a system for analyzing brain physiological signals of speech emotion recognition and emotion, Tokushima University, doctoral dissertation: https://ci.nii.ac.jp/naid/500000375379). The left half of the emotion map contains emotions belonging to a region called "response," where sensation is dominant. The right half of the emotion map contains emotions belonging to a region called "situation," where situational awareness is dominant.
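The balance-based idea above can be illustrated with a small sketch for a robot: the total deviation of its state from an ideal state is computed before and after a change, and the direction of change is mapped to pleasure or discomfort. The state keys (`battery`, `tilt`), the sum-of-absolute-differences measure, and the "neutral" case are assumptions for illustration and do not appear in the specification.

```python
def deviation(state, ideal):
    """Sum of absolute differences between current and ideal balance values."""
    return sum(abs(state[key] - ideal[key]) for key in ideal)


def valence(previous, current, ideal):
    """Map the change in deviation from the ideal to a coarse emotional valence."""
    before = deviation(previous, ideal)
    after = deviation(current, ideal)
    if after < before:
        return "pleasure"    # balances moved toward the ideal
    if after > before:
        return "discomfort"  # balances moved away from the ideal
    return "neutral"         # assumed label when nothing changed
```

For instance, with an ideal of a full battery and an upright posture, recharging moves the state toward the ideal and is classified as pleasure.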

[0808] The emotion map defines two emotions that promote learning. One is the emotion around the middle of the negative "repentance" and "reflection" on the situation side. In other words, it is when the robot experiences negative emotions such as "I never want to feel this way again" or "I don't want to be scolded again." The other is the emotion around the positive "desire" on the reaction side. In other words, it is when the robot has positive feelings such as "I want more" or "I want to know more."

[0809] The emotion identification model 59 inputs user input into a pre-trained neural network, obtains emotion values representing each emotion shown in the emotion map 400, and determines the user's emotion. This neural network is pre-trained based on multiple training data sets, which are combinations of user input and emotion values representing each emotion shown in the emotion map 400. Furthermore, this neural network is trained so that emotions located close together have similar values, as shown in the emotion map 900 in Figure 10. Figure 10 shows an example where multiple emotions such as "reassured," "calm," and "confident" have similar emotion values.

[0810] The above description primarily focuses on the functions of the data processing device 12 in relation to this disclosure. However, the system related to this disclosure is not necessarily implemented on a server. The system related to this disclosure may be implemented as a general information processing system. This disclosure may be implemented, for example, as a software program that runs on a personal computer or as an application that runs on a smartphone. The method related to this disclosure may be provided to users in SaaS (Software as a Service) format.

[0811] In the above embodiment, an example was given in which a specific process is performed by a single computer 22. However, the technology of this disclosure is not limited thereto, and a distributed processing of the specific process may be performed by multiple computers, including computer 22. For example, a data generation model 58 may be provided in an external device of the data processing device 12, and the external device may generate data according to the input data.

[0812] In the above embodiment, an example was given in which the specific processing program 56 is stored in the storage 32, but the technology of this disclosure is not limited thereto. For example, the specific processing program 56 may be stored in a portable, computer-readable, non-transitory storage medium such as a USB (Universal Serial Bus) memory. The specific processing program 56 stored in the non-transitory storage medium is installed in the computer 22 of the data processing device 12. The processor 28 executes specific processing according to the specific processing program 56.

[0813] Alternatively, the specific processing program 56 may be stored in a storage device such as a server connected to the data processing device 12 via the network 54, and the specific processing program 56 may be downloaded and installed on the computer 22 in response to a request from the data processing device 12.

[0814] Furthermore, it is not necessary to store the entirety of the specific processing program 56 in a storage device such as a server connected to the data processing device 12 via the network 54, or to store the entirety of the specific processing program 56 in the storage 32; it is acceptable to store only a portion of the specific processing program 56.

[0815] The following types of processors can be used as hardware resources to perform specific processing. Examples of processors include a CPU, a general-purpose processor that functions as a hardware resource to perform specific processing by executing software, i.e., a program. Other examples of processors include dedicated electrical circuits, such as FPGAs (Field-Programmable Gate Arrays), PLDs (Programmable Logic Devices), or ASICs (Application Specific Integrated Circuits), which have circuit configurations specifically designed to perform specific processing. All of these processors have built-in or connected memory, and all of them perform specific processing by using this memory.

[0816] The hardware resource that performs a specific process may consist of one of these various processors, or it may consist of a combination of two or more processors of the same or different types (for example, a combination of multiple FPGAs, or a combination of a CPU and an FPGA). Alternatively, the hardware resource that performs a specific process may consist of a single processor.

[0817] Examples of configurations using a single processor include, firstly, a configuration in which one or more CPUs and software are combined to form a single processor, and this processor functions as a hardware resource that performs a specific process. Secondly, there is a configuration using a processor that realizes the functions of the entire system, including multiple hardware resources that perform a specific process, on a single IC chip, as exemplified by SoCs (System-on-a-chip). In this way, a specific process is realized using one or more of the above types of processors as hardware resources.

[0818] Furthermore, the hardware structure of these various processors can more specifically utilize electrical circuits that combine circuit elements such as semiconductor devices. Also, the specific processing described above is merely an example. Therefore, it goes without saying that unnecessary steps can be deleted, new steps added, or the processing order rearranged, as long as it does not deviate from the main purpose.

[0819] The descriptions and illustrations presented above are detailed explanations of the technical aspects of this disclosure and are merely examples of the technical aspects. For example, the above descriptions of the structure, function, operation, and effect are examples of the structure, function, operation, and effect of the technical aspects of this disclosure. Therefore, it goes without saying that you may delete unnecessary parts, add new elements, or replace elements in the descriptions and illustrations presented above, as long as you do not deviate from the essence of the technical aspects of this disclosure. Furthermore, in order to avoid confusion and facilitate understanding of the technical aspects of this disclosure, explanations of common technical knowledge and the like that do not require special explanation to enable the implementation of the technical aspects of this disclosure have been omitted from the descriptions and illustrations presented above.

[0820] All documents, patent applications, and technical standards described herein are incorporated by reference to the same extent as if each individual document, patent application, and technical standard were specifically and individually noted to be incorporated by reference.

[0821] The following is further disclosed regarding the embodiments described above.

[0822] (Claim 1)

[0823] An image acquisition device provides a means for collecting video data from multiple locations within the environment,

[0824] A means for receiving the aforementioned video data and using a face recognition algorithm to identify the face of the subject,

[0825] A behavioral pattern analysis means for analyzing the behavior of the identified subject and predicting risky behavior,

[0826] A means for generating an alert and notifying relevant parties when the aforementioned dangerous behavior is detected,

[0827] A means for recording response information based on the aforementioned alert and accumulating it as training data for improving the behavioral analysis algorithm,

[0828] A system that includes this.

[0829] (Claim 2)

[0830] The system according to claim 1, wherein the image acquisition device is installed in a home or public facility and is configured to continuously monitor a specific subject.

[0831] (Claim 3)

[0832] The system according to claim 1, wherein the alert is configured to be delivered to the relevant parties via a mobile terminal or a dedicated device.

[0833] "Example 1"

[0834] (Claim 1)

[0835] A means of collecting visual information from multiple locations within the facility using an image sensor,

[0836] A means for receiving the aforementioned visual information and using an identification algorithm for identifying the faces of individuals,

[0837] A motion analysis means for analyzing the behavior of the identified individual and predicting potential dangerous behaviors,

[0838] A means for generating an alarm and notifying the user when the aforementioned potentially dangerous behavior is detected,

[0839] A means for recording response information based on the aforementioned alarm and accumulating it as training data to improve behavioral analysis methods,

[0840] A system that includes this.

[0841] (Claim 2)

[0842] The system according to claim 1, wherein the image sensor is installed in a house or facility and is configured to continuously monitor a specific individual.

[0843] (Claim 3)

[0844] The system according to claim 1, wherein the alarm is configured to be delivered to the user via a portable electronic device or a dedicated device.

[0845] "Application Example 1"

[0846] (Claim 1)

[0847] An image acquisition device provides a means for collecting visual data from multiple points in the environment,

[0848] A means for receiving the aforementioned visual data and using a facial recognition algorithm to identify an individual,

[0849] A behavioral analysis device for analyzing the actions of the identified individual and predicting dangerous actions,

[0850] A means for generating a warning and notifying relevant parties when the aforementioned dangerous action is detected,

[0851] A means for recording response information based on the aforementioned warning and storing it as learning information for optimizing the motion analysis algorithm,

[0852] Means by which the aforementioned warning is notified to the terminal via a communication device,

[0853] A system that includes this.

[0854] (Claim 2)

[0855] The system according to claim 1, wherein the image acquisition device is installed in a building facility or public space and is configured to continuously monitor a specific individual.

[0856] (Claim 3)

[0857] The system according to claim 1, wherein the warning is configured to be transmitted to a mobile terminal or dedicated device via a communication device.

[0858] "Example 2 of combining an emotion engine"

[0859] (Claim 1)

[0860] An image acquisition device provides means for collecting visual information from multiple locations in the environment,

[0861] A means for receiving the aforementioned visual information and using a face recognition algorithm to identify the face of a target individual,

[0862] A means for using an emotion evaluation engine to analyze the emotional state of the identified target individual,

[0863] A behavioral analysis means for analyzing the aforementioned emotional state and behavior to predict high-risk behaviors,

[0864] A means for creating a warning and sending it to the relevant parties when the aforementioned high-risk behavior is detected,

[0865] A means for recording the response information based on the aforementioned warning and storing it as training data for optimizing the analysis algorithm,

[0866] A system that includes this.

[0867] (Claim 2)

[0868] The system according to claim 1, wherein the image acquisition device is installed in a home or public facility and is configured to continuously observe a specific target individual.

[0869] (Claim 3)

[0870] The system according to claim 1, wherein the warning is configured to be delivered to the relevant parties via a portable device or a dedicated device.

[0871] "Application example 2 when combining with an emotional engine"

[0872] (Claim 1)

[0873] An image acquisition device provides a means for collecting video data from multiple locations within the environment,

[0874] A means for receiving the aforementioned video data and using a face recognition algorithm to identify the face of the subject,

[0875] A behavioral pattern analysis means for analyzing the behavior of the identified subject and predicting risky behavior,

[0876] A means for determining the psychological state of the subject using emotion recognition technology,

[0877] A means for generating an alert whose urgency is adjusted based on the aforementioned psychological state,

[0878] A means for notifying relevant parties of the adjusted alert when the aforementioned risky behavior is detected,

[0879] A means for recording response information based on the aforementioned alert and accumulating it as training data for improving the behavioral analysis algorithm,

[0880] A system comprising the above means.
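The urgency adjustment in this application example could be sketched as follows: a base urgency derived from the behavior analysis is raised by one level when the detected psychological state suggests distress. The three-level scale, the set of escalating states, and the function name `adjust_urgency` are hypothetical and are not taken from the source.

```python
# Assumed ordered urgency scale, lowest to highest.
URGENCY_LEVELS = ["low", "medium", "high"]

# Hypothetical psychological states that raise the urgency by one level.
ESCALATING_STATES = {"fearful", "distressed"}


def adjust_urgency(base_urgency: str, psychological_state: str) -> str:
    """Raise the base urgency one level for escalating states, capped at the top level."""
    index = URGENCY_LEVELS.index(base_urgency)
    if psychological_state in ESCALATING_STATES:
        index = min(index + 1, len(URGENCY_LEVELS) - 1)
    return URGENCY_LEVELS[index]
```

For instance, `adjust_urgency("medium", "distressed")` yields `"high"`, while a calm subject leaves the base urgency unchanged.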

[0881] (Claim 2)

[0882] The system according to claim 1, wherein the image acquisition device is installed in a home or public facility and is configured to continuously monitor the psychological state of a subject.

[0883] (Claim 3)

[0884] The system according to claim 1, wherein the alert is distributed to the relevant parties via a mobile terminal or dedicated device, and is configured to provide information including countermeasures according to the urgency.

[Explanation of Symbols]

[0885]
10, 210, 310, 410 Data Processing Systems
12 Data Processing Devices
14 Smart Devices
214 Smart Glasses
314 Headset-type terminal
414 Robots

Claims

1. A system comprising:
a means for collecting video data from multiple locations within the environment using an image acquisition device;
a means for receiving the aforementioned video data and using a face recognition algorithm to identify the face of the subject;
a behavioral pattern analysis means for analyzing the behavior of the identified subject and predicting risky behavior;
a means for generating an alert and notifying relevant parties when the aforementioned risky behavior is detected; and
a means for recording response information based on the aforementioned alert and accumulating it as training data for improving the behavioral analysis algorithm.

2. The system according to claim 1, wherein the image acquisition device is installed in a home or public facility and is configured to continuously monitor a specific subject.

3. The system according to claim 1, wherein the alert is configured to be delivered to the relevant parties via a mobile terminal or a dedicated device.
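One possible reading of the last means in claim 1, recording response information and accumulating it as training data, is sketched below in Python. Each alert is stored together with how the relevant party responded and whether the danger was real, so that the pairs can later serve as labeled examples. The field names, the `ResponseRecord` and `TrainingDataLog` classes, and the JSON Lines export are assumptions for illustration only; the source does not describe a storage format.

```python
import json
from dataclasses import asdict, dataclass
from typing import List, Tuple


@dataclass
class ResponseRecord:
    alert_id: str            # hypothetical identifier of the alert
    predicted_behavior: str  # what the behavior analysis predicted
    responder_action: str    # what the notified party did
    was_real_danger: bool    # feedback used as the training label


class TrainingDataLog:
    """In-memory accumulator for response records (a stand-in for real storage)."""

    def __init__(self) -> None:
        self._records: List[ResponseRecord] = []

    def record(self, rec: ResponseRecord) -> None:
        self._records.append(rec)

    def labeled_examples(self) -> List[Tuple[str, bool]]:
        """Return (prediction, label) pairs for retraining the analysis algorithm."""
        return [(r.predicted_behavior, r.was_real_danger) for r in self._records]

    def to_jsonl(self) -> str:
        """Serialize all records as JSON Lines, one record per line."""
        return "\n".join(json.dumps(asdict(r)) for r in self._records)
```

Keeping the responder's feedback next to the original prediction is what makes the log usable as training data: false alarms and confirmed dangers both become labeled examples.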