Monitoring system, server, information processing method, monitored terminal, guardian terminal, and program
The monitoring system improves situational awareness by using a server-based machine learning model to categorize activities of monitored individuals, addressing privacy and battery issues in conventional systems.
Patent Information
- Authority / Receiving Office
- JP · JP
- Patent Type
- Applications
- Current Assignee / Owner
- MIXI INC
- Filing Date
- 2025-05-22
- Publication Date
- 2026-07-01
AI Technical Summary
Conventional monitoring systems for individuals, such as children and the elderly, struggle to provide detailed information about their activities beyond location, leading to guardian anxiety and privacy concerns due to continuous voice data transmission and high battery consumption.
A monitoring system that includes a monitored terminal capturing audio and motion data, a server using machine learning to determine activity states, and a guardian terminal for notification, enhancing situational awareness with nuanced activity categorization.
Enables detailed understanding of monitored individuals' activities, reducing guardian anxiety and minimizing privacy and battery concerns through accurate activity determination and intelligent data processing.
Abstract
Description
Technical Field
[0001] The present invention relates to a monitoring system, a server, an information processing method, a monitored terminal, a guardian terminal, and a program.
Background Art
[0002] Conventionally, monitoring services that remotely grasp the current position of monitored persons such as children and the elderly using positioning technologies such as GPS (Global Positioning System) are known. In these services, it is common for guardians to be able to confirm the location of the monitored person on a map based on position information periodically transmitted from a dedicated terminal or smartphone carried by the monitored person.
[0003] In addition, in order to further enhance the safety of the monitored person, terminals equipped with a function of detecting a fall or an inactive state for a certain period of time or more using a motion sensor such as an acceleration sensor and notifying the guardian have also been proposed. Furthermore, a function of setting a specific area (geo fence) and notifying when the monitored person enters or exits the area is also widely used.
[0004] However, in the conventional monitoring based only on position information, although it is possible to grasp "where the monitored person is", it is difficult to grasp more detailed situations and activities such as "who the monitored person is with" and "specifically what the monitored person is doing". For example, even if it is known that a child is in a park, it is not possible to know whether the child is playing alone, playing with friends, or in contact with a stranger, or the content of the play (for example, an active play or sitting quietly doing something), and there have been many situations where guardians still feel anxious.
[0005] In addition, there are terminals that have a function of collecting and transmitting the voices around the monitored person, but constantly transmitting voice data raises problems in terms of the privacy of the monitored person and the people nearby, increased communication data volume, and battery consumption of the monitored terminal.
Prior Art Documents
Patent Documents
[0006] [Patent Document 1] U.S. Patent Application Publication No. 2015/0356848
Summary of the Invention
Problems to Be Solved by the Invention
[0007] The present invention has been made in view of the problems of the prior art described above, and aims to improve the quality of monitoring.
Means for Solving the Problems
[0008] To solve the above problems, a monitoring system according to one aspect of the present invention comprises: a monitored terminal equipped with a transmitting unit that acquires sensing data including audio information of the surroundings of the monitored person and motion information of the monitored person and transmits it to a server; the server, equipped with a receiving unit that receives the sensing data transmitted from the monitored terminal, a determination unit that determines the current activity state of the monitored person from among a plurality of predefined activity categories using a machine learning model based on the audio information and motion information in the received sensing data, and a transmitting unit that transmits information indicating the determined activity state to a guardian terminal; and the guardian terminal, equipped with a notification unit that receives the information transmitted from the server and notifies the guardian of information regarding the activity state of the monitored person.
Effects of the Invention
[0009] According to the present invention, it is possible to understand the activities of the monitored person in more detail and more accurately than before. This can improve the quality of monitoring.
Brief Description of the Drawings
[0010]
[Figure 1] A block diagram showing the overall configuration of a monitoring system according to one embodiment of the present invention.
[Figure 2] A block diagram showing an example of the hardware configuration of the monitored terminal according to this embodiment.
[Figure 3] A block diagram showing an example of the hardware configuration of the server according to this embodiment.
[Figure 4] A block diagram showing an example of the hardware configuration of the guardian terminal according to this embodiment.
[Figure 5] A diagram showing an example of the functional block configuration of the server according to this embodiment.
[Figure 6] A diagram showing an example of the functional block configuration of the monitored terminal according to this embodiment.
[Figure 7] A diagram showing an example of the functional block configuration of the guardian terminal according to this embodiment.
[Figure 8] A flowchart outlining the activity status determination process in this embodiment.
[Figure 9] A data structure diagram showing an example of sensing data in this embodiment.
[Figure 10] A diagram showing an example of the display screen of the guardian terminal in this embodiment.
[Figure 11] A sequence diagram of the voice information transmission process based on a trigger event in this embodiment.
[Figure 12] A flowchart of the server's companion status determination process in this embodiment.
[Figure 13] A conceptual diagram of the combined conditional warning generation process on the server in this embodiment.
[Figure 14] A conceptual diagram of the learning and updating process of the machine learning model in this embodiment.
[Figure 15] A diagram showing another example of the display screen of the guardian terminal in this embodiment (risk level visualization display).
[Figure 16] A diagram showing another example of the display screen of the guardian terminal in this embodiment (privacy-conscious display settings).
[Figure 17] A diagram showing another example of the display screen of the guardian terminal in this embodiment (prediction / early-sign display).
[Figure 18] A diagram showing another example of the display screen of the guardian terminal in this embodiment (communication cooperation UI element).
[Figure 19] A diagram showing another example of the display screen of the guardian terminal in this embodiment (feedback collection UI element).
Mode for Carrying Out the Invention
[0011] Hereinafter, embodiments of the present invention will be described in detail with reference to the drawings. In each figure, the same or corresponding components are denoted by the same reference numerals, and redundant descriptions are omitted as appropriate.
[0012] (Overview of the entire system) FIG. 1 is a block diagram showing the overall configuration of a monitoring system 1 according to an embodiment of the present invention. The monitoring system 1 mainly includes a monitored terminal 10 carried by a monitored person MP (for example, a child or an elderly person), a server 20 communicably connected to the monitored terminal 10 and a guardian terminal 30 via a network NW (for example, the Internet, a mobile phone network, etc.), and a guardian terminal 30 used by a guardian PA (for example, a relative or caregiver of the monitored person MP).
[0013] In this specification, "server 20" is not limited to a single physical computer device. It can also be a group of multiple physical or virtual computer devices that cooperate with each other via the network NW and collectively provide the server functions described later (such as receiving sensing data, determining activity status, and transmitting determination results), multiple functional modules in a cloud computing environment, or a distributed system based on a microservices architecture. For example, a front-end function that receives and processes sensing data, an analysis function that runs a machine learning model to determine activity status, and a function that manages notifications to the guardian terminal 30 may each be implemented as different server components or services that operate in cooperation with each other, and these together may also be regarded as the "server 20" as defined in this specification.
[0014] The monitored terminal 10 has the function of acquiring various sensing data, including audio information and movement information around the monitored person MP, and transmitting this data to the server 20. Based on the sensing data received from the monitored terminal 10, the server 20 uses a machine learning model or the like to determine the current activity status of the monitored person MP and notifies the guardian terminal 30 of the information including the determination result. Based on the information received from the server 20, the guardian terminal 30 notifies the guardian PA of the activity status of the monitored person MP, etc. (for example, through display, audio notification, vibration notification, etc.).
[0015] (Description of hardware configuration) Next, examples of the hardware configuration of each device according to this embodiment will be described. Figure 2 is a block diagram showing an example of the hardware configuration of the monitored terminal 10. The monitored terminal 10 can be configured as, for example, a smartphone, a smartwatch, or a dedicated monitoring device. The monitored terminal 10 includes a CPU (Central Processing Unit) 11, ROM (Read Only Memory) 12, RAM (Random Access Memory) 13, a storage unit 14, a communication unit 15, an audio input unit 16 (microphone, etc.), a motion sensor unit 17 (including an IMU (Inertial Measurement Unit) comprising, for example, an accelerometer and a gyroscope), a GPS receiver unit 18, and a bus 19 connecting these units. The CPU 11 controls the operation of the entire monitored terminal 10 by executing programs stored in the ROM 12 and the storage unit 14. The RAM 13 functions as the work area for the CPU 11. The storage unit 14 stores the OS (Operating System), various application programs, data, etc. The communication unit 15 performs wireless communication with the server 20, etc. via the network NW. The audio input unit 16 collects ambient sounds. The motion sensor unit 17 detects the movement and orientation of the terminal. The GPS receiver unit 18 receives signals from GPS satellites and determines the current location of the terminal.
[0016] Figure 3 is a block diagram showing an example of the hardware configuration of server 20. Server 20 is composed of, for example, one or more computers. Server 20 includes a CPU 21, ROM 22, RAM 23, storage unit 24, communication unit 25, input unit 26, output unit 27, and a bus 28 connecting these units. The CPU 21 controls the operation of the entire server 20 by executing programs stored in the ROM 22 and storage unit 24. The RAM 23 functions as the work area for the CPU 21. The storage unit 24 stores the OS, various programs, and databases (e.g., machine learning models, user information, judgment result logs, etc.). The communication unit 25 communicates with the monitored terminal 10 and the guardian terminal 30, etc., via the network NW. The input unit 26 is a keyboard or mouse, etc., and accepts operation input from an administrator, etc. The output unit 27 is a display, etc., and displays various information.
[0017] Figure 4 is a block diagram showing an example of the hardware configuration of the guardian terminal 30. The guardian terminal 30 can be configured as, for example, a smartphone, a tablet, or a PC (Personal Computer). The guardian terminal 30 includes a CPU 31, ROM 32, RAM 33, storage unit 34, communication unit 35, input unit 36 (touch panel, keyboard, microphone, etc.), output unit 37 (display, speaker, etc.), and a bus 38 connecting these units. The CPU 31 controls the operation of the entire guardian terminal 30 by executing programs stored in the ROM 32 and storage unit 34. The RAM 33 functions as a work area for the CPU 31. The storage unit 34 stores the OS, dedicated application programs, data, etc. The communication unit 35 communicates with the server 20, etc. via the network NW. The input unit 36 receives operation input from the guardian PA. The output unit 37 displays information such as the activity status of the monitored person MP, or provides voice notifications.
[0018] (Explanation of functional block configuration) Next, examples of the functional block configurations for each device according to this embodiment will be described. These functional blocks are mainly realized by the CPU of each device executing a predetermined program.
[0019] Figure 5 shows an example of the main functional block configuration of the server 20. The server 20 includes a communication control unit 201, a data receiving unit 202, a sensing data analysis unit 203, an activity status determination unit 204, a companion status determination unit 205 (optional), a machine learning model storage unit 206, a determination result generation unit 207, a notification information transmission unit 208, a user information management unit 209, a data storage and learning unit 210 (optional), a privacy processing unit 211 (optional), and a log recording unit 212 (optional, not shown in Figure 5).
[0020] The communication control unit 201 controls all communication with the monitored terminal 10 and the guardian terminal 30 via the network NW. The data receiving unit 202 receives sensing data transmitted from the monitored terminal 10 (corresponding to the "receiving unit" in Appendix 1). The sensing data analysis unit 203 extracts features necessary for determining the activity state from the received sensing data (especially voice information and motion information), and performs data verification and preprocessing. In this specification, "voice information" transmitted from the monitored terminal 10 to the server 20 means data that holds acoustic features that can substantially contribute as the basis for determination when the activity status determination unit 204 of the server 20 uses a machine learning model to determine the activity state of the monitored person MP (especially activity categories where acoustic features are important, such as "conversation state" or "play state"). Specifically, this may include voice waveform data of a predetermined time length (compressed or uncompressed), and a set of multiple frequency domain or time domain acoustic features extracted from the voice waveform data (e.g., MFCC, spectrogram, voice energy, pitch information, etc.). Simply providing flag information indicating the presence or absence of sound, or information indicating the occurrence of extremely limited types of acoustic events, may not be sufficient as "voice information" for the server's machine learning model to determine multi-category activity states.
Similarly, "motion information" refers to data that holds kinematic features that can substantially contribute as the basis for determination when the activity status determination unit 204 of the server 20 uses a machine learning model to determine the activity state of the monitored person MP (especially for activity categories where the pattern, intensity, and type of physical movement are important). Specifically, this could include time-series data of acceleration and angular velocity acquired from the inertial measurement unit (raw data or data at an appropriate sampling rate), or a set of multiple kinematic features extracted from such time-series data (e.g., statistical features, frequency features, gait-related parameters, posture information, etc.). Simply providing binary information such as "moved" or "stopped," or information indicating the occurrence of extremely limited types of motion events, may not be sufficient as "motion information" for the server's machine learning model to determine multi-category activity states.
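As a rough illustration of the "set of multiple kinematic features" mentioned above, the following sketch computes a few statistical features from a window of 3-axis accelerometer samples. The function name, the sample format, and the choice of features are assumptions made for illustration; the specification does not fix any of them.

```python
import math
from statistics import mean, pstdev


def motion_features(samples):
    """Compute simple statistical features from 3-axis accelerometer samples.

    samples: list of (ax, ay, az) tuples, e.g. in m/s^2 (hypothetical format).
    Returns a dict of features computed on the acceleration magnitude.
    """
    if not samples:
        raise ValueError("samples must not be empty")
    # Magnitude removes the dependence on how the terminal is oriented.
    magnitudes = [math.sqrt(ax * ax + ay * ay + az * az) for ax, ay, az in samples]
    return {
        "mean_magnitude": mean(magnitudes),
        "std_magnitude": pstdev(magnitudes),
        "peak_magnitude": max(magnitudes),
    }
```

A feature dictionary of this kind carries far more information than a binary "moved" / "stopped" flag, which is the distinction the paragraph above draws.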
[0021] The activity status determination unit 204 determines the current activity state of the monitored person MP from among a plurality of predefined activity categories, based on the voice information and motion information processed by the sensing data analysis unit 203, using a machine learning model stored in the machine learning model storage unit 206 (corresponding to the "determination unit" in Appendix 1). In this specification, when the activity status determination unit 204 determines the current activity state of the monitored person MP "based on voice information and motion information," it means that in the process of identifying at least a portion of the multiple activity categories to be determined (especially nuanced activity categories that are difficult to identify using only one of the pieces of information, such as "playing state," "learning state," and "conversation state"), both the features extracted from the voice information and the features extracted from the motion information are used as input to the machine learning model or substantially considered in the determination logic in such a way that they each influence the determination result of the activity state. Therefore, if either piece of information is not referenced at all, or is used only in a minor way that does not substantially affect the determination result, the determination shall not be considered "based on voice information and motion information" as used herein. For example, in determining a certain activity category, if motion information is the primary clue and voice information is used to increase the certainty or to avoid confusion with other activity categories, or conversely, if voice information is the primary clue and motion information is used to avoid confusion, the activity status is considered to be determined based on both types of information overall.
Furthermore, even if the voice information transmitted from the monitored terminal 10 is transmitted intermittently based on a predetermined trigger event, if the server 20 substantially uses the received voice information in the activity status determination process, it may be considered that the activity status is being determined "based on voice information and motion information."
[0022] Furthermore, the "determination" of the activity status by the activity status determination unit 204 means the process of selecting or identifying, from among the predefined activity categories, the one activity category judged to best match the current activity status of the monitored person MP, based on the input voice information and motion information (and optionally referenced context information), and taking into account the probability or confidence score for each activity category output by the machine learning model. This determination result is output, for example, as an identifier (ID), name (text label), or code indicating a specific activity category.
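The selection step described above can be sketched as follows. This is a minimal illustration, assuming the model outputs one probability per category; the function name, the `min_confidence` threshold, and the "unknown" fallback are hypothetical and do not appear in the specification.

```python
def determine_activity(probabilities, min_confidence=0.5):
    """Pick the activity category with the highest model probability.

    probabilities: dict mapping category name -> probability (model output).
    min_confidence: hypothetical threshold below which no category is reported.
    Returns a (category, confidence) tuple.
    """
    if not probabilities:
        raise ValueError("probabilities must not be empty")
    category, confidence = max(probabilities.items(), key=lambda kv: kv[1])
    if confidence < min_confidence:
        # Assumed fallback: report "unknown" rather than a weak guess.
        return "unknown", confidence
    return category, confidence
```

The returned category name plays the role of the text label or code that the paragraph above describes as the determination result.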
[0023] In this specification, the "multiple predefined activity categories" determined by the activity status determination unit 204 refer to types of activities that have been pre-set or learned at the system level as meaningful units from among the actions and states of the monitored person MP (especially a child) in daily life, in order to help the guardian PA understand the situation and make decisions regarding monitoring. This includes, for example, the "playing state," "learning state," "conversation state," "eating state," "moving state," and "stationary state" exemplified in the dependent appendix (corresponding to Appendix 5), but is not limited to these, and these categories may be added, changed, or subdivided as appropriate depending on the purpose of the system's operation and the characteristics of the monitored person MP. What matters is that, at any given point in time, the machine learning model of the activity status determination unit 204 classifies or identifies the input sensing data into one of these "predefined" categories, and the result forms the basis of the "information indicating the determined activity state" that is notified to the guardian terminal 30.
[0024] When the monitored person MP is a child, the machine learning model used by the activity status determination unit 204 is specifically designed to accurately identify the behavioral and vocal patterns characteristic of children. A child's "play state" is not simply characterized by high activity levels, but also by diverse actions occurring in an unpredictable sequence, and the volume and frequency characteristics of the sounds the child emits often change rapidly. Furthermore, distinguishing between a "learning state" and a "stationary state (such as resting)," and identifying the conversation partner, content, and emotional nuances in a "conversation state," presents a technical challenge: general-purpose activity recognition models for adults cannot achieve sufficient accuracy. To overcome this challenge, the activity status determination unit 204 of this embodiment actively extracts and utilizes features from the voice information, such as MFCC, prosodic information (pitch fluctuation patterns, intonation magnitude, speech rhythm, etc.), voice quality features (e.g., jitter, shimmer), and specific acoustic features used in emotion recognition AI (e.g., energy distribution in specific frequency bands of the speech spectrum, sound rise and fall characteristics, etc.). Similarly, motion information can be used to identify not only simple activity levels and step counts, but also unpredictable, non-periodic motion sequences like those seen in children's play, sustained patterns of subtle hand movements and postures during learning, or distinctive physical movement patterns when using specific play equipment or participating in sports. This is achieved by applying time-series analysis to inertial sensor data (e.g., pattern matching such as Dynamic Time Warping, or sequence modeling using recurrent neural networks (LSTM, GRU)).
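For reference, the Dynamic Time Warping distance named above as one pattern-matching option can be sketched in its textbook form. The code below is a generic illustration on one-dimensional sequences (for example, acceleration magnitudes) and is not taken from the specification; how the embodiment actually applies DTW is not described.

```python
def dtw_distance(a, b):
    """Textbook dynamic time warping distance between two 1-D sequences.

    Uses absolute difference as the local cost and runs in O(len(a) * len(b)).
    """
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = best cumulative cost aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(
                cost[i - 1][j],      # a advances, b stays
                cost[i][j - 1],      # b advances, a stays
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]
```

Because DTW tolerates local stretching in time, the same movement performed at different speeds can still match a stored template, which is why it suits non-periodic motion sequences.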
[0025] Furthermore, to effectively integrate these extracted voice and motion features, it is particularly effective to employ a multimodal neural network with an attention mechanism (for example, an architecture consisting of encoders applied to the voice stream and the motion stream respectively, and decoders that integrate their outputs and weight important features through attention, or a multimodal Transformer, etc.). This attention mechanism makes it possible, for example, to suppress ambient noise components and selectively focus on the speech segments of the monitored person MP and the conversation partner when determining a "conversation state," or to emphasize the temporal relationship between specific motion patterns and the cheers or distinctive sounds that occur in sync with them when determining a "play state."
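The core idea of attention-weighted fusion can be illustrated with a deliberately simplified sketch: scaled dot-product scores against a query vector, followed by a softmax-weighted sum of the modality embeddings. In a real network the query and the embeddings come from learned encoders; here they are plain lists, and all names are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_fuse(query, modality_vectors):
    """Fuse modality embeddings with scaled dot-product attention.

    query: list of floats (assumed to come from a learned component).
    modality_vectors: list of embeddings, e.g. [voice_vec, motion_vec],
    each with the same length as query.
    Returns (fused_vector, attention_weights).
    """
    scale = math.sqrt(len(query))
    scores = [sum(q * v for q, v in zip(query, vec)) / scale for vec in modality_vectors]
    weights = softmax(scores)
    fused = [
        sum(w * vec[i] for w, vec in zip(weights, modality_vectors))
        for i in range(len(query))
    ]
    return fused, weights
```

The modality whose embedding aligns better with the query receives the larger weight, which is the mechanism by which one stream can be emphasized over the other for a given category.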
[0026] In the early stages of development, activity recognition models for adults and general-purpose machine learning models using simple features frequently confused children's activities, such as "playing tag with friends (play state)" with "sprinting alone (movement state)," or "learning using a tablet (learning state)" with "watching videos (play that is close to a stationary state)." However, by adopting feature engineering and a model architecture specifically tailored to the behavioral characteristics of children and to multimodal information processing, as well as a learning strategy using a dedicated dataset described later, a sufficiently high level of identification accuracy was achieved for these difficult-to-distinguish activity categories compared to conventional technologies (for example, an improvement of more than 10% in the F-score for certain important categories). During this determination process, current location information, time information, and past activity history may be optionally referenced (see Appendix 6).
[0027] The companion status determination unit 205 is an optional functional unit that determines the companion status of the monitored person MP (for example, whether a surrounding terminal is a pre-registered terminal, the number of registered terminals and unknown terminals, etc., as per Appendix 8) based on the identifiers of surrounding terminals and current location information (if any, as per Appendix 7) transmitted from the monitored terminal 10. Furthermore, it is also possible to analyze past encounter patterns (as per Appendix 9). Details of this companion status determination process will be described later with reference to Figure 12.
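A minimal sketch of the registered / unknown terminal count described above might look like the following. The identifier format, the function name, and the returned field names are assumptions; the actual process is the one described with reference to Figure 12.

```python
def companion_status(nearby_ids, registered_ids):
    """Summarize surrounding terminals as registered vs. unknown.

    nearby_ids: identifiers of terminals detected near the monitored terminal
        (hypothetical strings; duplicates are ignored).
    registered_ids: identifiers registered in advance by the guardian.
    """
    nearby = set(nearby_ids)
    registered = nearby & set(registered_ids)
    unknown = nearby - registered
    return {
        "registered_count": len(registered),
        "unknown_count": len(unknown),
        "alone": not nearby,  # no surrounding terminal detected at all
    }
```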
[0028] The machine learning model storage unit 206 stores machine learning models used for activity status determination and companion status determination. In this specification, the "machine learning model" used by the activity status determination unit 204 refers to a model that has statistical or algorithmic judgment criteria acquired by learning from data the relationship between input data (features extracted mainly from voice information and motion information in this embodiment) and their corresponding correct labels (activity categories), and has the function of classifying or determining activity categories for unknown input data. This may include, but is not limited to, deep learning models (e.g., convolutional neural networks (CNNs), recurrent neural networks (RNNs) such as LSTM and GRU, Transformer-based models, etc., as per Appendix 14), support vector machines, decision trees (including gradient boosting decision trees, etc.), and ensemble learning models. Systems consisting only of fixed threshold processing or rule-based systems in which all branching conditions have been manually described in advance are, in principle, not included in the scope of "machine learning models" as used herein. However, a hybrid approach combining the output of a machine learning model and a rule-based system is not prohibited. The machine learning models stored in the machine learning model storage unit 206 employ feature engineering and model architecture that take into account the behavioral characteristics of children as described above (e.g., introduction of an attention mechanism, application of a custom loss function to improve the discrimination performance between specific activity categories, or ensemble learning that integrates the outputs of multiple different models), thereby achieving discrimination performance that would be difficult to achieve by merely combining sensor data.
In the early stages of development, general-purpose activity recognition models often misrecognized specific activities of children (e.g., failing to distinguish between playing on specific playground equipment and simply running around), but these challenges were overcome by the specialized model configuration and learning strategy of this embodiment. Furthermore, in training the machine learning model, a large-scale, high-quality dedicated dataset consisting of behavioral data and audio data of children of various ages, genders, and personalities in diverse environments (indoors, outdoors, quiet, noisy, etc.) is constructed, and the model is trained or fine-tuned using this dataset. In this case, to reduce confusion between specific activity categories, it is also effective to introduce Hard Negative Mining for activity pairs that are difficult to distinguish, or a custom loss function that emphasizes their differences during learning (e.g., a variation of Focal Loss or a metric learning method that considers the distance between categories). Furthermore, a hierarchical decision structure that combines a first model that first determines a rough activity level (e.g., high activity / low activity, speech present / absent) with a second group of models that determine a more detailed activity category based on the result, or an ensemble learning method that integrates the outputs of multiple models with different characteristics, is also preferable for improving overall determination accuracy and robustness. This machine learning model can be continuously trained or updated by the data storage and learning unit 210 described later (corresponding to Appendix 11). The concept of this learning and updating process will be described later with reference to Figure 14.
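The specification mentions "a variation of Focal Loss" without giving its form. For orientation only, the standard focal loss for a single example, FL(p_t) = -(1 - p_t)^gamma * log(p_t), can be sketched as below, where p_t is the probability the model assigns to the correct category. The default gamma of 2.0 is a commonly used value, not one stated in the source, and the variant actually used in the embodiment may differ.

```python
import math


def focal_loss(p_true, gamma=2.0, eps=1e-12):
    """Standard focal loss for one example.

    p_true: probability assigned by the model to the correct category.
    gamma: focusing parameter; gamma = 0 reduces to ordinary cross-entropy.
    eps: lower clamp that avoids log(0).
    """
    p = min(max(p_true, eps), 1.0)
    # The (1 - p)^gamma factor shrinks the loss of well-classified examples,
    # so training concentrates on hard, easily confused examples.
    return -((1.0 - p) ** gamma) * math.log(p)
```

With gamma greater than zero, an example already classified with high confidence contributes almost nothing, which is the property that makes this family of losses relevant to hard-to-distinguish activity pairs.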
[0029] The determination result generation unit 207 integrates the activity status determined by the activity status determination unit 204, the companion status determined optionally by the companion status determination unit 205, and the current location information (if any) received from the monitored terminal 10, and generates information for notification to the guardian terminal 30. In this specification, the "information indicating the determined activity status" generated by the determination result generation unit 207 and transmitted by the notification information transmission unit 208 to the guardian terminal 30 is information that directly or indirectly indicates the specific activity category determined (selected or identified) by the activity status determination unit 204. For example, this includes the identification code assigned to the activity category, the name of the activity category (e.g., "playing state", "learning state"), or information that allows the guardian terminal 30 to uniquely interpret the activity category.
[0030] It is particularly important for the determination result generation unit 207 to comprehensively evaluate the activity status determined by the activity status determination unit 204, the companion status determined optionally by the companion status determination unit 205, and the current location information received from the monitored terminal 10. For example, even if the "conversation state" is the same, the meaning and urgency of the information to be conveyed to the guardian will differ greatly depending on whether the situation is "at home with a registered friend" or "in a park at night with an unknown terminal." By interpreting this information in combination, the server 20 can generate a detailed situational context that cannot be obtained from individual pieces of information, and provide the guardian terminal 30 with more situation-appropriate and detailed notifications (ranging from simple information provision to warnings and high-urgency alerts). This substantial improvement in the quality of situational recognition through the combination of multiple pieces of information is one of the notable effects of the present invention. In this case, it is also possible to combine multiple pieces of information to determine whether they meet specific warning conditions and generate a warning notification (corresponding to Appendix 10). The concept of this combined conditional warning generation process will be described later with reference to Figure 13. The notification information transmission unit 208 transmits the information generated by the determination result generation unit 207 to the guardian terminal 30 (corresponding to the "transmission unit" in Appendix 1).
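The "at home with a registered friend" versus "in a park at night with an unknown terminal" contrast above can be sketched as a small rule that combines activity, companion status, place, and time. Every rule, label, and threshold below (including the 21:00 to 05:00 definition of night) is a hypothetical example; the actual warning conditions are those described with reference to Figure 13.

```python
def warning_level(activity, unknown_count, place, hour):
    """Return 'alert', 'warning', or 'info' from combined context.

    activity: determined activity category name (hypothetical labels).
    unknown_count: number of unregistered terminals detected nearby.
    place: coarse place label derived from location, e.g. 'home' or 'park'.
    hour: local hour of day, 0-23.
    """
    night = hour >= 21 or hour < 5  # assumed definition of night
    with_unknown = unknown_count > 0
    away_from_home = place != "home"
    if activity == "conversation" and with_unknown and night and away_from_home:
        return "alert"
    if with_unknown and away_from_home:
        return "warning"
    return "info"
```

The same "conversation" category yields "info" at home in the afternoon and "alert" in a park at night with an unknown terminal, mirroring the example in the paragraph above.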
[0031] In this embodiment, determining the activity status of the monitored MP on the server 20 side has several important technical advantages. Firstly, the server 20 generally has abundant computing resources (CPU power, memory capacity, etc.) compared to mobile devices such as the monitored terminal 10 and the guardian terminal 30. This enables the execution of more complex and large-scale machine learning models (for example, deep learning models as referred to in this specification, or multimodal fusion models that highly integrate and process information from multiple sensors), allowing for the identification of the diverse and nuanced activity status unique to children with higher accuracy and detail. Secondly, the server 20 can centrally manage and analyze diverse sensing data that may be collected from multiple monitored terminals 10, as well as long-term behavioral history data, and utilize this for the continuous learning and improvement of machine learning models (see Figure 14) and the construction of personalized judgment logic based on broader knowledge. If the server 20 simply transfers sensing data to the guardian terminal 30, and the guardian terminal 30 performs the main activity status determination processing, resource constraints on the guardian terminal 30 (battery consumption, processing power, installable model size, etc.) would make it difficult to apply advanced machine learning models like those executed by the server 20 of the present invention, or to continuously improve the model using data from multiple monitored individuals. As a result, the quality of the activity status information obtained (diversity of identifiable categories, determination accuracy, reliability, etc.) may be significantly reduced. The present invention solves this problem by having the server 20 perform this main intelligent processing, "determination," thereby achieving higher quality monitoring. 
Furthermore, the "information indicating the determined activity status" transmitted by the notification information transmission unit 208 to the guardian terminal 30 is the result of intelligent processing (i.e., "determination") by the activity status determination unit 204, and is clearly distinguishable from the sensing data itself received from the monitored terminal 10 (e.g., raw audio waveform data, raw data from the accelerometer, or basic feature vectors). Specifically, as mentioned above, this includes identification codes, names, or structured data that allow the guardian terminal 30 to uniquely interpret the activity category. As a result, the guardian terminal 30 can directly recognize specific activity states given meaning by the server 20 and notify the guardian PA without bearing the burden of analyzing and interpreting complex sensing data itself.
[0032] The user information management unit 209 manages information related to the monitored MP and the guardian PA (e.g., terminal ID, registration information, settings information). The data storage and learning unit 210 stores received sensing data and determination results and uses them for training, updating, and personalization of machine learning models. When the activity status determination unit 204 determines the activity status, the log recording unit 212 (optional; not shown in Figure 5) records log data in the storage unit 24, including the determination result (identified activity category), the main input data used for the determination (or its features), the determination time, and the identification information of the monitored terminal 10. This log data can be used for verifying system operation, evaluating and improving determination accuracy, or responding to inquiries from guardians, and, if necessary, as indirect evidence that the server 20 actually determined the activity status and transmitted the result. The privacy processing unit 211 anonymizes or deletes received sensing data (especially voice information and motion information) within a predetermined period after the activity status is determined (corresponding to Appendix 12).
[0033] Figure 6 shows an example of the main functional block configuration of the monitored terminal 10. The monitored terminal 10 includes a sensor control unit 101, a voice information acquisition unit 102, a motion information acquisition unit 103, a location information acquisition unit 104 (optional), a peripheral terminal information acquisition unit 105 (optional), a sensing data generation unit 106, a data transmission unit 107, and a trigger event detection unit 108 (optional).
[0034] The sensor control unit 101 controls sensor devices such as the voice input unit 16, motion sensor unit 17, and GPS receiver unit 18. The voice information acquisition unit 102 acquires voice data from the voice input unit 16. The motion information acquisition unit 103 acquires acceleration data, angular velocity data, etc. from the motion sensor unit 17. The location information acquisition unit 104 acquires current location information from the GPS receiver unit 18. The peripheral terminal information acquisition unit 105 performs short-range wireless communication (e.g., Bluetooth Low Energy) via the communication unit 15 to acquire identifiers of other terminals in the vicinity.
[0035] The sensing data generation unit 106 generates sensing data for transmission to the server 20 by combining the voice information acquired by the voice information acquisition unit 102, the motion information acquired by the motion information acquisition unit 103 (including raw time-series data and basic features, corresponding to Appendix 4), and optionally the current location information acquired by the location information acquisition unit 104 and the identifiers of peripheral terminals acquired by the peripheral terminal information acquisition unit 105. In this case, the voice information may be processed into compressed audio clips (corresponding to Appendix 2). Note that the "voice information" transmitted from the monitored terminal 10 to the server 20 does not necessarily have to be a continuous audio clip. For example, the monitored terminal 10 may recognize basic acoustic events (e.g., detection of a human voice, detection of a specific warning sound, detection of a loud noise, etc.) and transmit them to the server 20 as structured acoustic event data including the type of event, time of occurrence, intensity, and duration. Even in this case, the server 20 combines this acoustic event data with the separately received motion information and location information and inputs them into a machine learning model to determine the activity status of the monitored MP. Thus, acoustic event data is also a form of "voice information", since it carries acoustic characteristics that substantially contribute to the activity status determination by the server 20. The data transmission unit 107 transmits the sensing data generated by the sensing data generation unit 106 to the server 20 (corresponding to the "transmission means" of the monitored terminal in Appendix 1).
The trigger event detection unit 108 detects abnormal conditions based on motion information, simple acoustic events, or entry into or exit from a pre-set specific area, and uses these as triggers for transmitting voice information (corresponding to Appendix 3). Details of the voice information transmission process based on this trigger event will be described later with reference to Figure 11.
[0036] Figure 7 shows an example of the main functional block configuration of the guardian terminal 30. The guardian terminal 30 includes a communication control unit 301, an information receiving unit 302, an information analysis and display control unit 303, a user interface unit 304, and a notification unit 305, among others.
[0037] The communication control unit 301 controls communication with the server 20 via the network NW. The information receiving unit 302 receives information transmitted from the server 20, including the activity status of the person being monitored (MP), the status of any accompanying persons (if any), and current location information (if any) (corresponding to the "receiving step" in Appendix 3 and the "receiving means" of the guardian terminal in Appendix 1).
[0038] The information analysis and display control unit 303 analyzes the received information and controls its display in a format that is easy for the guardian PA to understand. For example, it can display activity status, companion status, and current location information combined on a map (corresponding to part of Appendix 10, see reference numeral 311 in Figure 10), or display it in chronological order (corresponding to part of Appendix 10). It also determines whether the received information matches specific warning conditions set in advance, and if it does, it controls the notification unit 305 to issue a warning (corresponding to Appendix 10). The user interface unit 304 accepts operation input from the guardian PA via touch panel, buttons, etc., and displays various information on the display screen.
[0039] Specific variations of the display screen of the guardian terminal 30 will be described later with reference to Figures 10 and 15 to 19. Based on instructions from the information analysis and display control unit 303, the notification unit 305 notifies the guardian PA of information regarding the status of the person being monitored in various forms, such as displaying it on a screen (see reference numeral 312 in Figure 10), providing audio notifications from a speaker, and providing vibration notifications (corresponding to the "notification steps" in Appendix 3 and the "notification means" of the guardian terminal in Appendix 1).
[0040] In the above embodiment, the activity status determination unit 204 selected or identified one activity category from a plurality of predefined activity categories and transmitted the result (e.g., category ID, text label) to the guardian terminal 30. However, the present invention is not limited to this. For example, instead of outputting an activity category directly, the activity status determination unit 204 may calculate a set of multiple situation indices that characterize the current situation of the person being monitored, or a set of confidence or probability scores for each activity category, and transmit these to the guardian terminal 30 as "information indicating the determined activity status". In this case, the information analysis and display control unit 303 of the guardian terminal 30 interprets the activity status of the person being monitored as a specific activity category based on the received situation indices and probability scores, according to pre-set threshold processing and logic, or mapping rules that are easy for the guardian PA to understand, and displays it via the notification unit 305. Even with this configuration, as long as the server 20 uses a machine learning model based on voice information and motion information to perform key information processing (feature extraction, pattern recognition, probability calculation, etc.) essential for identifying the activity category that the guardian ultimately recognizes, and the results of this processing form the basis for understanding and reporting the activity status on the guardian's terminal, it can be interpreted that the server 20 is substantially involved in determining the activity status of the person being monitored and presenting the results, and this can be considered to fall within the scope of the technical idea of the present invention. 
What is important is that the server 20 performs advanced analysis of the sensing data and generates and provides high-quality information that allows the guardian to grasp the specific activities of the person being monitored.
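As a minimal sketch of the guardian-terminal side of the modification in paragraph [0040], the following assumes that the server 20 sends one probability score per activity category and that the guardian terminal 30 applies a single pre-set threshold. The function name `interpret_scores` and the threshold value 0.6 are hypothetical and are not taken from the specification.

```python
from typing import Dict, Optional

# Hypothetical value; the specification only refers to "pre-set threshold processing".
CONFIDENCE_THRESHOLD = 0.6


def interpret_scores(scores: Dict[str, float],
                     threshold: float = CONFIDENCE_THRESHOLD) -> Optional[str]:
    """Map per-category probability scores to one activity category.

    Returns the highest-scoring category, or None when no category
    reaches the threshold (the display may then show "undetermined").
    """
    if not scores:
        return None
    category, score = max(scores.items(), key=lambda item: item[1])
    return category if score >= threshold else None


if __name__ == "__main__":
    received = {"playing": 0.72, "learning": 0.08, "conversation": 0.20}
    print(interpret_scores(received) or "undetermined")
```

In this sketch the pattern recognition that produces the scores remains on the server 20; the guardian terminal 30 only applies a simple mapping rule to the result.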
[0041] (Explanation of data structure) Figure 9 is a data structure diagram showing an example of sensing data 900 transmitted from the monitored terminal 10 to the server 20 in this embodiment. The sensing data 900 may include, for example, header information 901, terminal ID 902, timestamp 903, voice information 904, motion information 905, location information 906 (optional), and a list of surrounding terminal identifiers 907 (optional). Header information 901 includes the data type and data length. Voice information 904 may be compressed audio clip data or extracted acoustic features. Motion information 905 may be raw time-series data from the IMU or basic features such as step count and activity intensity calculated from it.
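The layout of Figure 9 can be sketched as a simple record type, given here only as an illustration. The field names, the JSON encoding, and the 4-byte length prefix standing in for the data length of header information 901 are all assumptions; the specification does not define a wire format.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List, Optional


@dataclass
class SensingData:
    """Illustrative container mirroring items 901 to 907 of Figure 9."""
    data_type: str                                   # header information 901 (data type)
    terminal_id: str                                 # terminal ID 902
    timestamp: float                                 # timestamp 903 (Unix seconds, assumed)
    voice_info: Optional[str] = None                 # voice information 904 (e.g. encoded clip)
    motion_info: List[List[float]] = field(default_factory=list)   # motion information 905
    location: Optional[List[float]] = None           # location information 906 [lat, lon]
    nearby_terminal_ids: List[str] = field(default_factory=list)   # identifier list 907

    def to_bytes(self) -> bytes:
        """Serialize with a 4-byte big-endian length prefix (assumed framing)."""
        body = json.dumps(asdict(self)).encode("utf-8")
        return len(body).to_bytes(4, "big") + body

    @classmethod
    def from_bytes(cls, raw: bytes) -> "SensingData":
        """Parse a frame produced by to_bytes()."""
        length = int.from_bytes(raw[:4], "big")
        return cls(**json.loads(raw[4:4 + length].decode("utf-8")))


if __name__ == "__main__":
    packet = SensingData("sensing", "T-001", 1700000000.0,
                         motion_info=[[0.1, 0.0, 9.8]],
                         nearby_terminal_ids=["A1"])
    print(SensingData.from_bytes(packet.to_bytes()) == packet)
```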
[0042] (Explanation of the processing flow) Figure 8 is a flowchart illustrating the overview of the activity status determination process centered on the server 20 in this embodiment. First, the data receiving unit 202 of the server 20 receives sensing data from the monitored terminal 10 (step S801). Next, the sensing data analysis unit 203 analyzes the received sensing data (especially voice information and motion information) and performs preprocessing such as feature extraction (step S802).
[0043] Next, the activity status determination unit 204 determines the current activity status of the monitored MP based on the machine learning model stored in the machine learning model storage unit 206 and the analyzed sensing data (and optionally contextual information such as location information) (step S803). Optionally, the companion status determination unit 205 determines the companion status based on the identifier and location information of the surrounding terminal included in the sensing data (step S804). A detailed example of this process is shown in Figure 12. Subsequently, the determination result generation unit 207 integrates the determined activity status, the status of any companions, location information, etc., and generates notification information for the guardian terminal 30 (step S805). At this time, warning generation based on complex conditions, as shown in Figure 13, may also be performed. Finally, the notification information transmission unit 208 transmits the generated notification information to the guardian terminal 30 (step S806). Based on the received information, the guardian terminal 30 is notified of the status of the person being monitored (MP).
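The flow of steps S801 to S806 can be sketched as follows. This is only an outline under stated assumptions: the dictionary keys, the helper names, and `toy_model` (a rule-based stand-in used so the example runs without a trained machine learning model) are hypothetical and do not represent the model stored in the machine learning model storage unit 206.

```python
from typing import Callable, Dict, List, Optional


def extract_features(sensing: Dict) -> Dict[str, float]:
    """S802: reduce received samples to simple features (illustrative only)."""
    motion: List[float] = sensing.get("motion", [])
    voice: List[float] = sensing.get("voice_levels", [])
    return {
        "mean_motion": sum(motion) / len(motion) if motion else 0.0,
        "mean_voice": sum(voice) / len(voice) if voice else 0.0,
    }


def process_sensing_data(sensing: Dict,
                         model: Callable[[Dict[str, float]], str],
                         companion_fn: Optional[Callable[[Dict], str]] = None) -> Dict:
    """S801 to S805: build notification information from one sensing data item."""
    features = extract_features(sensing)                          # S802
    activity = model(features)                                    # S803
    companion = companion_fn(sensing) if companion_fn else None   # S804 (optional)
    notification = {"terminal_id": sensing["terminal_id"],        # S805
                    "activity": activity}
    if companion is not None:
        notification["companion"] = companion
    if "location" in sensing:
        notification["location"] = sensing["location"]
    return notification   # S806: the caller transmits this to the guardian terminal


def toy_model(features: Dict[str, float]) -> str:
    """Rule-based stand-in for the machine learning model (assumption)."""
    if features["mean_motion"] > 1.0:
        return "playing"
    if features["mean_voice"] > 0.5:
        return "conversation"
    return "still"


if __name__ == "__main__":
    data = {"terminal_id": "T-001", "motion": [1.4, 1.8], "voice_levels": [0.2]}
    print(process_sensing_data(data, toy_model))
```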
[0044] Figure 11 is a sequence diagram of the voice information transmission process based on a trigger event in this embodiment (corresponding to appendices 2 and 3). The trigger event detection unit 108 of the monitored terminal 10 detects an abnormal state based on information from the motion sensor unit 17 (step S1101). When it is determined that a trigger event has occurred (step S1102: YES), the voice information acquisition unit 102 controls the voice input unit 16 to acquire voice data for a predetermined time (for example, a few seconds before and after the event occurs) (step S1103), and the sensing data generation unit 106 processes this as a compressed audio clip (step S1104). Then, the data transmission unit 107 transmits the sensing data including this compressed audio clip to the server 20 (step S1105).
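One possible way to obtain audio from "before and after" the event, as described for step S1103, is to keep a short ring buffer of recent audio frames on the monitored terminal 10. The sketch below assumes frame-based audio, an acceleration-magnitude threshold as the abnormal-state test, and `zlib` as a stand-in for an audio codec; the class name and all parameter values are hypothetical.

```python
import zlib
from collections import deque
from typing import Deque, List, Optional


class TriggeredClipRecorder:
    """Buffers recent audio frames and emits a compressed clip around a trigger."""

    def __init__(self, pre_frames: int = 3, post_frames: int = 3,
                 accel_threshold: float = 2.5) -> None:
        if pre_frames < 0 or post_frames < 1:
            raise ValueError("pre_frames must be >= 0 and post_frames >= 1")
        self.post_frames = post_frames
        self.accel_threshold = accel_threshold          # assumed abnormal-state criterion
        self._pre: Deque[bytes] = deque(maxlen=pre_frames)
        self._clip: Optional[List[bytes]] = None        # frames of the clip being captured
        self._remaining = 0

    def push(self, audio_frame: bytes, accel_magnitude: float) -> Optional[bytes]:
        """Feed one audio frame and one motion sample.

        Returns a compressed clip once the post-event frames are collected
        (S1104); otherwise returns None.
        """
        if self._clip is not None:                      # collecting post-event audio
            self._clip.append(audio_frame)
            self._remaining -= 1
            if self._remaining > 0:
                return None
            clip = zlib.compress(b"".join(self._clip))  # stand-in for audio compression
            self._clip = None
            return clip
        if accel_magnitude >= self.accel_threshold:     # S1101 / S1102: trigger detected
            self._clip = list(self._pre) + [audio_frame]  # S1103: include pre-event audio
            self._pre.clear()
            self._remaining = self.post_frames
            return None
        self._pre.append(audio_frame)                   # no trigger: keep recent audio only
        return None
```

With this arrangement no audio leaves the terminal during normal operation; only the frames surrounding a trigger are compressed and handed to the data transmission unit 107 (step S1105).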
[0045] Figure 12 is a flowchart of the companion status determination process in the server 20 in this embodiment (corresponding to appendices 7, 8, and 9). When the data receiving unit 202 of the server 20 receives sensing data including a list of surrounding terminal identifiers and current location information from the monitored terminal 10 (step S1201), the companion status determination unit 205 first compares each identifier with the information registered in the user information management unit 209 to determine whether it is a registered terminal or an unknown terminal (step S1202). Next, it grasps the number and relative positions (distance, etc.) of each terminal (step S1203). Furthermore, optionally, it refers to the past encounter pattern database (stored in the data storage and learning unit 210) and analyzes the relationship between the frequency, location, and time of encounter with a specific unknown terminal (step S1204). Based on these analysis results, it determines the overall companion status (e.g., "at the park with friend A", "contact with two unknown terminals near the station", etc.) (step S1205).
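Steps S1202 to S1205 can be sketched as below. The input shapes (a list of identifier and distance pairs, a mapping of registered identifiers to display names, and a per-terminal encounter count) and the recurrence threshold of three encounters are assumptions made for illustration only.

```python
from typing import Dict, List, Optional, Tuple


def determine_companion_status(
        nearby: List[Tuple[str, float]],
        registered: Dict[str, str],
        encounter_counts: Optional[Dict[str, int]] = None,
        recurring_threshold: int = 3) -> Dict:
    """Classify surrounding terminals and summarize the companion status.

    nearby: (terminal identifier, estimated distance in metres) pairs.
    registered: identifier -> display name, as held by unit 209.
    encounter_counts: past encounters per unknown identifier (optional, S1204).
    """
    counts = encounter_counts or {}
    # S1202: split identifiers into registered and unknown terminals.
    known = [registered[tid] for tid, _ in nearby if tid in registered]
    unknown = [tid for tid, _ in nearby if tid not in registered]
    # S1203: number of terminals and distance to the nearest one.
    nearest = min((dist for _, dist in nearby), default=None)
    # S1204: unknown terminals that have been encountered repeatedly.
    recurring = [tid for tid in unknown if counts.get(tid, 0) >= recurring_threshold]
    # S1205: overall summary.
    if not nearby:
        summary = "alone"
    elif known and not unknown:
        summary = "with " + ", ".join(known)
    elif unknown and not known:
        summary = f"contact with {len(unknown)} unknown terminal(s)"
    else:
        summary = f"with {', '.join(known)} and {len(unknown)} unknown terminal(s)"
    return {"summary": summary, "known": known, "unknown": unknown,
            "nearest_m": nearest, "recurring_unknown": recurring}


if __name__ == "__main__":
    print(determine_companion_status([("A1", 2.0), ("X9", 5.0)],
                                     {"A1": "friend A"}, {"X9": 4}))
```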
[0046] Figure 13 is a conceptual diagram of the combined-condition warning generation process in the server 20 in this embodiment (corresponding to Appendix 10). The determination result generation unit 207 receives as input activity status information 1301 from the activity status determination unit 204, companion status information 1302 (if any) from the companion status determination unit 205, and current location information 1303 (if any) from the monitored terminal 10. This information is compared with a set of warning condition rules 1304 (e.g., "alone with an unknown terminal" AND "within a specific danger area" AND "activity status of 'still' or 'conversation' continuing for a long period of time"). If any of the warning condition rules is met (1305), warning notification information 1306 including the corresponding warning level and warning message is generated and sent to the guardian terminal 30 via the notification information transmission unit 208.
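The comparison against the warning condition rules 1304 can be sketched as a list of predicates evaluated over the combined situation. The rule below encodes the example given in the text; the dictionary keys, the 30-minute duration, and the level name "high" are assumptions.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class WarningRule:
    level: str                           # warning level carried in notification 1306
    message: str                         # warning message carried in notification 1306
    condition: Callable[[Dict], bool]    # combined condition over inputs 1301 to 1303


# Hypothetical encoding of the example rule; the 30-minute limit is an assumption.
RULES: List[WarningRule] = [
    WarningRule(
        level="high",
        message="Prolonged stay with an unknown terminal in a danger area",
        condition=lambda s: (s["companion"] == "unknown_only"
                             and s["in_danger_area"]
                             and s["activity"] in {"still", "conversation"}
                             and s["duration_min"] >= 30),
    ),
]


def evaluate_warnings(situation: Dict, rules: List[WarningRule] = RULES) -> List[Dict]:
    """Return one warning notification per rule whose condition is met (1305)."""
    return [{"level": rule.level, "message": rule.message}
            for rule in rules if rule.condition(situation)]


if __name__ == "__main__":
    situation = {"activity": "conversation", "companion": "unknown_only",
                 "in_danger_area": True, "duration_min": 45}
    print(evaluate_warnings(situation))
```

Keeping each rule as data plus a predicate allows rules to be added or tuned without changing the evaluation logic.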
[0047] Figure 14 is a conceptual diagram of the learning and updating process of the machine learning model in this embodiment (corresponding to Appendix 11). The data storage and learning unit 210 stores sensing data collected from multiple monitored terminals 10 and the correct labels for the activity status of that data (for example, assigned through feedback or annotation from guardians) (1401). Periodically, or when a predetermined amount of data has been accumulated, the machine learning model 1402 in the machine learning model storage unit 206 is retrained and fine-tuned using this data, and the model is updated (1403). The updated model is then used for determining the activity status.
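The update timing described above ("periodically, or when a predetermined amount of data has been accumulated") can be sketched as a small buffer of labeled samples. The class name, the default of 1000 samples, and the one-week interval are hypothetical; the retraining itself is outside the scope of this sketch.

```python
from typing import Dict, List, Tuple

Sample = Tuple[Dict[str, float], str]   # (features, correct activity label)


class LabeledDataBuffer:
    """Accumulates labeled samples (1401) and decides when to update the model."""

    def __init__(self, min_samples: int = 1000,
                 max_interval_s: float = 7 * 24 * 3600.0,
                 started_at: float = 0.0) -> None:
        self.min_samples = min_samples          # "predetermined amount of data" (assumed)
        self.max_interval_s = max_interval_s    # periodic update interval (assumed)
        self._last_trained = started_at
        self._samples: List[Sample] = []

    def add(self, features: Dict[str, float], label: str) -> None:
        """Store one sample with its correct label, e.g. from guardian feedback."""
        self._samples.append((features, label))

    def should_retrain(self, now: float) -> bool:
        """True when enough data has accumulated or the interval has elapsed."""
        if not self._samples:
            return False
        return (len(self._samples) >= self.min_samples
                or now - self._last_trained >= self.max_interval_s)

    def drain(self, now: float) -> List[Sample]:
        """Hand the accumulated batch to the training job and reset the buffer."""
        batch, self._samples = self._samples, []
        self._last_trained = now
        return batch
```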
[0048] (Variations in the display screen of the guardian terminal) In addition to the basic display example shown in Figure 10, the display screen of the guardian terminal 30 can adopt the following variations to further improve the quality of situational awareness of the guardian PA. These display controls are mainly performed by the information analysis and display control unit 303 of the guardian terminal 30.
[0049] Figure 15 shows another example of the display screen of the guardian terminal 30, illustrating an example of visualizing and displaying the risk level regarding the current situation of the person being monitored (MP). At the top of the screen, the current risk level, calculated by comprehensively analyzing the activity status, companion status, location information, time, etc., received from the server 20, is displayed, for example, as a color-coded indicator 1501 (e.g., green for safe, yellow for caution, red for warning), a gauge display, or an intuitive icon (e.g., smile, caution mark, warning mark). This risk level is calculated to be high, for example, if the activity status is "driving," and the person is alone with an unknown terminal, and it is at night in a deserted location. The guardian PA can grasp the urgency of the situation at a glance by looking at this risk level display.
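The mapping from the combined situation to the color-coded indicator 1501 can be sketched as a simple additive score. The weights, the night-time hours, the category names, and the color boundaries below are all assumptions chosen for illustration; the specification does not state how the risk level is calculated.

```python
def risk_score(activity: str, companion: str, hour: int,
               deserted_location: bool) -> int:
    """Combine activity, companion, time and place into an integer score."""
    score = 0
    if companion == "unknown_only":
        score += 2                      # alone with an unknown terminal
    if hour >= 21 or hour < 5:
        score += 1                      # night time (assumed hours)
    if deserted_location:
        score += 2                      # deserted location
    if activity in ("still", "conversation") and companion == "unknown_only":
        score += 1                      # lingering with an unknown terminal
    return score


def risk_color(score: int) -> str:
    """Map the score to the indicator color: green, yellow, or red."""
    if score >= 4:
        return "red"
    if score >= 2:
        return "yellow"
    return "green"


if __name__ == "__main__":
    print(risk_color(risk_score("conversation", "unknown_only", 22, True)))
```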
[0050] Figure 16 shows another example of the display screen of the guardian terminal 30, specifically a privacy-conscious display settings screen. Through this settings screen 1600, the guardian PA can set the level of detail of the information collected and displayed from the monitored MP in stages. For example, in "normal mode" 1601, only the activity category and whether or not the user is inside or outside the safe area are displayed; in "caution mode" 1602 (for example, when the risk level is yellow or higher, or when manually switched by the guardian), the current location and the type of companion (friend / unknown) are displayed; and in "emergency mode" 1603 (for example, when the risk level is red or when an SOS is received), even more detailed information (such as a playback button for audio clips and movement trajectory) is displayed. In addition, depending on the age and consent level of the monitored MP, there are also items 1604 that set the information items to be displayed by default and the range of information that the guardian PA can access. This allows for balancing the guardian's peace of mind with the monitored person's privacy.
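The staged disclosure of Figure 16 can be sketched as a mode selection followed by field filtering. The field names and the rule that "emergency mode" takes precedence over "caution mode" are assumptions made for this sketch.

```python
from typing import Dict, List

# Information items shown in each mode (field names are hypothetical).
FIELDS_BY_MODE: Dict[str, List[str]] = {
    "normal": ["activity_category", "in_safe_area"],                      # 1601
    "caution": ["activity_category", "in_safe_area",
                "location", "companion_type"],                            # 1602
    "emergency": ["activity_category", "in_safe_area", "location",
                  "companion_type", "audio_clip", "trajectory"],          # 1603
}


def select_mode(risk_color: str, sos_received: bool = False,
                manual_caution: bool = False) -> str:
    """Choose the display mode from the risk level, an SOS, or a manual switch."""
    if sos_received or risk_color == "red":
        return "emergency"
    if manual_caution or risk_color == "yellow":
        return "caution"
    return "normal"


def visible_info(info: Dict, mode: str) -> Dict:
    """Keep only the items that the selected mode allows to be displayed."""
    return {key: info[key] for key in FIELDS_BY_MODE[mode] if key in info}


if __name__ == "__main__":
    info = {"activity_category": "playing", "in_safe_area": True,
            "location": [35.0, 139.0], "companion_type": "friend"}
    print(visible_info(info, select_mode("green")))
```

Filtering at display time in this way keeps detailed items such as the location hidden in "normal mode" even when they have been received.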
[0051] Figure 17 shows another example of the display screen of the guardian terminal 30, illustrating an example of displaying predictive and warning information. When the server 20 comprehensively analyzes past behavioral patterns, current movement speed and direction, calendar information (e.g., school schedule), weather information, etc., and predicts potential risks or warning signs of behavior that may require attention in the future, that information is sent to and displayed on the guardian terminal 30. For example, the screen displays on a map a marker 1701 indicating the current location and direction of movement of the person being monitored (MP), a predicted area 1702 that is expected to be reached in a few minutes based on that movement trend, and known hazardous areas 1703 (if any) along that path. In addition, a text message 1704 such as "If you continue on this path, you may enter the designated hazardous area X in about 5 minutes" is displayed. This allows the guardian PA to obtain information for taking preventive measures before a problem occurs.
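A message such as 1704 could be derived, for instance, by extrapolating the current speed and direction and checking the extrapolated position against a circular hazardous area. The sketch below uses a flat-earth (equirectangular) approximation that is adequate for short distances; the function names, the circular area model, and the one-minute step are assumptions, and the other inputs mentioned in the text (past patterns, calendar, weather) are not modeled.

```python
import math
from typing import Optional, Tuple

EARTH_RADIUS_M = 6_371_000.0


def predict_position(lat: float, lon: float, speed_mps: float,
                     heading_deg: float, seconds: float) -> Tuple[float, float]:
    """Extrapolate a position along a constant heading (0 = north, 90 = east)."""
    distance = speed_mps * seconds
    theta = math.radians(heading_deg)
    dlat = distance * math.cos(theta) / EARTH_RADIUS_M
    dlon = distance * math.sin(theta) / (EARTH_RADIUS_M * math.cos(math.radians(lat)))
    return lat + math.degrees(dlat), lon + math.degrees(dlon)


def distance_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Approximate ground distance in metres between two nearby points."""
    x = math.radians(lon2 - lon1) * math.cos(math.radians((lat1 + lat2) / 2))
    y = math.radians(lat2 - lat1)
    return EARTH_RADIUS_M * math.hypot(x, y)


def minutes_until_hazard(lat: float, lon: float, speed_mps: float,
                         heading_deg: float, hazard_center: Tuple[float, float],
                         hazard_radius_m: float,
                         horizon_min: int = 10) -> Optional[int]:
    """Return the first whole minute at which the predicted position lies
    inside the hazardous area, or None if it is not reached within the horizon."""
    for minute in range(horizon_min + 1):
        p_lat, p_lon = predict_position(lat, lon, speed_mps, heading_deg, minute * 60)
        if distance_m(p_lat, p_lon, *hazard_center) <= hazard_radius_m:
            return minute
    return None
```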
[0052] Figure 18 shows another example of the display screen of the guardian terminal 30, illustrating the communication integration UI elements. Depending on the status of the monitored person MP (activity status, location, etc.), UI elements are displayed to allow the guardian PA to easily contact the monitored terminal 10. For example, a group of buttons 1802 that allows sending pre-registered standard messages (e.g., "Are you okay?", "Where are you?", "It's almost time to go home") with a single tap, a button 1803 to start a voice call, and a button 1804 to start a video call (if the monitored terminal supports it) are displayed near the monitored person MP's status display area 1801, depending on the situation. This allows the guardian PA to quickly initiate communication to check the situation and give instructions.
[0053] Figure 19 shows another example of the display screen of the guardian terminal 30, illustrating UI elements for collecting feedback on the results of the activity status and companion status determination notified by the server 20. For example, the guardian PA can easily provide feedback on whether the displayed activity status (e.g., "Playing in the park" 1901) was correct ("Yes" button 1902) or incorrect ("No" button 1903). If "No" is selected, an interface may be displayed allowing the guardian to select the correct activity status from a list of options (1904) or to enter it in free text. The collected feedback information is sent to the server 20 and used as training data for improving the accuracy and personalization of the machine learning model 1402 (see Figure 14). This continuously improves the system's determination accuracy.
[0054] (Definition of terms) In this specification, "sensing data" refers to all sensor information acquired by the monitored terminal, including at least voice information and motion information, and optionally including location information and identifiers of surrounding terminals. "Activity status" refers to actions and states performed by the monitored person classified into predefined categories (e.g., "playing," "learning," "conversation," etc.). "Companion status" refers to information indicating the relationship and number of owners of other terminals present around the monitored person.
[0055] (Variations of the invention, alternative configurations, modifications, application examples, etc.) In the above embodiment, a configuration was described in which the server 20 determines the activity status and the status of companions. However, the present invention may also include a distributed processing configuration in which some simple determinations (for example, identification of basic actions or simple proximity detection of registered terminals) are performed on the monitored terminal 10 side, the results are sent to the server 20, and the server 20 performs more advanced analysis and complex decisions.
[0056] Furthermore, the machine learning model can be configured not only to be centrally trained and updated on the server 20, but also to collect information from multiple monitored terminals 10 and guardian terminals 30 in a privacy-conscious manner, and to be trained and updated in a distributed manner using methods such as federated learning.
[0057] Notifications to the guardian terminal 30 may be configured such that a risk score is calculated based on activity status, companion status, location information, etc., and the urgency and content of the notification are changed according to that score.
[0058] This system can be applied to a variety of fields, not only for monitoring children and the elderly, but also for monitoring pets, ensuring employee safety, or monitoring the condition of workers in specific environments.
[0059] In this specification, "server 20" is not limited to a single physical computer device, but may also include a group of multiple physical or virtual computer devices that cooperate with each other via a network (NW) and collectively provide the server functions described herein, multiple functional modules in a cloud computing environment, or a distributed system based on a microservices architecture. For example, all or part of each functional block of server 20 shown in Figure 5 (data receiving unit 202, activity status determination unit 204, companion status determination unit 205, determination result generation unit 207, notification information transmission unit 208, etc.) may be implemented as independent microservices, and the functions of server 20 as a whole may be realized by communicating and cooperating with each other via an API (Application Programming Interface), etc. In this case, each microservice may operate on the same or different computers.
[0060] This invention improves the computer functionality of server 20. Specifically, it provides a new information processing algorithm (including the application of machine learning models) that comprehensively analyzes various types of sensing data (voice, motion, location, surrounding terminal ID, etc.) transmitted from the monitored terminal 10, and accurately determines the complex activity state and social situation of subjects that are particularly difficult to predict, such as children. This makes it possible to generate high-quality monitoring information that could not be obtained before and provide it to the guardian terminal 30. This information processing utilizes the processing power of the computer specifically to solve a particular technical problem (real-time estimation of the detailed situation of the monitored person), and represents a technical contribution that goes beyond mere data collection or general information provision.
[0061] Furthermore, the guardian terminal 30 presents the detailed activity status, companion status, location information, etc., provided by the server 20 to the guardian PA via an intuitive and easy-to-understand user interface (UI) as shown in Figures 10 and 15 to 19. For example, activity history can be displayed as icons on a map (reference numeral 311 in Figure 10), risk levels can be displayed using different colors (reference numeral 1501 in Figure 15), the level of detail of the displayed information can be customized (Figure 16), future risk indicators can be displayed (Figure 17), communication methods can be provided according to the situation (Figure 18), and feedback on the determination results can be easily provided (Figure 19). As a result, guardians can quickly and accurately grasp complex information and take appropriate action. This provides a concrete improvement to the information presentation method and operability in the UI of the guardian terminal 30.
[0062] It should be noted that the present invention is not limited to the embodiments described above, and can be modified as appropriate without departing from the spirit of the invention. Furthermore, the above-described embodiments and modifications can be implemented by appropriately combining their respective features.
[0063] [General problem] To provide parents and guardians with higher-quality monitoring information.
[0064] [Problem corresponding to Appendix 1] Same as the general problem above. [Appendix 1] A monitoring system comprising: a monitored terminal equipped with a transmitting unit that acquires sensing data including audio information of the surroundings of the monitored person and motion information of the monitored person and transmits it to a server; the server equipped with a receiving unit that receives the sensing data transmitted from the monitored terminal, a determination unit that determines the current activity state of the monitored person from among a plurality of predefined activity categories using a machine learning model based on the audio information and motion information from the received sensing data, and a transmitting unit that transmits information indicating the determined activity state to a guardian terminal; and a guardian terminal equipped with a notification unit that receives the information transmitted from the server and notifies the guardian of information regarding the activity state of the monitored person. (Effects of Appendix 1) Based on the audio information and motion information transmitted from the monitored terminal, the server uses a machine learning model to determine the activity status of the person being monitored. This allows for more advanced and detailed identification of activity status without being limited by the terminal's resources, enabling guardians to grasp specific information about the person being monitored with high accuracy and improving their sense of security.
[0065] In particular, when a monitored child is detected to be in a "play state", for example, the present invention gives parents a means of judging the nature of that state by combining it with separately provided location information and information about accompanying persons (a subordinate configuration). Such qualitative judgments are difficult to make based solely on information such as "high activity level." Similarly, while parents can feel at ease if a "learning state" is detected at an appropriate time and place, if information is obtained that a "conversation state" has continued for a long time late at night with an unknown terminal (which can also be detected by a subordinate configuration), parents can consider specific actions at an earlier stage. Thus, the present invention provides a remarkable effect not found in conventional technology by providing specific activity details of the person being monitored with high resolution, thereby reducing the subjective anxiety of parents and enabling appropriate monitoring and intervention based on objective information.
[0066] (Specific examples of remarkable effects) For example, a situation that conventional activity trackers could only classify as "high activity" can be identified as "active play with friends in a park" according to the present invention, giving parents peace of mind and allowing them to positively view the opportunity for their child's social development. Conversely, even if the same "high activity" is detected, if it is identified as "running alone in a dangerous place late at night," it can be recognized as a situation requiring immediate intervention. Furthermore, the "prolonged conversation with an unknown adult," which was often overlooked in conventional systems, can become a clear warning target in this invention (when combined with accompanying person information, which is a subordinate component). This makes apparent potential risks that could not be evaluated with information such as "still" or "conversing," and thus makes a significant contribution to ensuring the safety of children. Furthermore, by recording and analyzing information such as the duration and frequency of "learning states," the diversity of "play states," and the partners in "conversation states" (whether they are registered friends or not) over the long term, parents can objectively understand their child's lifestyle, concentration levels, changes in interests, and evolving friendships. This can lead to more personalized parenting support, appropriate communication tailored to the child's developmental stage, or even prompts for consultation with professional organizations. This represents high-quality, value-added information that could never be obtained with conventional location data or simple activity level data alone.
[0067] [Problem addressed by Appendix 2] To reduce the amount of communication data and protect privacy when transmitting audio information. [Appendix 2] The monitoring system according to Appendix 1, wherein the transmitting unit of the monitored terminal transmits the audio information to the server as a compressed audio clip upon the occurrence of a predetermined trigger event. (Effects of Appendix 2) Because audio information is transmitted only when necessary, the amount of communication data can be reduced and the impact on privacy can be minimized.
[0068] [Problem addressed by Appendix 3] To specify the trigger for transmitting audio information and achieve more efficient operation. [Appendix 3] The monitoring system according to Appendix 2, wherein the trigger event includes at least one of: detection of an abnormal state based on the motion information of the monitored terminal; detection of a simple acoustic event at the monitored terminal; and entry of the monitored person into, or exit from, a predetermined specific area. (Effects of Appendix 3) Audio information can be reliably acquired when an anomaly occurs or under specific circumstances, while unnecessary transmissions during normal operation are suppressed.
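The trigger logic of Appendices 2 and 3 can be sketched as follows. This is only an illustrative sketch: the `TerminalSnapshot` fields, the threshold values, and the trigger names are assumptions that do not appear in the specification.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TerminalSnapshot:
    """Hypothetical summary of one sensing window on the monitored terminal."""
    accel_peak_g: float    # peak acceleration magnitude in the window, in g
    sound_level_db: float  # short-term sound level measured on the terminal
    was_in_area: bool      # previous position fix was inside the preset area
    is_in_area: bool       # current position fix is inside the preset area


# Assumed thresholds; the specification does not give concrete values.
ABNORMAL_ACCEL_G = 3.0
LOUD_SOUND_DB = 80.0


def detect_triggers(s: TerminalSnapshot) -> List[str]:
    """Return the names of all trigger events present in the snapshot."""
    triggers = []
    if s.accel_peak_g >= ABNORMAL_ACCEL_G:
        triggers.append("abnormal_motion")   # abnormal state from motion information
    if s.sound_level_db >= LOUD_SOUND_DB:
        triggers.append("acoustic_event")    # simple acoustic event
    if s.was_in_area != s.is_in_area:
        triggers.append("area_entry" if s.is_in_area else "area_exit")
    return triggers


def should_send_audio_clip(s: TerminalSnapshot) -> bool:
    """A compressed audio clip is sent only when at least one trigger fires."""
    return bool(detect_triggers(s))
```

Under these assumptions, a quiet window inside the preset area produces no triggers, so no audio leaves the terminal.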
[0069] [Problem addressed by Appendix 4] To balance the flexibility of server-side analysis against communication efficiency when transmitting motion information. [Appendix 4] The monitoring system according to any one of Appendices 1 to 3, wherein the transmitting unit of the monitored terminal transmits the motion information to the server as raw time-series data acquired from an inertial measurement unit and/or as basic features, including a step count or activity intensity, calculated from the raw time-series data. (Effects of Appendix 4) This enables detailed motion analysis on the server side while allowing efficient transmission using basic features as needed.
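As a rough illustration of the "basic features" in Appendix 4, the sketch below derives a step count and an activity intensity from raw accelerometer samples. The formulas and the 1.2 g threshold are assumptions chosen for simplicity, not methods stated in the specification.

```python
import math
from typing import List, Tuple

Sample = Tuple[float, float, float]  # (x, y, z) acceleration in g


def magnitude(sample: Sample) -> float:
    x, y, z = sample
    return math.sqrt(x * x + y * y + z * z)


def activity_intensity(samples: List[Sample]) -> float:
    """Mean absolute deviation of the acceleration magnitude from 1 g (rest)."""
    if not samples:
        return 0.0
    return sum(abs(magnitude(s) - 1.0) for s in samples) / len(samples)


def count_steps(samples: List[Sample], threshold_g: float = 1.2) -> int:
    """Count upward crossings of an assumed magnitude threshold."""
    steps = 0
    above = False
    for s in samples:
        now_above = magnitude(s) > threshold_g
        if now_above and not above:
            steps += 1
        above = now_above
    return steps
```

Sending only these two numbers instead of the raw series trades server-side flexibility for a much smaller payload, which is the balance Appendix 4 describes.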
[0070] [Problem addressed by Appendix 5] To gain a more concrete understanding of the activity state of the monitored person. [Appendix 5] The monitoring system according to any one of Appendices 1 to 4, wherein the activity state determined by the determination unit of the server includes at least one selected from the group including a "playing state," a "learning state," and a "conversation state." (Effects of Appendix 5) By identifying the nuanced activity states characteristic of children, guardians' understanding of the situation can be deepened and higher-quality monitoring achieved.
[0071] [Problem addressed by Appendix 6] To improve the accuracy of activity state determination. [Appendix 6] The monitoring system according to any one of Appendices 1 to 5, wherein the monitored terminal further comprises a transmitting unit that acquires current location information of the monitored person and transmits it to the server, and the determination unit of the server further refers to the current location information when determining the activity state. (Effects of Appendix 6) By using location information as context for determining the activity state, the accuracy of the determination can be improved.
[0072] [Problem addressed by Appendix 7] To understand the social circumstances of the monitored person (who they are with). [Appendix 7] The monitoring system according to any one of Appendices 1 to 6, wherein the monitored terminal further comprises a transmitting unit that acquires an identifier of a nearby peripheral terminal and transmits it to the server, the determination unit of the server further determines the companion status of the monitored person based on the identifier and (if any) the current location information of the monitored person, and the transmitting unit of the server transmits to the guardian terminal information indicating the determined companion status in addition to the information indicating the determined activity state. (Effects of Appendix 7) By grasping the companion status in addition to the activity state, the situation of the monitored person can be understood from a more multifaceted and detailed perspective, which increases the guardian's sense of security.
[0073] [Problem addressed by Appendix 8] To gain a more concrete understanding of the companion status. [Appendix 8] The monitoring system according to Appendix 7, wherein the companion status determined by the determination unit of the server includes at least one of: whether or not the peripheral terminal is a terminal registered in advance; the number of registered terminals; and the number of unknown terminals. (Effects of Appendix 8) By knowing whether the companions are known or unknown, and how many there are, the safety of the situation can be assessed in more detail.
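The companion status of Appendix 8 amounts to comparing the received peripheral terminal identifiers against a set registered in advance. A minimal sketch follows; the function name and the returned dictionary keys are hypothetical.

```python
from typing import Dict, Iterable, Set


def companion_status(nearby_ids: Iterable[str], registered_ids: Set[str]) -> Dict[str, object]:
    """Classify nearby peripheral terminals as registered or unknown."""
    unique_ids = set(nearby_ids)  # the same terminal may be scanned more than once
    registered = unique_ids & registered_ids
    unknown = unique_ids - registered_ids
    return {
        "is_registered": {tid: tid in registered_ids for tid in sorted(unique_ids)},
        "registered_count": len(registered),
        "unknown_count": len(unknown),
    }
```

For example, scanning identifiers `["a", "b", "b", "x"]` with `{"a", "b"}` registered yields two registered terminals and one unknown terminal.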
[0074] [Problem addressed by Appendix 9] To analyze potential risks and relationships with companions in greater depth. [Appendix 9] The monitoring system according to Appendix 7 or 8, wherein the determination unit of the server further analyzes past encounter patterns between the monitored person and a specific peripheral terminal when determining the companion status. (Effects of Appendix 9) Concerning behavioral patterns, such as continued contact with a specific unknown terminal, can be identified, and potential risks can be flagged early.
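One simple way to picture the encounter-pattern analysis of Appendix 9 is to count on how many distinct days each unknown terminal has been seen. The sketch below assumes an encounter log of `(day, terminal_id)` pairs and a hypothetical `min_days` threshold; the specification does not define the analysis in this much detail.

```python
from typing import Dict, Iterable, List, Set, Tuple


def recurring_unknown_terminals(
    encounters: Iterable[Tuple[str, str]],  # (day such as "2025-05-01", terminal id)
    registered_ids: Set[str],
    min_days: int = 3,                      # assumed threshold
) -> List[str]:
    """Return unknown terminal ids encountered on at least `min_days` distinct days."""
    days_seen: Dict[str, Set[str]] = {}
    for day, terminal_id in encounters:
        if terminal_id in registered_ids:
            continue  # registered terminals are not a concern here
        days_seen.setdefault(terminal_id, set()).add(day)
    return sorted(tid for tid, days in days_seen.items() if len(days) >= min_days)
```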
[0075] [Problem addressed by Appendix 10] To make notifications to guardians more accurate and useful. [Appendix 10] The monitoring system according to any one of Appendices 1 to 9, wherein the notification unit of the guardian terminal issues a warning when the combination of the information indicating the activity state transmitted from the server, the information indicating the companion status, and the current location information matches a specific warning condition set in advance. (Effects of Appendix 10) By issuing warnings based on a combination of multiple pieces of information and compound conditions, false alarms can be reduced and guardians can be accurately notified of situations that truly require attention.
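The condition matching of Appendix 10 can be sketched as rules evaluated against the combined situation. The `Situation` and `WarningRule` types, their fields, and the example place labels are assumptions for illustration only.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Situation:
    activity: str       # activity state received from the server
    unknown_count: int  # number of unknown peripheral terminals nearby
    place: str          # label derived from the current location information


@dataclass(frozen=True)
class WarningRule:
    name: str
    activity: Optional[str] = None  # None means "any activity state"
    min_unknown: int = 0
    place: Optional[str] = None     # None means "any place"

    def matches(self, s: Situation) -> bool:
        """All specified conditions must hold at the same time."""
        return (
            (self.activity is None or s.activity == self.activity)
            and s.unknown_count >= self.min_unknown
            and (self.place is None or s.place == self.place)
        )


def matching_warnings(s: Situation, rules: List[WarningRule]) -> List[str]:
    """Names of every preset warning rule the situation satisfies."""
    return [rule.name for rule in rules if rule.matches(s)]
```

Requiring every field of a rule to match is what keeps a single signal, such as a conversation at home, from raising an alarm on its own.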
[0076] [Problem addressed by Appendix 11] To continuously improve the determination accuracy of the machine learning model. [Appendix 11] The monitoring system according to any one of Appendices 1 to 10, wherein the server further comprises an update unit that updates the machine learning model using information collected from a plurality of monitored terminals or past data of the monitored person. (Effects of Appendix 11) By optimizing the machine learning model in response to new data and changes in usage, the determination accuracy of the system can be continuously improved.
[0077] [Problem addressed by Appendix 12] To strengthen privacy protection for personal information (especially audio information and motion information) transmitted to the server. [Appendix 12] The monitoring system according to any one of Appendices 1 to 11, wherein the server further comprises a processing unit that anonymizes or deletes the received audio information and motion information within a predetermined period after the activity state is determined. (Effects of Appendix 12) This reduces privacy concerns about the handling of sensitive information on the server side and enhances the reliability and social acceptance of the system.
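The deletion processing of Appendix 12 could look roughly like the sketch below, which drops the raw audio and motion data from stored records once an assumed retention window has elapsed while keeping the determined activity state. The record layout and the 24-hour window are assumptions; a real deployment would need to run such a job often enough that deletion completes within the predetermined period.

```python
from datetime import datetime, timedelta
from typing import Any, Dict, List

RETENTION = timedelta(hours=24)  # assumed value; not given in the specification


def purge_raw_data(
    records: List[Dict[str, Any]],
    now: datetime,
    retention: timedelta = RETENTION,
) -> List[Dict[str, Any]]:
    """Return copies of the records with expired raw sensing data removed."""
    purged = []
    for record in records:
        if now - record["determined_at"] >= retention:
            # Keep only the determination result; discard raw audio and motion data.
            record = {**record, "audio": None, "motion": None}
        purged.append(record)
    return purged
```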
[0078] [Problem addressed by Appendix 13] Same as the general problem above. (Invention of a standalone server) [Appendix 13] A server comprising: a receiving unit that receives sensing data including audio information about the surroundings of a monitored person and motion information of the monitored person; a determination unit that determines, using a machine learning model and based on the audio information and the motion information in the received sensing data, the current activity state of the monitored person from among a plurality of predefined activity categories; and a transmitting unit that transmits information indicating the determined activity state to a guardian terminal. (Effects of Appendix 13) By using a machine learning model to determine the activity state of the monitored person based on the audio information and motion information transmitted from the monitored terminal, the activity state can be identified in a more advanced and detailed manner.
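The receive, determine, and transmit flow of the server in Appendix 13 can be outlined as below. The feature names (`speech_ratio`, `activity_intensity`), the category labels, and `stand_in_model` are all hypothetical; in particular, `stand_in_model` is a hand-written scoring function used only so the sketch runs, not a trained machine learning model.

```python
from typing import Callable, Dict, List

ACTIVITY_CATEGORIES = ("playing", "learning", "conversation", "other")

# A model maps a feature vector to one score per activity category.
Model = Callable[[List[float]], Dict[str, float]]


def determine_activity(features: List[float], model: Model) -> str:
    """Pick the predefined category with the highest model score."""
    scores = model(features)
    return max(ACTIVITY_CATEGORIES, key=lambda c: scores.get(c, 0.0))


def handle_sensing_data(data: Dict, model: Model, send_to_guardian: Callable[[Dict], None]) -> str:
    """Receive sensing data, determine the activity state, and transmit the result."""
    features = [data["speech_ratio"], data["activity_intensity"]]
    activity = determine_activity(features, model)
    send_to_guardian({"terminal_id": data["terminal_id"], "activity_state": activity})
    return activity


def stand_in_model(features: List[float]) -> Dict[str, float]:
    """Placeholder scoring function standing in for a trained model."""
    speech_ratio, intensity = features
    return {
        "playing": intensity,
        "conversation": speech_ratio * (1.0 - intensity),
        "learning": (1.0 - speech_ratio) * (1.0 - intensity),
        "other": 0.1,
    }
```

Passing the model and the sender as parameters keeps the flow independent of any particular model or notification channel.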
[0079] [Problem addressed by Appendix 14] Same as the general problem above. (Invention of a monitored terminal) [Appendix 14] A monitored terminal comprising: an audio acquisition unit that acquires audio information about the surroundings of a monitored person; a motion acquisition unit that acquires motion information of the monitored person; and a transmitting unit that transmits sensing data including the acquired audio information and motion information to a server that determines the activity state of the monitored person. (Effects of Appendix 14) By transmitting the audio information and motion information of the monitored person to the server, the server can perform advanced activity state determination.
[0080] [Problem addressed by Appendix 15] Same as the general problem above. (Invention of a guardian terminal) [Appendix 15] A guardian terminal comprising: a receiving unit that receives information transmitted from a server, including the result of determining the activity state of a monitored person; and a notification unit that notifies information regarding the activity state of the monitored person based on the received information. (Effects of Appendix 15) Guardians can receive and understand information about the activity state of the monitored person, as determined by the server.
[0081] [Problem addressed by Appendix 16] Same as the general problem above. (Invention of an information processing method) [Appendix 16] An information processing method executed by a processor of a server, comprising: receiving sensing data transmitted from a monitored terminal, the sensing data including audio information about the surroundings of a monitored person and motion information of the monitored person; determining, using a machine learning model and based on the audio information and the motion information in the received sensing data, the current activity state of the monitored person from among a plurality of predefined activity categories; and transmitting information indicating the determined activity state to a guardian terminal. (Effects of Appendix 16) By using a machine learning model to determine the activity state of the monitored person based on the audio information and motion information transmitted from the monitored terminal, the activity state can be identified in a more advanced and detailed manner.
[0082] [Problem addressed by Appendix 17] Same as the general problem above. (Invention of a program) [Appendix 17] A program that causes a processor of a guardian terminal to execute the steps of: receiving information transmitted from a server, including the result of determining the activity state of a monitored person; and notifying information regarding the activity state of the monitored person based on the received information. (Effects of Appendix 17) The guardian terminal can receive and present information about the activity state of the monitored person, as determined by the server.
Explanation of Symbols
[0083]
1 … Monitoring system
10 … Monitored terminal
11 … CPU
12 … ROM
13 … RAM
14 … Storage unit
15 … Communication unit
16 … Voice input unit
17 … Motion sensor unit
18 … GPS receiver
19 … Bus
101 … Sensor control unit
102 … Audio information acquisition unit
103 … Motion information acquisition unit
104 … Position information acquisition unit
105 … Peripheral terminal information acquisition unit
106 … Sensing data generation unit
107 … Data transmission unit
108 … Trigger event detection unit
20 … Server
21 … CPU
22 … ROM
23 … RAM
24 … Storage unit
25 … Communication unit
26 … Input unit
27 … Output unit
28 … Bus
201 … Communication control unit
202 … Data receiving unit
203 … Sensing data analysis unit
204 … Activity state determination unit
205 … Companion status determination unit
206 … Machine learning model storage unit
207 … Determination result generation unit
208 … Notification information transmission unit
209 … User information management unit
210 … Data storage and learning unit
211 … Privacy processing unit
212 … Log recording unit
30 … Guardian terminal
31 … CPU
32 … ROM
33 … RAM
34 … Storage unit
35 … Communication unit
36 … Input unit
37 … Output unit
38 … Bus
301 … Communication control unit
302 … Information receiving unit
303 … Information analysis and display control unit
304 … User interface unit
305 … Notification unit
311 … Activity history display area
312 … Notification message display area
900 … Sensing data
901 … Header information
902 … Terminal ID
903 … Timestamp
904 … Audio information
905 … Motion information
906 … Location information
907 … List of peripheral terminal identifiers
MP … Monitored person
PA … Guardian
NW … Network
S801 to S806 … Steps
S1101 to S1105 … Steps
S1201 to S1205 … Steps
1301 … Activity state information
1302 … Companion status information
1303 … Current location information
1304 … Warning condition rules
1305 … Condition matching determination
1306 … Warning notification information
1401 … Accumulated data
1402 … Machine learning model
1403 … Model training and update process
1501 … Risk level indicator
1600 … Privacy settings screen
1601 … Normal mode setting
1602 … Caution mode setting
1603 … Emergency mode setting
1604 … Information item settings
1701 … Current location marker
1702 … Predicted arrival area
1703 … Hazardous area marking
1704 … Predicted message
1801 … Status display area
1802 … Group of buttons for sending preset messages
1803 … Voice call start button
1804 … Video call start button
1901 … Activity state display
1902 … Feedback "Yes" button
1903 … Feedback "No" button
1904 … Correct activity state selection
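For orientation, the sensing data 900 and its parts 901 to 907 listed above could be represented as a record like the one below. The field types, the ISO 8601 timestamp, the base64 audio encoding, and the JSON serialization are all assumptions; the symbol list only names the elements.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Dict, List, Optional


@dataclass
class SensingData:  # 900
    header: Dict[str, str]           # 901: contents assumed (e.g. a format version)
    terminal_id: str                 # 902
    timestamp: str                   # 903: ISO 8601 string assumed
    audio_b64: Optional[str]         # 904: compressed clip, base64-encoded (assumed)
    motion: List[List[float]]        # 905: raw samples or basic features
    location: Optional[List[float]]  # 906: [latitude, longitude] assumed
    peripheral_ids: List[str] = field(default_factory=list)  # 907

    def to_json(self) -> str:
        return json.dumps(asdict(self))

    @classmethod
    def from_json(cls, text: str) -> "SensingData":
        return cls(**json.loads(text))
```

Making the audio and location fields optional reflects that, in the dependent configurations, audio is sent only on a trigger event and location is an additional input.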
Claims
1. A monitoring system comprising a monitored terminal, a server, and a guardian terminal, wherein: the monitored terminal comprises a transmitting unit that acquires sensing data including audio information about the surroundings of a monitored person and motion information of the monitored person, and transmits the sensing data to the server; the server comprises a receiving unit that receives the sensing data transmitted from the monitored terminal, a determination unit that determines, using a machine learning model and based on the audio information and the motion information in the received sensing data, the current activity state of the monitored person from among a plurality of predefined activity categories, and a transmitting unit that transmits information indicating the determined activity state to the guardian terminal; and the guardian terminal comprises a notification unit that receives the information transmitted from the server and notifies a guardian of information regarding the activity state of the monitored person.
2. The monitoring system according to claim 1, wherein the transmitting unit of the monitored terminal transmits the audio information to the server as a compressed audio clip upon the occurrence of a predetermined trigger event.
3. The monitoring system according to claim 2, wherein the trigger event includes at least one of: detection of an abnormal state based on the motion information of the monitored terminal; detection of a simple acoustic event at the monitored terminal; and entry of the monitored person into, or exit from, a preset specific area.
4. The monitoring system according to claim 1, wherein the transmitting unit of the monitored terminal transmits the motion information to the server as raw time-series data acquired from an inertial measurement unit and/or as basic features, including a step count or activity intensity, calculated from the raw time-series data.
5. The monitoring system according to claim 1, wherein the activity state determined by the determination unit of the server includes at least one selected from the group including a "playing state," a "learning state," and a "conversation state."
6. The monitoring system according to claim 1, wherein the monitored terminal further comprises a transmitting unit that acquires current location information of the monitored person and transmits it to the server, and the determination unit of the server further refers to the current location information when determining the activity state.
7. The monitoring system according to claim 1, wherein the monitored terminal further comprises a transmitting unit that acquires an identifier of a nearby peripheral terminal and transmits it to the server, the determination unit of the server further determines the companion status of the monitored person based on the identifier and the current location information of the monitored person, and the transmitting unit of the server transmits to the guardian terminal information indicating the determined companion status in addition to the information indicating the determined activity state.
8. The monitoring system according to claim 7, wherein the companion status determined by the determination unit of the server includes at least one of: whether or not the peripheral terminal is a terminal registered in advance; the number of registered terminals; and the number of unknown terminals.
9. The monitoring system according to claim 7 or 8, wherein the determination unit of the server further analyzes past encounter patterns between the monitored person and a specific peripheral terminal when determining the companion status.
10. The monitoring system according to claim 1, wherein the notification unit of the guardian terminal issues a warning when the combination of the information indicating the activity state transmitted from the server, the information indicating the companion status, and the current location information matches a specific warning condition set in advance.
11. The monitoring system according to claim 1, wherein the server further comprises an update unit that updates the machine learning model using information collected from a plurality of monitored terminals or past data of the monitored person.
12. The monitoring system according to claim 1, wherein the server further comprises a processing unit that anonymizes or deletes the received audio information and motion information within a predetermined period after the activity state is determined.
13. A server comprising: a receiving unit that receives sensing data including audio information about the surroundings of a monitored person and motion information of the monitored person; a determination unit that determines, using a machine learning model and based on the audio information and the motion information in the received sensing data, the current activity state of the monitored person from among a plurality of predefined activity categories; and a transmitting unit that transmits information indicating the determined activity state to a guardian terminal.
14. A monitored terminal comprising: an audio acquisition unit that acquires audio information about the surroundings of a monitored person; a motion acquisition unit that acquires motion information of the monitored person; and a transmitting unit that transmits sensing data including the acquired audio information and motion information to a server that determines the activity state of the monitored person.
15. A guardian terminal comprising: a receiving unit that receives information transmitted from a server, including the result of determining the activity state of a monitored person; and a notification unit that notifies information regarding the activity state of the monitored person based on the received information.
16. An information processing method executed by a processor of a server, the method comprising: receiving sensing data transmitted from a monitored terminal, the sensing data including audio information about the surroundings of a monitored person and motion information of the monitored person; determining, using a machine learning model and based on the audio information and the motion information in the received sensing data, the current activity state of the monitored person from among a plurality of predefined activity categories; and transmitting information indicating the determined activity state to a guardian terminal.
17. A program that causes a processor of a guardian terminal to execute: a step of receiving information transmitted from a server, including the result of determining the activity state of a monitored person; and a step of notifying information regarding the activity state of the monitored person based on the received information.