Child monitoring information provision system, monitored terminal, server, program, and information provision method

The child monitoring system addresses the issue of real-time safety updates by converting child voice to text and transmitting ambient data, improving caregiver understanding and reducing anxiety through AI-assisted situation awareness.

JP2026109497A · Pending · Publication Date: 2026-07-01 · MIXI INC

Patent Information

Authority / Receiving Office: JP · JP
Patent Type: Applications
Current Assignee / Owner: MIXI INC
Filing Date: 2025-05-27
Publication Date: 2026-07-01

Application Information

Patent Timeline

27 May 2025: Application
01 Jul 2026: Publication (JP2026109497A)

IPC: G08B25/04; G08B21/02

AI Tagging

Technology Topics

Information processing · Terminal server

Abstract

[Problem] To make it easier for guardians to understand their child's situation when monitoring the child. [Solution] The child monitoring information provision system comprises a monitored terminal 10, a guardian terminal 50, and a server 90. The monitored terminal 10 includes an audio output unit that outputs messages from the guardian terminal 50 as audio; an audio acquisition unit that acquires the child's voice, converts it into text information, and transmits it; and a timeout monitoring unit that, if text information is not acquired within a predetermined time, transmits ambient audio data and location information, and optionally volume classification information, sound type classification results, situation estimation results, and images. The guardian terminal 50 sends messages to the monitored terminal 10 and receives and outputs the various information. The server 90 relays communication between these terminals and supports information processing.

Description

Technical Field

[0001] The present invention relates to a child monitoring information provision system, a monitored terminal, a server, a program, and an information provision method.

Background Art

[0002] In recent years, in order to ensure the safety of children, a service has been widely used in which a child is provided with a mobile terminal (hereinafter referred to as a monitoring terminal) equipped with a GPS function or the like, and a caregiver can check the location information thereof. These terminals have also come to be equipped with functions for assisting communication between parents and children.

[0003] For example, there have appeared monitoring terminals equipped with a so-called voice chat function or a voice interface utilizing AI technology, which synthesize a caregiver's text message into voice on the monitoring terminal side and convey it to the child, and convert the voice message from the child into text and deliver it to the caregiver. This enables smooth communication even if the child is not used to text input or the caregiver is in an environment where it is difficult to hear voices. There are also monitoring terminals equipped with a call function and a text message transmission / reception function.

[0004] However, in these conventional technologies, when a caregiver sends a message but the child does not notice it or cannot respond immediately, it has been difficult for the caregiver to grasp the safety and situation of the child in real time. In particular, there has been a problem that the caregiver's anxiety increases when there is no response from the child for a certain period of time.

[0005] A technology for changing the operation of a terminal according to the state of the child is also known. For example, in an infant monitor, when an abnormality such as apnea is detected, a finite state machine (FSM) is used to raise the warning level in stages according to the presence or absence of a reaction (see, for example, Patent Document 1 described later).

Prior Art Documents

Patent Documents

[0006] [Patent Document 1] U.S. Patent No. 8502679

Summary of the Invention

Problems to Be Solved by the Invention

[0007] The present invention was made in view of the above circumstances, and aims to make it easier for guardians to understand their child's situation when monitoring the child.

Means for Solving the Problems

[0008] To solve the above problems, a child monitoring information provision system according to one aspect of the present invention comprises a monitored terminal, a guardian terminal, and a wireless communication unit for communication between the two terminals, wherein the monitored terminal includes an audio output unit that outputs messages received from the guardian terminal as audio, an audio acquisition unit that acquires the child's voice, converts it into text information, and transmits it to the guardian terminal, and a timeout monitoring unit that transmits ambient audio data and location information to the guardian terminal if the text information is not acquired within a predetermined time.

Effects of the Invention

[0009] According to one aspect of the present invention, guardians can more easily understand their child's situation when monitoring the child.

Brief Description of the Drawings

[0010]
[Figure 1] This figure shows the overall configuration of a child monitoring information provision system (including a server) according to one embodiment of the present invention.
[Figure 2] This is a block diagram showing the hardware configuration of a monitored terminal according to one embodiment of the present invention.
[Figure 3] This is a functional block diagram of a monitored terminal according to one embodiment of the present invention.
[Figure 4] This figure shows the processing sequence according to one embodiment of the present invention.
[Figure 5] This diagram shows the state transitions of a monitored terminal according to one embodiment of the present invention.
[Figure 6] This is a flowchart of the volume classification process according to one embodiment of the present invention.
[Figure 7] This is a flowchart of the sound type classification process according to one embodiment of the present invention.
[Figure 8] This figure shows the sequence of image attachment processing according to one embodiment of the present invention.
[Figure 9] This is a functional block diagram of a guardian terminal according to one embodiment of the present invention.
[Figure 10] This figure shows an example of a transmitted data structure according to one embodiment of the present invention.
[Figure 11] This figure shows an example of the screen display of a guardian terminal according to one embodiment of the present invention.
[Figure 12] This figure shows the processing sequence when a specific voice is detected according to one embodiment of the present invention.
[Figure 13] This is a block diagram illustrating the concept of policy control according to one embodiment of the present invention.
[Figure 14] This is a block diagram showing the hardware configuration of a server according to one embodiment of the present invention.

Modes for Carrying Out the Invention

[0011] Embodiments of the present invention will be described in detail below with reference to the drawings. In each drawing, the same or corresponding elements are denoted by the same reference numerals, and redundant explanations are omitted as appropriate.

[0012] (Overview of the entire system) Figure 1 shows the overall configuration of a child monitoring information provision system 1 according to one embodiment of the present invention. This system 1 mainly comprises a monitored terminal 10 carried by the child, a guardian terminal 50 held by the guardian, a server 90 that relays communication between these terminals and supports various information processing, and a network 80 to which these are connected.

[0013] Here, the "monitored terminal 10" is not necessarily limited to a dedicated hardware device. It can refer to a functional entity that realizes each of the functional units described below when a program according to the present invention (corresponding to Appendix 13) is executed on a general-purpose computer device such as a smartphone, a tablet terminal, or a smartwatch. Similarly, the "guardian terminal 50" can also be realized when a corresponding program (corresponding to Appendix 15) is executed on a general-purpose computer device. Therefore, the present invention includes not only a system using dedicated terminals but also a system realized by application software installed on a smartphone or the like and a server that cooperates as needed (corresponding to Appendix 26).

[0014] The monitored terminal 10 transmits the child's location information, either periodically or in response to a request from the guardian terminal 50, and also exchanges messages with the guardian terminal 50. In particular, in the present embodiment, the monitored terminal 10 accepts the child's voice response to a message from the guardian terminal 50, converts it into text information, and transmits it. When there is no response for a certain period of time, it automatically transmits the surrounding sound (ambient audio data) and location information to the guardian terminal 50 (fallback function). Furthermore, the monitored terminal 10 may also analyze the ambient audio data to be transmitted, add volume classification information or sound type classification results, or estimate the child's situation using AI.

[0015] The guardian terminal 50 is a smartphone, a tablet terminal, a PC, or the like. By executing dedicated application software, it transmits messages to the monitored terminal 10 and displays, or outputs as audio, the text information, ambient audio data, location information, volume classification information, sound type classification results, estimated child situation, images, and other data received from the monitored terminal 10.

[0016] Figure 9 shows an example of the functional blocks of the guardian terminal 50. The guardian terminal 50 includes a message transmission unit 51, a data reception unit 52, an information display unit 53, an audio output control unit 54, a communication control unit 55, a control unit 56 that comprehensively controls these units, and a storage unit 57 that stores various information. The message transmission unit 51 sends a message to the monitored terminal 10 based on the guardian's operation. The data reception unit 52 receives various information from the monitored terminal 10. The information display unit 53 displays the received information on the screen (see Figure 11). The audio output control unit 54 outputs the received audio data and text information as audio. The communication control unit 55 controls communication via the network 80.

[0017] Network 80 can include various wireless and wired communication networks, such as low-power wide-area (LPWA) wireless technologies like LTE Cat-M1 and NB-IoT, BLE (Bluetooth Low Energy), Wi-Fi, and cellular networks. The monitored terminal 10 and the guardian terminal 50 communicate directly via this network 80, or indirectly via a server 90, which can be constructed as a cloud server providing the functions according to the present invention (corresponding to Appendix 27). Therefore, the "wireless communication unit for communication between the two terminals" is a concept that includes not only an interface for direct communication but also a communication interface that in substance enables information exchange between the two terminals through communication with the server 90; the standard wireless communication functions (Wi-Fi, LTE, etc.) provided by each terminal may fall under this category. The server 90 may provide functions such as message relaying, data storage, AI analysis processing, and policy management.

[0018] (Description of the hardware configuration of the monitored terminal) Figure 2 is a block diagram showing the hardware configuration of the monitored terminal 10. The monitored terminal 10 includes a control unit 11, a storage unit 12, a communication unit 13, an audio input / output unit 14, a GPS receiver 15, and (optionally) an imaging unit 16, an acceleration sensor (not shown), and the like.

[0019] The control unit 11 is a CPU (Central Processing Unit) or MPU (Micro Processing Unit), and controls the operation of the entire monitored terminal 10 by executing programs stored in the storage unit 12. Each function of the functional blocks (Figure 3), which will be described later, is mainly realized by this control unit 11 executing programs.

[0020] (Explanation of the functional block configuration of the monitored terminal) Figure 3 is a functional block diagram of the monitored terminal 10. When the control unit 11 executes a program stored in the storage unit 12, the monitored terminal 10 functions as the following units: message receiving unit 20, audio output unit 21, voice acquisition unit 22, timeout monitoring unit 23, location information acquisition unit 24, data transmission unit 25, voice processing unit 26, state management unit 27, encryption unit 28, power saving control unit 29, volume classification unit 30, sound type classification unit 31, image acquisition unit 32, specific voice detection unit 60 (see Figure 12), situation estimation unit 33, and policy control unit 34 (see Figure 13). Some or all of these functional blocks may be implemented on the server 90 side, and the monitored terminal 10 may be configured to operate in cooperation with the server 90. For example, the parts of the timeout monitoring unit 23 and the situation estimation unit 33 that have a high processing load can be executed on the server 90 side, and the monitored terminal 10 can be specialized in transmitting sensor information and receiving instructions from the server 90. Even in this case, as long as the system as a whole realizes the functions described in the claims, it falls within the scope of the present invention.

[0021] The message receiving unit 20 receives messages sent from the guardian terminal 50 via the communication unit 13.

[0022] The audio output unit 21 outputs the message received by the message receiving unit 20 as audio from the speaker 14b of the audio input / output unit 14. If the message is in text format, it is output after undergoing text-to-speech (TTS) processing. This TTS processing may be handled by the voice processing unit 26, which will be described later.

[0023] The voice acquisition unit 22 acquires the child's voice via the microphone 14a of the voice input / output unit 14 after the voice output unit 21 outputs a message. The acquired voice is converted into text information by speech recognition (ASR: Automatic Speech Recognition) processing. Here, "text information" is an example of information that expresses the content of the child's voice as text, but the present invention is not limited to this and can also include the generation and transmission of other forms of information that are generated based on the child's voice and help the guardian understand the child's state (for example, information indicating emotions extracted from the voice (corresponding to Appendix 19), information indicating the presence or absence of specific words, information indicating the tone and volume of the voice, etc.). The following mainly describes the case of conversion to text information, but these other information formats can be applied in the same way. This ASR processing may also be handled by the voice processing unit 26. The converted text information is transmitted to the guardian terminal 50 via the data transmission unit 25.

[0024] (Technical significance of conversion to text information) The process of converting a child's voice into text information is preferably performed in the voice processing unit 26 of the monitored terminal 10 (see Appendix 2). This not only significantly reduces the amount of data flowing through the network 80 but also substantially improves privacy protection, because raw voice data such as conversations between the child and those around them is not transmitted directly. Furthermore, the generated text information can be used not only for display and reading aloud on the guardian terminal 50 but also as input data for additional information processing that is difficult or burdensome to perform on raw voice, such as automatic detection of specific keywords (e.g., "pain" or "help," which may be linked to the functions of the specific voice detection unit 60) or generation of information indicating the emotion of the voice (e.g., joy, sadness, anger). The results of such processing can be transmitted to the guardian terminal 50 as a substitute for, or supplement to, the text information. This enables advanced monitoring support that goes beyond simple message transmission.

[0025] (Variations of conversion location) It should be noted that the present invention is not necessarily limited to cases where the conversion process to text information is performed within the monitored terminal 10. For example, considering processing power and battery consumption, a configuration in which the monitored terminal 10 transmits the acquired voice data to the server 90, where the server 90 converts it to text information and distributes it to the guardian terminal 50 is also within the scope of the technical idea of the present invention. However, even in that case, the monitored terminal 10 plays the role of a voice acquisition unit by 'acquiring' voice data to be sent to the server 90 and finally initiating the process of sending it to the guardian terminal as 'text information'.

[0026] The phrase "convert to text information and transmit" typically refers to a case where the voice acquisition unit performs the conversion process within the monitored terminal (see Appendix 2), but is not limited to this. A configuration in which the voice acquisition unit acquires the child's voice, transmits the voice data to a server (server 90, etc.) that performs the conversion to text information, and the server transmits the converted text information to the guardian terminal 50 is also included in the technical concept of the present invention, as the entire system provides the guardian terminal with the child's voice as text information. In this case, the voice acquisition unit of the monitored terminal is responsible for initiating the conversion process to text information and acquiring and transmitting the voice data for that purpose.

[0027] The timeout monitoring unit 23 starts a timer after the audio output unit 21 outputs a message. If the voice acquisition unit 22 does not acquire (generate) text information within a predetermined time, the timeout monitoring unit 23 determines that a timeout has occurred.

[0028] (Definition of predetermined time) Here, "predetermined time" refers to the time to wait for a response from the child to a message from the guardian. The starting point for this predetermined time can be, for example, the time when the audio output unit 21 completes outputting the message. Alternatively, the starting point may be the time when the message receiving unit 20 receives the message or the time when the audio output unit 21 starts outputting. It is desirable that this predetermined time can be freely set and changed from the guardian terminal 50 or an administrator's website (see Appendix 3). Furthermore, this "predetermined time" is not necessarily limited to an exact elapsed time (clock time) from the completion of message output. Depending on the content of the message from the guardian and the settings, "the period until a specific event occurs" (for example, the next scheduled time, arrival at a specific location, or a manual check instruction from the guardian) can also be set as the "predetermined time," with event-based monitoring that determines a timeout if there is no response within that period (see Appendix 25). In other words, "predetermined time" broadly means a reasonable period during which the guardian expects a response from the child.

[0029] (Definition of cases where text information is not acquired) "Cases where text information is not acquired" broadly includes situations in which the guardian cannot receive a response from the child as text information, such as: (i) when the voice acquisition unit 22 fails to detect any response voice from the child within the predetermined time; (ii) when a response voice is detected but its content is unclear (e.g., the noise level is high or the voice is too quiet) or cannot be recognized as meaningful language, so that text information meeting the predetermined quality standards cannot be generated; or (iii) when text information is generated but, for some reason, transmission to the guardian terminal 50 is not completed. Furthermore, "not acquired" includes cases where the voice acquisition unit 22 determines that it will not generate or transmit the text information that would ultimately be sent to the guardian terminal 50.
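The three cases (i) to (iii) above can be sketched as a small classification function. This is an illustration only: the `ResponseAttempt` fields, the `confidence` score, and the `MIN_CONFIDENCE` value of 0.6 are assumptions introduced here, since the source does not specify how the "predetermined quality standards" are measured.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class NotAcquired(Enum):
    NO_VOICE = "no response voice detected"        # case (i)
    LOW_QUALITY = "text below quality standard"    # case (ii)
    SEND_FAILED = "transmission not completed"     # case (iii)


@dataclass
class ResponseAttempt:
    """Hypothetical record of one attempt to capture the child's reply."""
    voice_detected: bool
    text: Optional[str] = None   # ASR output, if any
    confidence: float = 0.0      # assumed ASR confidence in the range 0.0 to 1.0
    delivered: bool = False      # True once the guardian terminal has the text


MIN_CONFIDENCE = 0.6  # assumed quality threshold, not taken from the source


def not_acquired_reason(attempt: ResponseAttempt) -> Optional[NotAcquired]:
    """Return why text information counts as not acquired, or None if it was acquired."""
    if not attempt.voice_detected:
        return NotAcquired.NO_VOICE
    if not attempt.text or attempt.confidence < MIN_CONFIDENCE:
        return NotAcquired.LOW_QUALITY
    if not attempt.delivered:
        return NotAcquired.SEND_FAILED
    return None
```

Any non-None result would be treated the same way by the timeout monitoring unit 23: once the predetermined time has elapsed, the fallback transmission is triggered.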

[0030] If a timeout is detected, the timeout monitoring unit 23 instructs the voice acquisition unit 22 to acquire ambient audio data, instructs the location information acquisition unit 24 to acquire location information, and passes this data to the data transmission unit 25. The length of the ambient audio data acquired here is limited to, for example, 5 seconds or more and 15 seconds or less (corresponding to Appendix 5).
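As a minimal sketch of the timer logic in paragraphs [0027] to [0030], assuming a clock-time wait that starts when message output completes: the class name `TimeoutMonitor`, its method names, and the caller-supplied `now` values are hypothetical. The 5 and 15 second bounds come from the example range above.

```python
from typing import Optional

MIN_CLIP_S = 5.0    # lower bound of the example clip length in [0030]
MAX_CLIP_S = 15.0   # upper bound of the example clip length in [0030]


def clamp_clip_length(requested_s: float) -> float:
    """Keep the ambient audio clip length within the 5 to 15 second example range."""
    return max(MIN_CLIP_S, min(MAX_CLIP_S, requested_s))


class TimeoutMonitor:
    """Tracks whether text information arrived within the predetermined time."""

    def __init__(self, wait_s: float) -> None:
        self.wait_s = wait_s                     # predetermined time, in seconds
        self.started_at: Optional[float] = None  # None means no timer is running

    def start(self, now: float) -> None:
        """Start the timer, for example when message output completes."""
        self.started_at = now

    def on_text_acquired(self) -> None:
        """Cancel the timer because the child's reply became text information."""
        self.started_at = None

    def timed_out(self, now: float) -> bool:
        """True if a timer is running and the predetermined time has elapsed."""
        return self.started_at is not None and now - self.started_at >= self.wait_s
```

Passing the current time in as an argument keeps the sketch independent of any particular clock source; the event-based variant of "predetermined time" described in [0028] would replace `timed_out` with a check for the configured event.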

[0031] (Flexibility of combinations and content of transmitted data) In principle, the timeout monitoring unit 23 preferably transmits both ambient audio data and location information, but the present invention is not limited to this. Depending on the communication environment, battery level, guardian settings (set by the policy control unit 34, see Figure 13), or detected situation (e.g., location information takes priority when a fall is detected), the invention may also include a configuration in which at least one of ambient audio data and location information is transmitted. Furthermore, the modes of transmitting "ambient audio data" may include not only cases where the raw audio data itself is transmitted, but also cases where only analysis information generated from the ambient audio data, such as volume classification information (Appendix 9) and sound type classification results (Appendix 10), is transmitted, or cases where both raw audio data and analysis information are transmitted. What matters is that, in the event of no response, some information is provided from which the guardian can infer the child's situation. This "information for understanding the child's situation" may include not only ambient audio data and location information, but also image data or video data acquired by the imaging unit 16 (corresponding to Appendices 23 and 24). Therefore, the technical concept of the present invention also includes modes in which, upon timeout, only image data or video data is transmitted, or image data and location information are transmitted, so that visual information serves as the primary information and no audio information is transmitted. This enables information to be provided in the form best suited to privacy settings and circumstances. Accordingly, "transmitting ambient audio data and location information" broadly includes providing information obtained from these information sources (ambient audio, location, visual information, etc.) in an appropriate format (raw data, analysis results, or a combination of these) according to a predetermined policy (see Figure 13) and circumstances when it is determined that situational awareness is necessary (regardless of whether that determination is due to a timeout, AI prediction, or another trigger).

[0032] "Transmitting ambient audio data and location information" is merely a typical example of transmitting this information, and the present invention is not limited thereto. The timeout monitoring unit may transmit at least one of ambient audio data and location information, in addition to cases where both ambient audio data and location information are transmitted, depending on the communication status, battery level, privacy settings (including settings by the policy control unit 34), etc. Furthermore, the term "ambient sound data" is a concept that may include not only the raw sound data itself, but also information generated from that raw sound data that indicates the surrounding acoustic environment (volume, type of sound, etc.) (for example, volume classification information described in Appendix 9, sound type classification results described in Appendix 10, etc.). Therefore, the timeout monitoring unit may be configured to transmit this acoustic analysis information in place of, or in addition to, the raw sound data.
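The policy-dependent choice of what to transmit can be sketched as follows. Everything specific here is an assumption made for illustration: the `FallbackContext` fields, the 15 percent low-battery threshold, and the reading of "location information takes priority when a fall is detected" as "send location only". The source leaves these decisions to the policy control unit 34.

```python
from dataclasses import dataclass
from typing import List

LOW_BATTERY_PERCENT = 15  # assumed threshold, not taken from the source


@dataclass
class FallbackContext:
    """Hypothetical inputs to the payload decision."""
    battery_percent: int
    allow_raw_audio: bool       # assumed privacy setting chosen by the guardian
    fall_detected: bool = False


def select_payload(ctx: FallbackContext) -> List[str]:
    """Return the items to transmit; the list always has at least one entry."""
    if ctx.fall_detected:
        # One possible reading of "location takes priority": send location alone.
        return ["location"]
    items = ["location", "audio_analysis"]
    # Raw audio is the largest and most privacy-sensitive item, so it is sent
    # only when the settings allow it and the battery is not low.
    if ctx.allow_raw_audio and ctx.battery_percent > LOW_BATTERY_PERCENT:
        items.append("raw_audio")
    return items
```

For example, `select_payload(FallbackContext(battery_percent=10, allow_raw_audio=True))` yields location plus acoustic analysis information without the raw clip.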

[0033] The location information acquisition unit 24 uses the GPS receiver 15 to acquire the current location information of the monitored terminal 10.

[0034] The data transmission unit 25 transmits to the guardian terminal 50, via the communication unit 13, the text information generated by the voice acquisition unit 22, or the ambient audio data, location information, volume classification information, sound type classification results, situation estimation results, image data, etc., acquired according to the instructions of the timeout monitoring unit 23 or the specific voice detection unit 60. Figure 10 shows an example of a transmitted data packet. This packet 100 may include a header 101, data type 102, timestamp 103, location information (latitude 104a, longitude 104b), ambient audio data (or reference information thereof) 105, volume classification information 106, sound type classification results 107, image data (or reference information thereof) 108, situation estimation results 109, footer 110, etc.

[0035] (Explicit indication of the data processing source) The data transmission unit 25 can include identification information in the data packet 100 (see Figure 10) sent to the guardian terminal 50, indicating which device (monitored terminal, server, etc.) generated the main data (e.g., text information, acoustic analysis results). This facilitates system operation verification and, if necessary, confirmation of processing details.
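The fields of packet 100 and the processing-source identifier can be modeled as a simple record. This is a sketch under assumptions: the field names, the JSON encoding, and the ISO 8601 timestamp format are illustrative choices, and the header 101 and footer 110 are omitted because the source does not describe their contents.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class Packet:
    """Illustrative model of packet 100; comments give the Figure 10 reference numerals."""
    data_type: str                            # 102
    timestamp: str                            # 103, ISO 8601 assumed
    latitude: Optional[float] = None          # 104a
    longitude: Optional[float] = None         # 104b
    ambient_audio_ref: Optional[str] = None   # 105 (reference to the audio clip)
    volume_class: Optional[str] = None        # 106
    sound_type: Optional[str] = None          # 107
    image_ref: Optional[str] = None           # 108 (reference to the image)
    situation: Optional[str] = None           # 109
    generated_by: Optional[str] = None        # processing source described in [0035]

    def to_json(self) -> str:
        """Serialize the packet, leaving out fields that were not populated."""
        body = {k: v for k, v in asdict(self).items() if v is not None}
        return json.dumps(body, sort_keys=True)
```

A fallback packet carrying only location and a volume class might then be built as `Packet(data_type="fallback", timestamp="2025-05-27T09:00:00Z", latitude=35.0, longitude=139.0, volume_class="quiet", generated_by="monitored_terminal")`.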

[0036] The voice processing unit 26 performs voice processing related to the voice output unit 21 and the voice acquisition unit 22, namely TTS processing and ASR processing. By performing these processes within the monitored terminal 10 (on-device processing), communication delays and costs with the server can be reduced, and responsiveness can be improved (see Appendix 2). Compared with typical prior art that provides voice AI functionality on the cloud side, this has advantages such as stable operation that is independent of the network environment and high privacy protection by not transmitting raw voice data outside the terminal.

[0037] The state management unit 27 manages the operating state of the monitored terminal 10 as a finite state machine (FSM). For example, by using a lightweight FSM with five or fewer states, such as "standby state," "message received state," "voice output state," "voice input standby state," and "fallback state," stable operation can be achieved even on a monitored terminal 10 with limited resources (corresponding to Appendix 6). Figure 5 shows an example of FSM state transitions. Unlike the technology that uses an FSM to detect abnormalities in biosignals (Patent Document 1), this FSM manages the state transitions that arise in voice communication, such as message reception, voice output, response waiting, and timeout, and is particularly effective in improving the reliability and stability of the dialogue sequence.
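The five states named above lend themselves to a table-driven FSM. The sketch below is one plausible arrangement: the event names and the specific transitions are assumptions, because Figure 5 is not reproduced in this text.

```python
from enum import Enum, auto


class State(Enum):
    STANDBY = auto()
    MESSAGE_RECEIVED = auto()
    VOICE_OUTPUT = auto()
    VOICE_INPUT_WAIT = auto()
    FALLBACK = auto()


# (current state, event) -> next state. Event names are hypothetical.
TRANSITIONS = {
    (State.STANDBY, "message_received"): State.MESSAGE_RECEIVED,
    (State.MESSAGE_RECEIVED, "start_output"): State.VOICE_OUTPUT,
    (State.VOICE_OUTPUT, "output_done"): State.VOICE_INPUT_WAIT,
    (State.VOICE_INPUT_WAIT, "text_acquired"): State.STANDBY,
    (State.VOICE_INPUT_WAIT, "timeout"): State.FALLBACK,
    (State.FALLBACK, "fallback_sent"): State.STANDBY,
}


def step(state: State, event: str) -> State:
    """Advance the FSM; events not defined for the current state are ignored."""
    return TRANSITIONS.get((state, event), state)
```

Keeping the transitions in a single dictionary makes the whole dialogue sequence auditable at a glance, which suits a resource-limited terminal where unexpected events should leave the state unchanged rather than raise errors.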

[0038] The encryption unit 28 encrypts the ambient audio data transmitted by the timeout monitoring unit 23. For example, a standard encryption method such as AES-128 is used. Furthermore, from the standpoint of protecting privacy, ambient audio data that has been transmitted or stored on a server, etc., is managed to be automatically deleted, for example, after 24 hours (see Appendix 7).
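The automatic deletion rule can be sketched as a periodic purge over stored clips. The function name `purge_expired` and the dictionary of clip identifiers to storage times are hypothetical; only the 24-hour figure comes from the example above. Encryption is not shown here.

```python
from typing import Dict

RETENTION_S = 24 * 60 * 60  # the 24-hour example retention period from [0038]


def purge_expired(clips: Dict[str, float], now: float) -> Dict[str, float]:
    """Return only the clips stored less than the retention period ago.

    `clips` maps a clip identifier to the time it was stored, in seconds.
    """
    return {clip_id: stored_at
            for clip_id, stored_at in clips.items()
            if now - stored_at < RETENTION_S}
```

A server-side job could call this on a schedule and delete the underlying audio files for every identifier that no longer appears in the returned mapping.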

[0039] The power saving control unit 29 performs control to suppress battery consumption of the monitored terminal 10. In particular, after the timeout monitoring unit 23 transmits ambient audio data and location information (after fallback), it extends the positioning interval of the GPS receiver 15 beyond the normal interval to reduce power consumption (corresponding to Appendix 8).

[0040] The volume classification unit 30 analyzes the volume level (such as the average sound pressure level) of the ambient audio data acquired by the timeout monitoring unit 23 or the voice acquisition unit 22, and generates volume classification information in approximately three stages, such as "quiet," "medium noise," and "loud noise." This volume classification information is added to the ambient audio data and transmitted to the guardian terminal 50 (corresponding to Appendix 9).
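A three-stage volume classification can be sketched from the RMS level of the audio samples. The sketch assumes samples normalized to the range -1.0 to 1.0 and uses the level in dBFS as a stand-in for the average sound pressure level; the two thresholds are illustrative values, not figures from the source.

```python
import math
from typing import Sequence

QUIET_MAX_DBFS = -40.0    # assumed boundary between "quiet" and "medium noise"
MEDIUM_MAX_DBFS = -20.0   # assumed boundary between "medium noise" and "loud noise"


def rms_dbfs(samples: Sequence[float]) -> float:
    """RMS level in dB relative to full scale; silence maps to negative infinity."""
    if not samples:
        return float("-inf")
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return 20.0 * math.log10(rms) if rms > 0 else float("-inf")


def classify_volume(samples: Sequence[float]) -> str:
    """Map the RMS level of a clip to one of the three stages named in [0040]."""
    level = rms_dbfs(samples)
    if level <= QUIET_MAX_DBFS:
        return "quiet"
    if level <= MEDIUM_MAX_DBFS:
        return "medium noise"
    return "loud noise"
```

In practice the thresholds would need calibration against the microphone 14a, since dBFS depends on the input gain of the device.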

[0041] The sound type classification unit 31 analyzes ambient audio data and classifies its content into categories such as "human voice," "traffic noise," "environmental noise," and "quiet sound." This classification result is transmitted to the guardian terminal 50 along with location information (corresponding to Appendix 10). Furthermore, it is also possible to transmit this classification result as text information to the guardian terminal 50 and have the guardian terminal 50 output it as audio using the TTS function (corresponding to Appendix 11).

[0042] (Effect of supporting understanding of the situation) Thus, presenting volume classification information (corresponding to Appendix 9) and sound type classification results (corresponding to Appendix 10) on the guardian terminal 50 in combination with location information does more than transmit data: it supports the guardian's "qualitative understanding of the situation," an effect not found in conventional technologies. For example, a guardian can instantly grasp "contextual information" from the received information, such as "being at school (location information), quiet (volume classification), and no human voices (sound type classification)," and infer that "it is highly likely that a class is in session." Compared to conventional technologies, where guardians need to actively interpret the situation by directly listening to ambient audio data, this significantly reduces the cognitive burden on guardians, enabling them to quickly gain a sense of security or detect abnormalities early, even when there is no response. In particular, the fact that these analysis results (including the text notification in Appendix 11) enable objective situational awareness even when the guardian cannot listen to the audio is extremely useful in practice. Figure 11 shows an example of the screen display of the guardian terminal 50, where a child's location marker 202 is displayed on the map 201, and a text information area 203, an audio playback UI 204, a volume classification display 205 (e.g., a level meter or icon), a sound type classification display 206 (e.g., an icon or the text "human voice"), and a received image thumbnail 207 are arranged.

[0043] When the timeout monitoring unit 23 transmits ambient sound data and location information, the image acquisition unit 32 controls the imaging unit 16 to capture an image of the surroundings and has the image data transmitted together with the ambient sound data and location information (corresponding to Appendix 12). The image acquisition unit 32 may also have a function to acquire image or video data (preferably silent video without sound) as primary fallback information instead of the ambient sound data and location information, and to pass it to the data transmission unit 25 (corresponding to Appendix 23).

[0044] The specific voice detection unit 60 continuously or intermittently acquires sound from the microphone 14a of the voice input / output unit 14 and detects specific voices (wake words, such as "help") that have been registered in advance. When a specific voice is detected, it actively acquires ambient sound data and location information without waiting for the timeout monitoring unit 23 to time out, and transmits them to the parent terminal 50 via the data transmission unit 25 (corresponding to Appendix 4). This processing sequence is shown in Figure 12. The specific voice detection unit 60 performs sound monitoring (S401), and when it detects a specific voice (S402), it issues a transmission instruction to the data transmission unit 25 (S403), and ambient sound data and location information (and optionally images) are transmitted to the parent terminal 50 (S404). "Specific voices" may include words indicating urgency (e.g., help, it hurts, it's dangerous), or any words or voice patterns that have been pre-registered by a guardian.
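The check in S402 can be illustrated with a minimal sketch. It assumes, purely for illustration, that the monitored sound has already been converted to a text transcript by some recognizer; the function name `find_specific_voice` and the simple substring matching are hypothetical and are not specified in this description.

```python
from typing import Iterable, Optional


def find_specific_voice(transcript: str, registered: Iterable[str]) -> Optional[str]:
    """Return the first pre-registered phrase found in the transcript, else None.

    Assumption: matching is a case-insensitive substring test on recognized text.
    A real detector could instead work on acoustic patterns.
    """
    text = transcript.casefold()
    for phrase in registered:
        # Skip empty registrations, which would otherwise match everything.
        if phrase and phrase.casefold() in text:
            return phrase
    return None


# Example registration, using the urgency words mentioned in the text.
REGISTERED = ["help", "it hurts", "it's dangerous"]

if find_specific_voice("Please HELP me", REGISTERED) is not None:
    # Corresponds to S403: instruct transmission without waiting for a timeout.
    pass
```

Because this sketch uses substring matching, a word such as "helpful" would also match "help"; a practical detector would need stricter matching.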

[0045] The situation estimation unit 33 is installed in the monitored terminal 10 or server 90 and is equipped with AI (artificial intelligence) that comprehensively analyzes acquired ambient sound data, volume classification information, sound type classification results, location information, time information, and activity data obtained from acceleration sensors, past behavior pattern data, etc. Based on this multifaceted information, the situation estimation unit 33 estimates the child's situation at a higher level, such as "in class," "walking to school," "playing in the park," or "already home," and transmits the estimation result to the guardian terminal 50 (corresponding to appendices 17 and 18). Unlike general prior art that simply converts speech to text, this is unique in that it integrates multiple information sources to provide high-value-added information such as "understanding the situation" and "predicting risks." Furthermore, the situation estimation unit 33 may have a predictive fallback function that calculates a risk score for the child's current situation from this information, and proactively transmits ambient audio data, location information, and the risk score and its rationale information to the parent terminal 50 without waiting for a timeout to occur by the timeout monitoring unit 23 if this score exceeds a predetermined threshold or rises sharply (corresponding to Appendix 20). In this case, the AI exceeding the risk score threshold can function as one of the main triggers that necessitate parental awareness, alongside or encompassing "when text information is not acquired within a predetermined time." This allows parents to grasp a more likely situation without having to infer from fragmented information. 
As for AI algorithms, for example, CNNs (Convolutional Neural Networks) and RNNs (Recurrent Neural Networks) can be used for speech recognition and acoustic event detection, and Transformer-based models that combine this information with time-series data can be used for behavior estimation.
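The predictive fallback condition (the score exceeds a predetermined threshold or rises sharply) can be sketched as follows. The function name and the numeric values for `threshold` and `sharp_rise` are hypothetical; the description does not specify them.

```python
from typing import Optional


def should_trigger_fallback(
    previous_score: Optional[float],
    current_score: float,
    threshold: float = 70.0,   # hypothetical value
    sharp_rise: float = 30.0,  # hypothetical value
) -> bool:
    """Return True when the risk score calls for a proactive transmission."""
    # Condition 1: the score exceeds the predetermined threshold.
    if current_score > threshold:
        return True
    # Condition 2: the score rose sharply since the previous estimate.
    if previous_score is not None and current_score - previous_score >= sharp_rise:
        return True
    return False
```

With these example values, a jump from 20 to 55 triggers the fallback even though 55 is below the threshold, whereas a rise from 50 to 60 does not.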

[0046] For example, if the input features are "human voices (multiple)" and "low ambient noise" as sound type classification results, "active" as acceleration sensor data, "16:00" as the time, and "XX Park" as the location information, the situation estimation unit 33 may input these features into a trained Transformer model and generate an output of "playing with friends in the park" (estimated probability 90%) and a risk score of "15 (low)".

[0047] (Gradual Disclosure and Privacy) This system provides a step-by-step information disclosure mechanism, consisting of normal communication via "voice-to-text conversion" (S101-S109) and, in the event of no response, the transmission of "analysis results (volume classification, sound type classification, situation estimation results, etc.) and optionally ambient sound data and images" (S110-S114). This design ensures maximum protection of the child's privacy during normal times, while allowing guardians to obtain necessary information step-by-step and efficiently when an abnormality is suspected. Notification of AI-generated situation estimation results further enhances this step-by-step information disclosure and is effective in balancing privacy protection with the accuracy of situation assessment.

[0048] (Description of the server's functional block structure) As shown in Figure 1, the server 90 is a component of System 1, and an example of its hardware configuration is shown in Figure 14. Functionally, the server 90 can include a communication interface unit (corresponding to the communication interface 94 in Figure 14) that communicates with the monitored terminal 10 and the guardian terminal 50, a data storage unit (corresponding to the storage 93 and memory 92 in Figure 14) that temporarily or permanently stores received messages and data, a message relay unit that relays messages between terminals, a data processing unit that performs AI analysis and acoustic analysis based on ambient sound data and other sensor information transmitted from the monitored terminal 10, and a policy management unit that manages the monitoring policy set by the guardian and controls the fallback operation based on that policy. These functional units are realized by the control unit of the server 90 (corresponding to the processor 91 in Figure 14) executing a program loaded into memory 92. As mentioned above, it is also possible for the server 90 to execute some of the functional blocks of the monitored terminal 10 (for example, timeout monitoring, situation estimation, speech recognition, etc.).

[0049] (Description of the server's hardware configuration) Figure 14 is a block diagram showing the hardware configuration of a server 90 according to one embodiment of the present invention. The server 90 is typically configured as a computer system comprising one or more processors 91, memory 92 as main memory, storage 93 as auxiliary storage (e.g., a hard disk drive or solid-state drive), and a communication interface 94 for connecting to a network 80. These components are interconnected by a system bus 95. The processor 91 controls the operation of the entire server 90 by executing the operating system and application programs loaded into the memory 92, thereby realizing the aforementioned message relay function, data storage function, data processing function (including AI analysis, acoustic analysis, etc.), policy management function, etc.

[0050] (Explanation of the processing flow) Figure 4 shows the basic processing sequence in this embodiment. First, a message is sent from the guardian terminal 50 to the monitored terminal 10 (S101). This message may be relayed via the server 90. The message receiving unit 20 of the monitored terminal 10 receives it (S102), and the audio output unit 21 outputs the message as audio (S103).

[0051] Next, the voice acquisition unit 22 enters a voice input standby state (S104), and the timeout monitoring unit 23 starts a timer (S105). When the child makes a response voice (S106), the voice acquisition unit 22 acquires it and converts it into text information (S107). The data transmission unit 25 transmits this text information to the parent terminal 50 (S108). This transmission may also be performed via the server 90. The parent terminal 50 displays the received text information (S109).

[0052] On the other hand, if the character information in S107 is not generated within a predetermined time after the timer is started in S105, the timeout monitoring unit 23 determines that a timeout has occurred (S110). The timeout monitoring unit 23 instructs the voice acquisition unit 22 to acquire ambient sound data (S111) and the location information acquisition unit 24 to acquire location information (S112). Furthermore, the volume classification unit 30, sound type classification unit 31, situation estimation unit 33, etc., perform their respective processes and generate related information. Then, the data transmission unit 25 transmits the ambient sound data, location information, and the analysis and estimation results thereof to the parent terminal 50 (S113). This transmission may also be performed via the server 90. The parent terminal 50 outputs and displays the received information (S114).
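The branch between the normal path (S106 to S108) and the timeout path (S110 to S113) can be sketched as below. This is only an illustration under stated assumptions: `poll_text` and `collect_fallback` are hypothetical callbacks standing in for the voice acquisition unit and the acquisition and analysis units, and the 15-second default is merely one value within the range given in Appendix 3.

```python
import time
from typing import Callable, Optional


def run_exchange(
    poll_text: Callable[[], Optional[str]],
    collect_fallback: Callable[[], dict],
    timeout_s: float = 15.0,
    poll_interval_s: float = 0.1,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Wait for text information; fall back when none arrives in time."""
    deadline = clock() + timeout_s  # S105: start the timer
    while clock() < deadline:
        text = poll_text()  # S106-S107: response voice converted to text
        if text:
            return {"kind": "text", "text": text}  # S108: send text information
        sleep(poll_interval_s)
    # S110-S113: timeout determined; gather and send fallback information
    return {"kind": "fallback", "payload": collect_fallback()}
```

Passing `clock` and `sleep` as parameters is a design choice of this sketch so that the timing behavior can be exercised without actually waiting.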

[0053] Figure 6 is a flowchart of the volume classification process (Appendix 9). When the timeout monitoring unit 23 or the voice acquisition unit 22 acquires ambient sound data (S201), the volume classification unit 30 calculates the average sound pressure level (dB) of that sound data (S202). If the calculated level is less than the first threshold (e.g., 40 dB), it is classified as "quiet" (S203); if it is at or above the first threshold but below the second threshold (e.g., 70 dB), it is classified as "medium noise" (S204); and if it is at or above the second threshold, it is classified as "loud noise" (S205). Classification information is then generated (S206).
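The classification in S202 to S205 can be sketched as follows. Two points are assumptions of this sketch rather than statements of the description: the samples are taken to be calibrated sound pressures in pascals (so that the standard 20 µPa reference applies), and a level exactly equal to a threshold is assigned to the higher class.

```python
import math
from typing import Sequence

REFERENCE_PRESSURE_PA = 20e-6  # standard reference pressure for dB SPL in air


def average_level_db(pressures_pa: Sequence[float]) -> float:
    """S202: average sound pressure level from calibrated pressure samples."""
    if not pressures_pa:
        raise ValueError("no samples")
    rms = math.sqrt(sum(p * p for p in pressures_pa) / len(pressures_pa))
    if rms == 0.0:
        return float("-inf")  # digital silence
    return 20.0 * math.log10(rms / REFERENCE_PRESSURE_PA)


def classify_volume(
    level_db: float,
    first_threshold: float = 40.0,   # example value from the text
    second_threshold: float = 70.0,  # example value from the text
) -> str:
    """S203-S205: map a level to one of the three volume classes."""
    if level_db < first_threshold:
        return "quiet"
    if level_db < second_threshold:
        return "medium noise"
    return "loud noise"
```

For example, samples with an RMS pressure of 0.02 Pa correspond to about 60 dB and are classified as "medium noise".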

[0054] Figure 7 is a flowchart of the sound type classification process (Appendix 10). When ambient sound data is acquired (S301), the sound type classification unit 31 extracts sound features (MFCC, etc.) (S302) and determines the sound type using a pre-trained classification model (e.g., neural network) (S303). The determination result (human voice, traffic noise, ambient noise, quiet sound, etc.) is generated as the classification result (S304).

[0055] (Variations, etc.) The present invention is not limited to the embodiments described above. For example, each of the functions described above (on-device processing, numerical range, wake word, recording length, FSM, privacy, battery optimization, dB label, sound type classification, photo attachment, AI situation estimation, etc.) can be adopted independently or in any combination.

[0056] (Various fallback triggers and policy controls) The triggers for the fallback function (transmission of ambient sound data, etc.) are not limited to timeouts (Appendix 1) or detection of specific sounds (Appendix 4). The fallback function may also be configured to activate when the risk score calculated by the AI exceeds a threshold (corresponding to Appendix 20), or when a specific event set by a parent occurs (e.g., arrival of a scheduled time such as "3 PM", entry into or exit from a specific area detected by the geofencing function, such as "entering the school area" or "departing from the home area", the next periodic positioning timing, or receipt of a manual check instruction signal from a parent) and no response to a message sent before that event has been confirmed (corresponding to Appendix 25). These event-based triggers can be positioned as a concretization or expansion of the concept of "predetermined time". The term "predetermined time" encompasses not only the passage of clock time, but also the period from the time the message is sent until a specific event occurs by which a response is expected (such as a scheduled time, entry into or exit from a specific area, or an instruction from the parent). The timeout monitoring unit may be configured to monitor either this time or this event and to trigger a fallback if text information is not acquired within that period.

[0057] Furthermore, the information transmitted when these various triggers are detected does not necessarily have to be uniform. The monitored terminal 10 may also be configured to include a policy control unit 34 (see Figure 13) that dynamically determines the type, combination, or level of information to be transmitted (e.g., raw voice or analysis results) based on a "monitoring policy" set in advance by the guardian (see Appendix 21).

[0058] As shown in Figure 13, the policy control unit 34 stores the monitoring policy 301, which is set from the parent terminal 50 or a web interface, in the storage unit 12, etc. This monitoring policy 301 defines input conditions such as time period 302, location information 303 (geofence, etc.), detected trigger type 304, and risk score 305 calculated by AI, and the corresponding information set 306 to be transmitted (e.g., location information only, voice analysis results + location information, all information, etc.). When the timeout monitoring unit 23 or the situation estimation unit 33 detects a fallback trigger, the policy control unit 34 compares the current situation (time, location, risk score, etc.) with the monitoring policy 301, determines the information set 306 to be transmitted, and instructs the data transmission unit 25. For example, it is possible to finely control the level of privacy by transmitting only the situation estimation results by AI, or only location information. This enables flexible monitoring that responds to a wider variety of risk scenarios and the needs of parents.
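The comparison of the current situation with the monitoring policy 301 can be sketched as a rule lookup. The class and function names, the first-match-wins ordering, and the default of sending location information only are assumptions of this sketch; the description defines only the kinds of conditions (302 to 305) and the information set 306.

```python
from dataclasses import dataclass
from datetime import time
from typing import FrozenSet, Optional, Sequence

DEFAULT_INFO_SET: FrozenSet[str] = frozenset({"location"})  # hypothetical default


@dataclass(frozen=True)
class PolicyRule:
    info_set: FrozenSet[str]           # information set 306
    start: Optional[time] = None       # time period 302 (start, inclusive)
    end: Optional[time] = None         # time period 302 (end, exclusive)
    area: Optional[str] = None         # location information 303 (geofence name)
    trigger: Optional[str] = None      # detected trigger type 304
    min_risk: Optional[float] = None   # risk score 305 (lower bound)

    def matches(self, now: time, area: str, trigger: str, risk: float) -> bool:
        """A condition left as None is treated as 'any'."""
        if self.start is not None and self.end is not None:
            if not (self.start <= now < self.end):
                return False
        if self.area is not None and self.area != area:
            return False
        if self.trigger is not None and self.trigger != trigger:
            return False
        if self.min_risk is not None and risk < self.min_risk:
            return False
        return True


def select_info_set(
    rules: Sequence[PolicyRule], now: time, area: str, trigger: str, risk: float
) -> FrozenSet[str]:
    """Return the information set of the first matching rule."""
    for rule in rules:
        if rule.matches(now, area, trigger, risk):
            return rule.info_set
    return DEFAULT_INFO_SET
```

In this sketch, placing a high-risk rule ahead of a school-hours rule makes the fuller information set take precedence whenever the risk score is high. Time periods that wrap past midnight are not handled.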

[0059] (Application to non-voice interfaces) In this embodiment, the exchange of messages and responses using a voice interface has been mainly described. However, one of the core features of the present invention, the "fallback function that automatically sends information for understanding the child's situation (such as ambient sound data and location information) when there is no response from the child within a predetermined time," does not necessarily presuppose a voice interface. For example, the monitored terminal 10 may be configured to include a notification unit (not shown) that notifies the child of a message from the parent terminal 50 by screen display, vibration pattern, or flashing light, and a response receiving unit (not shown) that receives responses from the child by button operation on the terminal, icon selection on the touchscreen, or simple gesture input (corresponding to Appendix 22). In such a configuration, if a response is not received by the response receiving unit within a predetermined time after notification by the notification unit, a monitoring unit having the same function as the timeout monitoring unit 23 can be configured to send ambient sound data and location information to the parent terminal 50. In this case, "when text information is not acquired" can be more broadly interpreted as "when a response from the child in the format expected by the parent is not obtained within a predetermined time." This makes it possible to provide the main benefit of the present invention, namely understanding the situation when there is no response, even in situations or for users where voice operation is difficult.

[0060] (Second embodiment: Non-voice interface) In the first embodiment described above, message exchange using a voice interface (voice output unit 21, voice acquisition unit 22) and a fallback function based thereon were mainly explained. However, the core feature of the present invention, "a fallback function that automatically sends information for understanding the child's situation when there is no response from the child for a predetermined time," is not necessarily limited to a voice interface. This second embodiment describes a child monitoring information provision system that mainly uses a non-voice interface, notifying the child and receiving responses from the child by means other than voice. Such a configuration is important in that it provides the monitoring function of the present invention effectively even in environments with high noise levels, in situations where the child cannot speak, or for children with hearing or speech difficulties.

[0061] The monitored terminal 10A (not shown, the basic hardware configuration may be the same as in Figure 2) according to this second embodiment includes a notification unit (not shown) for notifying the child of a message received from the guardian terminal 50, a response receiving unit (not shown) for receiving a response from the child, and a monitoring unit (functionally including the time-out monitoring unit 23 and related control functions of the first embodiment) that transmits information to the guardian terminal 50 to understand the child's situation if the response is not received by the response receiving unit within a predetermined time after notification by the notification unit. These functional units are realized by the control unit 11 of the monitored terminal 10A executing a program in the storage unit 12.

[0062] (Notification unit) The notification unit presents messages received from the parent terminal 50 (such as text, stamps, images, or simple "response confirmation" instructions) to the child in a perceptible format. Specific notification methods include the following:
Screen display: If the monitored terminal 10A has a display, the message content is displayed as text or an image. Adding animations or flashing displays can capture the child's attention more effectively.
Vibration: The vibration motor built into the monitored terminal 10A vibrates in a specific pattern. For example, the vibration pattern can be changed according to the type of message (e.g., urgency).
LED illumination: The LEDs mounted on the monitored terminal 10A light up or blink. The type of message can also be indicated by color or blinking pattern.
Sound (other than voice): Simple warning and notification sounds other than voice messages, such as buzzer and chime sounds, are output from the speaker 14b.
These notification methods may be used individually or in combination. For example, vibrating simultaneously with a screen display can more reliably alert the child to the arrival of a message. The notification unit operates in conjunction with the message receiving unit 20.

[0063] (Response receiving unit) The aforementioned response receiving unit is an interface for detecting or receiving responses from the child. Unlike the voice acquisition unit 22, it primarily receives responses by non-voice means. Specific response receiving means include the following:
Physical buttons: The terminal detects the pressing of one or more physical buttons provided on the monitored terminal 10A. For example, simple responses such as "OK" or "I saw it" can be assigned to specific buttons. Different responses can also be expressed through operations such as long-pressing or pressing a button multiple times.
UI elements on a touchscreen: If the display is a touchscreen, the terminal accepts tap or swipe actions on buttons (e.g., "Yes," "No," "Later"), sliders, icons, etc. displayed on the screen.
Gesture recognition: Using the imaging unit 16, accelerometer, gyroscope, etc., the terminal recognizes specific gestures of the child (e.g., waving, shaking the device, nodding) and accepts them as responses. In this case, AI technologies (image recognition, behavior recognition) such as those used in the situation estimation unit 33 can be applied.
Specific motion detection using an accelerometer: For example, the accelerometer can detect actions such as tapping the device a specific number of times or shaking it in a specific pattern, and the terminal accepts these actions as responses.
NFC / RFID and other proximity communication: The terminal accepts the action of holding the device near a specific tag or card as a response.
These response receiving means can also be used individually or in combination. When the response receiving unit detects a response operation from the child, it transmits the response content (e.g., button ID, tap coordinates, recognized gesture type) to the parent terminal 50 via the data transmission unit 25. At this time, the response content may be converted into a predefined "standard message" before transmission.

[0064] (Monitoring unit) The monitoring unit plays the same role as the timeout monitoring unit 23 in the first embodiment. That is, after the notification unit notifies the child of a message, it starts a timer, and if the response receiving unit does not receive a response from the child within a predetermined time (hereinafter referred to as the "non-response state"), it determines that a timeout has occurred. The definition of this "predetermined time," its starting point, and the event-based triggers (see Appendix 25) are the same as those described in the first embodiment.

[0065] If the monitoring unit determines that the child is in the non-response state, it instructs that information for understanding the child's situation be sent to the parent terminal 50. The "information for understanding the child's situation" sent here includes at least one of ambient audio data, location information, or image data, as defined in claim 19. Which information to send can be dynamically determined by the policy control unit 34 (see Figure 13) described in the first embodiment, based on the monitoring policy 301. For example, the following processing may be performed.
If the policy setting prioritizes privacy: Only location information, or only the results of the ambient acoustic analysis (volume classification information 106, sound type classification result 107), is transmitted.
If the policy setting prioritizes situational awareness: In addition to location information, ambient audio data 105 (e.g., 5 to 15 seconds) and ambient image data 108 or short video data acquired by the imaging unit 16 are transmitted.
If the AI-based situation estimation unit 33 is available: The estimated situation of the child 109 and the risk score are also transmitted.
Thus, even in the non-voice interface embodiment, the information transmitted during fallback can be configured flexibly, just as in the case of the voice interface. The microphone 14a of the voice input / output unit 14 is responsible for acquiring ambient sound data, the GPS receiver 15 is responsible for acquiring location information, and the imaging unit 16 is responsible for acquiring image data.

[0066] (Example of a processing sequence in a non-voice interface) The basic processing sequence in this second embodiment is similar to the sequence for the voice interface shown in Figure 4.
1. A message is sent from the parent terminal 50 to the monitored terminal 10A (corresponding to S101).
2. The message receiving unit 20 of the monitored terminal 10A receives the message (corresponding to S102).
3. The notification unit notifies the child of the message (modified S103).
4. The response receiving unit enters a response waiting state, and the monitoring unit starts the timer (corresponding to S104 and S105).
5. When the child performs a response operation (modified S106), the response receiving unit detects it and generates response information (modified S107).
6. The data transmission unit 25 sends the response information to the parent terminal 50 (corresponding to S108).
7. The parent terminal 50 displays the received response information (corresponding to S109).
8. On the other hand, if the response is not received by the response receiving unit within a predetermined time after the timer starts, the monitoring unit determines that a timeout has occurred (corresponding to S110).
9. The monitoring unit, in cooperation with the policy control unit 34 (if available), determines the "information for understanding the child's situation" (such as ambient audio data, location information, and image data) to be transmitted, and instructs each acquisition unit (audio acquisition unit 22, location information acquisition unit 24, image acquisition unit 32, etc.) to acquire the information (modified S111 and S112).
10. The data transmission unit 25 transmits the acquired information to the parent terminal 50 (corresponding to S113).
11. The parent terminal 50 outputs and displays the received information (corresponding to S114).

[0067] (Technical significance and effects of non-voice interfaces) By adopting a non-voice interface as in this second embodiment, the following effects can be expected.
Use in noisy environments: Even in noisy surroundings, notifications can be delivered by screen display or vibration, and the child can respond by button operation.
Support for users with voice operation difficulties: The monitoring system becomes easier to use for children who have difficulty speaking or who have hearing impairments.
Intuitive operation: Especially for young children, simple button operations or icon taps can sometimes be more intuitive than voice commands.
Privacy considerations: Even when the child is hesitant to speak in a public place, the child can respond silently.
Functional expandability: By using gesture recognition and accelerometer-based motion detection as response methods, a wider variety of interactions can be achieved.

[0068] In this second embodiment, as described in the first embodiment, each functional block, such as the specific voice detection unit 60 (however, in this case, the microphone must be kept on at all times or combined with other triggers), the situation estimation unit 33, the policy control unit 34, the encryption unit 28, the power saving control unit 29, the volume classification unit 30, the sound type classification unit 31, and the image acquisition unit 32, can be used in appropriate combinations. For example, by including the estimation results and risk score from the situation estimation unit 33 in the information transmitted when the system becomes unresponsive, parents can gain a deeper understanding of the situation. Similarly, privacy protection can be ensured by applying encryption and automatic deletion of transmitted data (see Appendix 7).

[0069] Thus, the fallback function of the present invention is not limited to voice interfaces, but can also be implemented in systems where the notification means and response receiving means are replaced with non-voice means. This makes it possible to provide a child monitoring information provision system that can accommodate a wider range of usage scenarios and user groups.

[0070] (Variation: Information provision mode) The parent terminal 50 or administrator interface can be equipped with an "information provision mode" for checking the system's operating status. In this mode, metadata contained in the data packet 100 (see Figure 10) received from the monitored terminal 10 (e.g., data type 102, timestamp 103, the event that triggered the fallback (timeout, specific voice, AI risk detection, etc.), applied policy ID, and data processing source identification information) can be displayed on the information display unit 53. This allows parents and administrators to understand the system's operation in detail and indirectly obtain, from the outside, clues for confirming the system's internal operation (likelihood of infringement).

[0071] (Variation: Server report) The server 90 may have a function to aggregate and analyze the operation logs of each monitored terminal 10 and periodically (or at the request of the guardian) send a summary report to the guardian terminal 50. This report may include the number of times the fallback function has been activated, the main triggers (timeout, specific voice, AI risk), the main sound type classification results, and a graph showing the trend of the risk score. This allows the guardian to get an overview of the child's situation and can also serve as circumstantial evidence that the server 90 is performing the main functions described in the claims.

[0072] Furthermore, the embodiments and modifications disclosed herein may be combined, omitted, or replaced with other elements or steps as appropriate, without departing from the spirit thereof.

[0073] [General issues] To make it easier for parents to understand their child's situation when watching over them. In child monitoring, the aim is to enable parents to understand their child's situation in greater detail and more qualitatively, to effectively reduce parental anxiety, especially when there is no response from the child, to reduce the cognitive burden on parents, and to provide information in a step-by-step manner while respecting the child's privacy.

[0074] [Issues related to Appendix 1] To provide a basic mechanism for understanding the situation and reducing anxiety when there is no response in child monitoring. [Appendix 1] A child monitoring information provision system comprising a monitored terminal, a guardian terminal, and a wireless communication unit for communication between the two terminals, wherein the monitored terminal comprises an audio output unit that outputs messages received from the guardian terminal as audio, an audio acquisition unit that acquires the child's voice, converts it into text information and transmits it to the guardian terminal, and a timeout monitoring unit that transmits ambient audio data and location information to the guardian terminal if the text information is not acquired within a predetermined time. (Effects of Appendix 1) By automatically transmitting ambient sounds and location information when there is no response, the system lets parents infer the situation and reduces their anxiety.

[0075] [Issues related to Appendix 2] To reduce communication load and improve responsiveness. [Appendix 2] The system described in Appendix 1, wherein the voice output unit and the voice acquisition unit include a voice processing unit that operates within the monitored terminal. (Effects of Appendix 2) On-device processing reduces communication delay and cost, and improves responsiveness. "Including a voice processing unit that operates within the monitored terminal" means that the main processing related to voice output or voice acquisition, such as basic TTS and ASR, is performed within the terminal. Even if some advanced processing is performed in cooperation with a server, this configuration may still apply if voice input / output and basic conversion processing are performed on the terminal side.

[0076] [Issues related to Appendix 3] To set an appropriate timeout period. [Appendix 3] The system described in Appendix 1, wherein the predetermined time is 10 seconds or more and 20 seconds or less. (Effects of Appendix 3) It is possible to balance the time spent waiting for the child's response with the time at which the parent begins to feel anxious. These numerical ranges represent typical examples that consider the balance between a child's response time, the amount of information needed to understand the situation, and privacy protection. Depending on the system's operating policy and parental settings, values outside these ranges may also be included in the technical concept of the present invention.

[0077] [Issues related to Appendix 4] To actively detect specific situations, such as emergencies. [Appendix 4] The system described in Appendix 1, wherein the system transmits the ambient sound data and location information when a specific sound is detected. (Effects of Appendix 4) Wake word detection allows for notification of emergency situations without waiting for a timeout.

[0078] [Issues related to Appendix 5] To reduce the amount of ambient audio data transmitted and take privacy into consideration. [Appendix 5] The system described in Appendix 1, wherein the ambient audio data is an audio recording of 5 seconds or more and 15 seconds or less. (Effects of Appendix 5) By transmitting only the minimum necessary audio data, the amount of data and the impact on privacy can be reduced. These numerical ranges represent typical examples that consider the balance between a child's response time, the amount of information needed to understand the situation, and privacy protection. Depending on the system's operating policy and parental settings, values outside these ranges may also be included in the technical concept of the present invention.

[0079] [Issues related to Appendix 6] To stabilize the operation of the terminal and utilize resources efficiently. [Note 6] The system described in Note 1, wherein the monitored terminal manages the voice output unit and the voice acquisition unit using a finite state machine (FSM) with 5 or fewer states. (Effects of Appendix 6) The lightweight FSM enables stable operation even on terminals with limited resources.
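
As an illustration of Appendix 6, the following is a minimal sketch of a finite state machine with no more than five states. The state names, event names, and transition table are assumptions made for this example; this passage does not enumerate the actual states.

```python
from enum import Enum, auto


class State(Enum):
    """Hypothetical states of the monitored terminal (four of at most five)."""
    IDLE = auto()       # waiting for a message from the guardian terminal
    SPEAKING = auto()   # outputting the received message as audio
    LISTENING = auto()  # acquiring the child's voice
    FALLBACK = auto()   # sending ambient audio and location after a timeout


# (current state, event) -> next state; event names are illustrative only.
TRANSITIONS = {
    (State.IDLE, "message_received"): State.SPEAKING,
    (State.SPEAKING, "playback_done"): State.LISTENING,
    (State.LISTENING, "text_acquired"): State.IDLE,
    (State.LISTENING, "timeout"): State.FALLBACK,
    (State.FALLBACK, "data_sent"): State.IDLE,
}


def step(state: State, event: str) -> State:
    """Apply one event; unknown events leave the state unchanged."""
    return TRANSITIONS.get((state, event), state)
```

Keeping the transitions in a single table makes it easy to check that every state is reachable and that the state count stays within the limit.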

[0080] [Issues related to Appendix 7] Protecting the privacy of transmitted voice data. [Note 7] The system described in Note 1, wherein the ambient audio data is encrypted using the AES-128 method and automatically deleted after 24 hours. (Effects of Appendix 7) Encryption and automatic deletion ensure the privacy of voice data.
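
The automatic deletion in Appendix 7 can be sketched as a retention check. This example covers only the 24-hour deletion rule; the AES-128 encryption step is assumed to have been applied already by a separate component and is not shown. `StoredClip` and `purge_expired` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List

RETENTION_S = 24 * 60 * 60  # 24 hours, as stated in Appendix 7


@dataclass
class StoredClip:
    """An already-encrypted ambient audio clip (hypothetical record type)."""
    clip_id: str
    ciphertext: bytes
    stored_at: float  # seconds since an arbitrary epoch


def purge_expired(clips: List[StoredClip], now: float) -> List[StoredClip]:
    """Keep only clips stored less than 24 hours before `now`."""
    return [c for c in clips if now - c.stored_at < RETENTION_S]
```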

[0081] [Issues related to Appendix 8] Suppress battery consumption after fallback. [Note 8] The system described in Note 1, wherein power consumption is reduced by extending the positioning interval after transmitting the ambient sound data and location information. (Effects of Appendix 8) By extending the positioning interval, battery life can be extended.
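
One way to picture the interval extension of Appendix 8 is the sketch below. The doubling factor and the upper bound are assumptions chosen for illustration; the text does not specify how far the positioning interval is extended.

```python
def next_positioning_interval(
    current_s: float,
    fallback_sent: bool,
    factor: float = 2.0,   # assumed extension factor
    max_s: float = 600.0,  # assumed upper bound on the interval
) -> float:
    """Return the positioning interval to use after a possible fallback."""
    if not fallback_sent:
        return current_s
    # After ambient audio and location have been sent, position less often.
    return min(current_s * factor, max_s)
```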

[0082] [Issues corresponding to Appendix 9] To provide a simple way to grasp the surrounding noise level and support understanding of the situation. [Note 9] The system described in Note 1, which adds and transmits volume classification information that divides the average sound pressure level into three stages: quiet, medium noise, and loud noise, based on the ambient sound data. (Effects of Appendix 9) The volume classification information allows parents to understand the noise situation without directly listening to the sound, contributing to contextual understanding and reduced cognitive load.
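
The three-stage volume classification of Appendix 9 can be sketched as follows. Measuring true sound pressure level requires a calibrated microphone, so this example uses the RMS level in dB relative to full scale as a stand-in; that substitution and both thresholds are assumptions, not values taken from this document.

```python
import math
from typing import Sequence


def rms_dbfs(samples: Sequence[float]) -> float:
    """Average level of samples normalized to [-1, 1], in dB relative to full scale."""
    if not samples:
        return float("-inf")
    mean_sq = sum(s * s for s in samples) / len(samples)
    if mean_sq == 0:
        return float("-inf")
    return 10 * math.log10(mean_sq)


def classify_volume(
    level_db: float,
    quiet_below: float = -40.0,  # assumed threshold
    loud_above: float = -15.0,   # assumed threshold
) -> str:
    """Map an average level to one of the three stages named in Appendix 9."""
    if level_db < quiet_below:
        return "quiet"
    if level_db < loud_above:
        return "medium noise"
    return "loud noise"
```

Only the resulting label needs to be transmitted, which is how the guardian can grasp the noise level without listening to the recording.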

[0083] [Challenges corresponding to Appendix 10] To understand the types of sounds in the surroundings, deepen situational awareness, and provide contextual information. [Note 10] The system described in Note 1, which classifies the ambient sound data into one of the following: human voice, traffic noise, environmental noise, or silence, and transmits the classification result together with the location information. (Effects of Appendix 10) Sound type classification allows parents to more concretely infer the environment in which their child is, contributing to contextual understanding and reduced cognitive load.

[0084] [Issues related to Appendix 11] To provide parents with easily understandable information regarding the sound type classification results. [Note 11] The system described in Note 10, wherein the classification result is transmitted as text information to the parent terminal and the parent terminal outputs it as audio. (Effects of Appendix 11) By notifying parents of the classification results by voice, they can understand the situation without looking at the screen.

[0085] [Challenges corresponding to Appendix 12] To improve the accuracy of situational awareness by adding visual information. [Note 12] The system described in Note 1, wherein when the timeout monitoring unit transmits the ambient audio data and location information, it also transmits the image acquired by the monitored terminal. (Effects of Appendix 12) Image information allows parents to visually confirm the surrounding situation, increasing their sense of security.

[0086] [Challenges related to Appendix 13] To provide a program that implements the basic functions of the monitored terminal. [Note 13] A program that causes a computer to output messages received from a parent terminal as audio, acquire the child's voice, convert it into text information and send it to the parent terminal, and if the text information is not acquired within a predetermined time, send ambient audio data and location information to the parent terminal. (Effects of Appendix 13) The program allows a general-purpose computer to function as a monitored terminal.

[0087] [Issues related to Appendix 14] To provide a method for implementing basic processing of the monitored terminal. [Note 14] An information provision method that includes the steps of receiving a message from a parent's terminal and outputting it as audio, acquiring the child's voice and sending it to the parent's terminal as text information, and sending ambient audio data and location information to the parent's terminal if the text information is not acquired within a predetermined time. (Effects of Appendix 14) The information provision method defines a series of processes on the monitored terminal.

[0088] [Issues related to Appendix 15] To provide a program that implements the basic functions of the guardian terminal. [Note 15] A program that causes a computer to send a message to a monitored terminal and to output text information, ambient sound data, location information, volume classification information, sound type classification results, situation estimation results, or image data received from the monitored terminal. (Effects of Appendix 15) The program allows a general-purpose computer to function as a guardian terminal.

[0089] [Issues related to Appendix 16] To provide a method for implementing basic processing on the guardian terminal. [Note 16] An information processing method that includes the steps of sending a message to a monitored terminal and outputting text information, ambient sound data, location information, volume classification information, sound type classification results, situation estimation results, or image data received from the monitored terminal. (Effects of Appendix 16) The information processing method defines a series of processes on the guardian terminal.

[0090] [Issues related to Appendix 17] To provide a monitored terminal that offers a basic mechanism for understanding the situation and reducing anxiety when the child does not respond. [Note 17] A monitored terminal comprising: an audio output unit that outputs messages received from a guardian terminal as audio; a voice acquisition unit that acquires the child's voice, converts it into text information, and sends it to the guardian terminal; and a timeout monitoring unit that sends ambient audio data and location information to the guardian terminal if the text information is not acquired within a predetermined time. (Effects of Appendix 17) Even as a standalone device, the monitored terminal automatically transmits ambient audio data and location information when there is no response, allowing guardians to infer the situation and reducing their anxiety.

[0091] [Issues related to Appendix 18] To provide a server that relays information between terminals and provides processing support in a child monitoring system. [Note 18] A server that is communicatively connected to a monitored terminal and a guardian terminal, and receives from the monitored terminal response information in which the child's voice has been converted into text information, or ambient sound data and location information transmitted when the monitored terminal does not acquire the response information within a predetermined time, and transmits the response information, or the ambient sound data and location information, to the guardian terminal. (Effects of Appendix 18) The presence of a server allows for efficient management of communication between terminals and enables additional information processing as needed.

[0092] [Issues related to Appendix 19] To easily grasp a child's emotional state. [Note 19] The system described in Note 1, wherein the voice acquisition unit extracts information indicating emotions from the child's voice and transmits the information indicating emotions to the parent terminal. (Effects of Appendix 19) Emotion tags allow parents to intuitively understand their child's emotional state.

[0093] [Issues related to Appendix 20] Proactively notifying the guardian of the situation based on AI predictions. [Note 20] A system or server as described in Note 1 or Note 18, wherein the monitored terminal or the server further comprises a situation estimation unit that calculates a risk score regarding the child's situation, and when the risk score exceeds a predetermined threshold, the timeout monitoring unit (or associated transmission control unit) transmits the ambient audio data and location information, or information based thereon. (Effects of Appendix 20) Risk indicators can be detected and notifications proactively sent without waiting for a timeout.
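
The threshold check of Appendix 20 can be sketched as below. How the situation estimation unit actually computes the risk score is not stated in this passage, so the weighted sum, the indicator names, and the weights are all assumptions used only to make the comparison concrete.

```python
from typing import Dict


def risk_score(indicators: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted sum of indicator values (an assumed scoring scheme)."""
    return sum(weights.get(name, 0.0) * value for name, value in indicators.items())


def should_notify(score: float, threshold: float) -> bool:
    """Transmit proactively only when the score exceeds the threshold."""
    return score > threshold
```

For example, with hypothetical weights `{"loud_noise": 0.5, "outside_usual_area": 0.4}` and a threshold of 0.7, loud noise alone would not trigger a transmission, while loud noise outside the usual area would.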

[0094] [Issues related to Appendix 21] To flexibly control the information transmitted in accordance with the needs and privacy policies of parents. [Note 21] A system or server as described in Note 1 or Note 18, further comprising a policy control unit that controls the type or content of information transmitted by the timeout monitoring unit (or related transmission control unit, or corresponding processing unit of the server) based on a monitoring policy set by the guardian. (Effects of Appendix 21) Flexible and privacy-conscious monitoring tailored to the circumstances of each household becomes possible.
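
The policy control unit of Appendix 21 can be pictured as a function from policy inputs to a set of information to transmit, in line with reference signs 302 to 306 in the explanation of symbols. The sketch below is illustrative: the rules, the reading of the time input as an hour of the day, the 0.8 risk threshold, and the `allow_images` setting are assumptions, not the disclosed policy.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class PolicyInput:
    hour: int            # time input (302), read here as hour of day 0-23
    in_safe_area: bool   # derived from location information (303)
    trigger: str         # trigger type (304): "timeout", "specific_sound", "risk_score"
    risk_score: float    # risk score (305), assumed to lie in [0, 1]


def transmission_set(p: PolicyInput, allow_images: bool = False) -> FrozenSet[str]:
    """Return the transmission information set (306) under assumed rules."""
    items = {"location"}
    # A daytime timeout in a known safe area is treated as routine.
    routine = p.trigger == "timeout" and p.in_safe_area and 8 <= p.hour < 18
    if not routine:
        items.add("ambient_audio")
    # Images are sent only if the guardian opted in and the risk is high.
    if allow_images and p.risk_score > 0.8:
        items.add("image")
    return frozenset(items)
```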

[0095] [Issues corresponding to Appendix 22] Regardless of the type of interface, understand the situation when there is no response from the child. [Note 22] A child monitoring information provision system comprising a monitored terminal, a guardian terminal, and a wireless communication unit for communication between the two terminals, wherein the monitored terminal comprises a notification unit for notifying the child of a message received from the guardian terminal, a response receiving unit for receiving responses from the child, and a monitoring unit for transmitting information (including at least ambient sound data or location information) to the guardian terminal for understanding the child's situation if the response receiving unit does not receive a response within a predetermined time. (Effects of Appendix 22) Not limited to voice interfaces, it can provide a fallback function in the event of no response in a variety of systems.

[0096] [Issues related to Appendix 23] To visually grasp the situation while respecting voice privacy. [Note 23] The system described in Note 1, wherein the timeout monitoring unit transmits image or video data acquired by the monitored terminal, instead of, or in addition to, the ambient audio data and location information. (Effects of Appendix 23) The child's situation can be understood through visual information without transmitting the content of surrounding conversations.

[0097] [Challenges corresponding to Appendix 24] Select and transmit the most appropriate information (audio, location, visual) depending on the situation. [Note 24] The system described in Note 1, wherein the timeout monitoring unit selects at least one from ambient audio data, location information, and image or video data and transmits it to the guardian terminal. (Effects of Appendix 24) The most appropriate information can be provided depending on the communication status and privacy settings.

[0098] [Challenges corresponding to Appendix 25] Confirm responses at more appropriate times, linked to the child's actions and schedule, and implement fallbacks. [Note 25] The system described in Note 1, wherein the predetermined time includes the period from the sending of a message by the guardian until a predetermined event occurs, and the timeout monitoring unit transmits the ambient audio data and location information, or information based thereon, if the text information is not acquired by the time the predetermined event occurs. (Effects of Appendix 25) Rather than simply relying on the passage of time, it allows for response confirmation and fallbacks to be performed at appropriate timings according to the child's situation.

[0099] [Challenges related to Appendix 26] To implement the system using general-purpose devices without requiring dedicated terminals. [Note 26] The system described in Note 1, wherein the monitored terminal and the guardian terminal are each implemented by programs running on a general-purpose computer device. (Effects of Appendix 26) The system can be easily and inexpensively implemented using smartphones, etc.

[0100] [Challenges related to Appendix 27] To realize a flexible and scalable system by utilizing the cloud. [Note 27] The system described in Note 1, wherein communication between the monitored terminal and the guardian terminal is conducted via a cloud server. (Effects of Appendix 27) Access from multiple devices and server-side functionality expansion become easier.

[0101] [Note 28] A child monitoring information provision system comprising a monitored terminal, a guardian terminal, and a wireless communication unit for communication between the two terminals, wherein the monitored terminal comprises: a notification unit that notifies the child of messages received from the guardian terminal; a response receiving unit that receives responses from the child; and a monitoring unit that, if no response is received by the response receiving unit within a predetermined time after notification by the notification unit, transmits information for understanding the child's situation (including at least ambient sound data, location information, or image data) to the guardian terminal. The response receiving unit may detect not only voice but also buttons, touch panels, gestures, and the like to receive responses from the child.

[Explanation of Symbols]

[0102]
1 Child monitoring information provision system
10 Monitored terminal
11 Control unit
12 Storage unit
13 Communication unit
14 Audio input/output unit
14a Microphone
14b Speaker
15 GPS receiver
16 Imaging unit
20 Message receiving unit
21 Audio output unit
22 Voice acquisition unit
23 Timeout monitoring unit
24 Location information acquisition unit
25 Data transmission unit
26 Audio processing unit
27 State management unit
28 Encryption unit
29 Power saving control unit
30 Volume classification unit
31 Sound type classification unit
32 Image acquisition unit
33 Situation estimation unit
34 Policy control unit
50 Guardian terminal
51 Message sending unit
52 Data receiving unit
53 Information display unit
54 Audio output control unit
55 Communication control unit
56 Control unit (guardian terminal)
57 Storage unit (guardian terminal)
60 Specific voice detection unit
80 Network
90 Cloud server
91 Processor (server)
92 Memory (server)
93 Storage (server)
94 Communication interface (server)
95 System bus (server)
100 Data packet
101 Header
102 Data type
103 Timestamp
104a Location information (latitude)
104b Location information (longitude)
105 Ambient audio data (or its reference information)
106 Volume classification information
107 Sound type classification result
108 Image data (or its reference information)
109 Situation estimation result
110 Footer
201 Map
202 Position marker
203 Text information area
204 Audio playback UI
205 Volume classification display
206 Sound type classification display
207 Received image thumbnail
301 Monitoring policy
302 Time zone (policy input)
303 Location information (policy input)
304 Trigger type (policy input)
305 Risk score (policy input)
306 Transmission information set (policy output)
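
For orientation, the data packet fields listed under reference signs 100 to 110 can be pictured as a simple record. The sketch below is an assumption-laden illustration: the field names, the use of JSON, and the choice to omit absent optional fields are not taken from this document, and the header (101) and footer (110) are left to the transport layer.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class DataPacket:
    """Hypothetical in-memory form of data packet 100."""
    data_type: str                            # 102
    timestamp: str                            # 103, e.g. an ISO 8601 string
    latitude: float                           # 104a
    longitude: float                          # 104b
    ambient_audio_ref: Optional[str] = None   # 105 (reference information)
    volume_class: Optional[str] = None        # 106
    sound_type: Optional[str] = None          # 107
    image_ref: Optional[str] = None           # 108 (reference information)
    situation_estimate: Optional[str] = None  # 109

    def to_json(self) -> str:
        """Serialize, leaving out optional fields that were not set."""
        body = {k: v for k, v in asdict(self).items() if v is not None}
        return json.dumps(body, sort_keys=True)
```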

Claims

1. A child monitoring information provision system comprising a monitored terminal, a guardian terminal, and a wireless communication unit for communication between the two terminals, wherein the monitored terminal comprises: an audio output unit that outputs messages received from the guardian terminal as audio; a voice acquisition unit that acquires the child's voice, converts it into text information, and transmits it to the guardian terminal; and a timeout monitoring unit that transmits ambient audio data and location information to the guardian terminal if the text information is not acquired within a predetermined time.

2. The child monitoring information provision system according to claim 1, wherein the audio output unit and the voice acquisition unit include an audio processing unit operating within the monitored terminal.

3. The child monitoring information provision system according to claim 1, wherein the predetermined time is 10 seconds or more and 20 seconds or less.

4. The child monitoring information provision system according to claim 1, wherein the system transmits the ambient audio data and location information when a specific sound is detected.

5. The child monitoring information provision system according to claim 1, wherein the ambient audio data is an audio recording of 5 seconds or more and 15 seconds or less.

6. The child monitoring information provision system according to claim 1, wherein the monitored terminal manages the audio output unit and the voice acquisition unit using a finite state machine with 5 or fewer states.

7. The child monitoring information provision system according to claim 1, wherein the ambient audio data is encrypted using a predetermined encryption method and automatically deleted after a predetermined storage period has elapsed.

8. The child monitoring information provision system according to claim 1, wherein power consumption is reduced by extending the positioning interval after the ambient audio data and location information are transmitted.

9. The child monitoring information provision system according to claim 1, wherein the system transmits volume classification information that divides the average sound pressure level into multiple stages based on the ambient audio data.

10. The child monitoring information provision system according to claim 1, wherein the ambient audio data is classified into one of a plurality of categories and the classification result is transmitted together with the location information.

11. The child monitoring information provision system according to claim 10, wherein the classification result is transmitted as text information to the guardian terminal and output as audio by the guardian terminal.

12. The child monitoring information provision system according to claim 1, wherein the timeout monitoring unit transmits an image acquired by the monitored terminal together with the ambient audio data and location information.

13. A program that causes a computer to: output messages received from a guardian terminal as audio; acquire the child's voice, convert it into text information, and send it to the guardian terminal; and, if the text information is not acquired within a predetermined time, send ambient audio data and location information to the guardian terminal.

14. An information provision method comprising the steps of: receiving a message from a guardian terminal and outputting it as audio; acquiring the child's voice and sending it to the guardian terminal as text information; and sending ambient audio data and location information to the guardian terminal if the text information is not acquired within a predetermined time.

15. A program that causes a computer to: send a message to a monitored terminal; and output text information, ambient sound data, location information, volume classification information, sound type classification results, situation estimation results, or image data received from the monitored terminal.

16. An information processing method comprising the steps of: sending a message to a monitored terminal; and outputting text information, ambient sound data, location information, volume classification information, sound type classification results, situation estimation results, or image data received from the monitored terminal.

17. A monitored terminal comprising: an audio output unit that outputs messages received from a guardian terminal as audio; a voice acquisition unit that acquires the child's voice, converts it into text information, and transmits it to the guardian terminal; and a timeout monitoring unit that transmits ambient audio data and location information to the guardian terminal if the text information is not acquired within a predetermined time.

18. A server that is communicatively connected to a monitored terminal and a guardian terminal, wherein the server receives from the monitored terminal response information in which the child's voice has been converted into text information, or ambient sound data and location information transmitted when the monitored terminal does not acquire the response information within a predetermined time, and transmits the response information, or the ambient sound data and location information, to the guardian terminal.

19. A child monitoring information provision system comprising a monitored terminal, a guardian terminal, and a wireless communication unit for communication between the two terminals, wherein the monitored terminal comprises: a notification unit that notifies the child of messages received from the guardian terminal; a response receiving unit that receives responses from the child; and a monitoring unit that, if no response is received by the response receiving unit within a predetermined time after notification by the notification unit, transmits information for understanding the child's situation to the guardian terminal.