An intelligent drowning detection system, method and related devices
The intelligent drowning detection system, which combines acoustic sensing modules and deep learning models, solves the problems of detection lag, poor accuracy, and high false alarm rate in existing technologies. It achieves early, accurate, and reliable drowning detection, improving user experience and rescue efficiency.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Application (China)
- Current Assignee / Owner
- THE HONG KONG UNIV OF SCI & TECH
- Filing Date
- 2026-03-05
- Publication Date
- 2026-06-19
AI Technical Summary
Existing drowning detection technologies suffer from drawbacks such as detection lag, poor accuracy, high false alarm rate, limited application scenarios, poor user experience, and privacy infringement, making it difficult to achieve early, accurate, reliable, and user-friendly drowning detection.
The system employs an acoustic sensing module to continuously detect audio data, combines it with a deep learning model for drowning analysis, and uses a false positive handling module for secondary confirmation. Finally, an alarm and rescue triggering module triggers alarms and rescue operations, thus constructing a highly reliable drowning detection process.
It enables early, accurate, and reliable drowning detection, reduces false alarm rates, ensures end-to-end reliability from signal acquisition to final action, and improves user experience and rescue efficiency.
Figure CN122245025A_ABST
Abstract
Description
Technical Field
[0001] This invention relates to the field of acoustic detection technology, and in particular to an intelligent drowning detection system, method, and related equipment.
Background Technology
[0002] Drowning is one of the leading causes of accidental death worldwide, posing a serious threat, especially to children and adolescents. Data from the World Health Organization shows that drowning is one of the top ten causes of death for children and young adults aged 1-24.
[0003] Existing drowning detection and prevention technologies mainly fall into the following categories, each with significant limitations. Traditional manual and physical protective methods: These include lifeguard monitoring and life jackets. Lifeguard monitoring is limited by manpower, field of vision, and attention, making it difficult to watch every swimmer in real time and without blind spots in crowded or limited waters. Traditional life jackets provide buoyancy, but they are bulky and stuffy to wear, so users, especially young people and water sports enthusiasts, are often unwilling to wear them, which reduces their actual protective effect.
[0004] Wearable devices based on physiological parameter monitoring: Some existing technologies attempt to determine drowning by monitoring physiological indicators such as heart rate and blood oxygen saturation through devices like wristbands. However, these physiological indicators typically only show significant fluctuations a considerable time after drowning (e.g., due to drastic changes in heart rate caused by hypoxia), representing late-stage characteristics that are insufficient to buy precious time for rescue. Furthermore, the large baseline differences in physiological parameters between individuals result in poor system generalization ability, easily leading to false alarms or missed alarms.
[0005] Visual recognition-based monitoring systems: Analyzing swimmer behavior using cameras and computer vision algorithms is another common approach. However, such systems, like those based on YOLO and other object detection algorithms, have several inherent problems: First, in crowded places like swimming pools, mutual occlusion (visual occlusion) can cause algorithm failure or misjudgments; second, complex environmental factors such as changing lighting, water reflections, and water wave interference can lead to extremely high false alarm rates; and finally, the widespread deployment of cameras raises serious concerns about personal privacy.
[0006] Mechanical devices with stringent triggering conditions, such as depth-triggered airbags, require the user to sink to a specific depth and remain submerged for a period of time to activate. This "better late than never" triggering mechanism is severely delayed, only intervening in the late stages of drowning and missing the optimal early rescue opportunity.
[0007] In summary, existing technologies generally suffer from drawbacks such as detection lag, poor accuracy, high false alarm rate, limited application scenarios, poor user experience, or privacy infringement.
Summary of the Invention
[0008] The main objective of this invention is to provide an intelligent drowning detection system, method, wearable device, electronic device, storage medium, and program product, aiming to solve at least one problem of the prior art.
[0009] To achieve the above objective, one aspect of the present invention provides an intelligent drowning detection system, the system comprising: an acoustic sensing module, used to continuously detect audio data in the target aquatic environment and then extract the target audio through signal processing technology, wherein the target audio includes audio segments of consecutive time windows; an artificial intelligence processing module, used to perform drowning analysis on the audio segments using a pre-integrated deep learning model to obtain the drowning situation for each time window; a misjudgment handling module, used to make a drowning judgment based on the drowning situations of the consecutive time windows and obtain a drowning prediction result, wherein the drowning prediction result is either confirmed drowning or no drowning; and an alarm and rescue triggering module, used to trigger alarm and rescue operations in response to the drowning prediction result being confirmed drowning.
[0010] In some embodiments, when the acoustic sensing module continuously detects audio data in the target aquatic environment and then extracts the target audio through signal processing technology, it specifically performs the following operations: applying an adaptive beamforming algorithm to the microphone array to quantify the acoustic delay between the microphones, thereby dynamically generating a sharp beam pointing towards the sound source region of the target object; and using the sharp beam to spatially suppress noise from other directions in the environment and collect the target audio from the sound source region of the target object.
[0011] In some embodiments, the microphone array includes a compact array of four omnidirectional microphones, three of which form a triangular layout and a fourth omnidirectional microphone located slightly below the center of gravity of the triangular layout.
[0012] In some embodiments, when the deep learning model is a contrastive language-audio pre-trained model, the artificial intelligence processing module performs the following operations when using the pre-integrated deep learning model to perform drowning analysis on the audio segments and obtain the drowning situation for each time window: converting the audio segment into a time-frequency representation, wherein the time-frequency representation includes a log-Mel spectrogram; encoding the time-frequency representation using the contrastive language-audio pre-trained model to generate an audio embedding; and comparing the audio embedding with predefined text embeddings and determining the drowning situation of the corresponding time window based on the comparison result. The contrastive language-audio pre-trained model is trained on a target dataset, which is obtained by merging environmental noise audio with several base datasets of drowning-related positive audio and then performing data expansion and overlap processing; the predefined text embeddings include a text embedding for at least one sound type of a drowning event.
[0013] In some embodiments, when the deep learning model is a multilayer perceptron, before the artificial intelligence processing module performs drowning analysis on the audio segments using the pre-integrated deep learning model to obtain the drowning situation for each time window, it further performs the following operations: taking the detection performance for drowning events as a hard condition and the reduction of the model's computational and storage overhead as a constraint; and, based on the hard condition and the constraint, optimizing the multilayer perceptron through model quantization and then deploying and integrating the optimized multilayer perceptron.
[0014] In some embodiments, the drowning situation is either a drowning type or a non-drowning type. When the misjudgment handling module makes a drowning judgment based on the drowning situations of the consecutive time windows and obtains a drowning prediction result, it specifically performs the following operations: setting the initial consecutive count to 0 and initializing the drowning prediction result as no drowning; taking the drowning situation of the current time window as the judgment condition; determining the type of the judgment condition, incrementing the consecutive count if it is the drowning type and resetting the consecutive count to 0 if it is the non-drowning type; and taking the drowning situation of the next time window as the judgment condition and repeating the type determination until the consecutive count reaches a preset threshold, at which point the drowning prediction result is switched to confirmed drowning.
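The counting procedure in this paragraph can be sketched as a small state machine. This is only an illustrative sketch: the class name `ConsecutiveConfirmer`, the string labels, and the threshold value of 3 are assumptions made for the example, since the text only specifies "a preset threshold".

```python
from dataclasses import dataclass

NO_DROWNING = "no drowning"
CONFIRMED_DROWNING = "confirmed drowning"


@dataclass
class ConsecutiveConfirmer:
    """Confirms drowning only after `threshold` consecutive drowning-type windows."""
    threshold: int = 3          # assumed value; the text only says "preset threshold"
    consecutive: int = 0        # the initial consecutive count is 0
    result: str = NO_DROWNING   # the prediction result starts as "no drowning"

    def update(self, window_is_drowning: bool) -> str:
        """Feed the drowning situation of the next time window and return the result."""
        if self.result == CONFIRMED_DROWNING:
            return self.result              # already confirmed; nothing more to count
        if window_is_drowning:
            self.consecutive += 1           # drowning type: increment the count
        else:
            self.consecutive = 0            # non-drowning type: reset the count
        if self.consecutive >= self.threshold:
            self.result = CONFIRMED_DROWNING
        return self.result


def run(window_flags, threshold=3):
    """Return the prediction result after each window in `window_flags`."""
    confirmer = ConsecutiveConfirmer(threshold=threshold)
    return [confirmer.update(flag) for flag in window_flags]
```

With the assumed threshold of 3, an isolated drowning-type window followed by a non-drowning window never confirms drowning, because the reset to 0 discards the partial count.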
[0015] In some embodiments, when the drowning prediction result confirms drowning, the misjudgment handling module is also configured to perform the following operations: The initial moment is taken as the time when the drowning was confirmed, based on the drowning prediction result. A misjudgment time window is generated based on the initial time and a preset duration; If a deactivation command is received from the target object within the misjudgment time window, the drowning prediction result will be switched to no drowning. The misjudgment handling module is equipped with a manual operation touch point, and the deactivation command is triggered based on the target object's click operation on the manual operation touch point.
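The secondary confirmation described above amounts to a timed cancellation window. The following is a minimal sketch under stated assumptions: the 10-second duration, the function name `resolve`, and the representation of deactivation commands as a list of timestamps are all hypothetical, since the text only refers to "a preset duration" and a click on the manual operation touch point.

```python
from dataclasses import dataclass


@dataclass
class MisjudgmentWindow:
    confirmed_at: float      # initial moment: time at which drowning was confirmed (seconds)
    duration: float = 10.0   # assumed preset duration in seconds

    def contains(self, t: float) -> bool:
        """True if time `t` falls inside the misjudgment time window."""
        return self.confirmed_at <= t <= self.confirmed_at + self.duration


def resolve(confirmed_at, deactivation_times, duration=10.0):
    """Return the final prediction result given the times of any deactivation commands."""
    window = MisjudgmentWindow(confirmed_at, duration)
    if any(window.contains(t) for t in deactivation_times):
        return "no drowning"         # the wearer cancelled within the window
    return "confirmed drowning"      # no valid cancellation was received
```

A command that arrives after the window closes is ignored in this sketch, so the result stays at confirmed drowning.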
[0016] In some embodiments, the alarm and rescue triggering module is equipped with an alarm device and an automatic airbag inflation system. When triggering alarm and rescue operations, the alarm and rescue triggering module performs the following operations: The floating alarm is released by the alarm device to sound a buzzer and perform a location operation, and the alarm device also sends alarm communication to the nearby area; the communication data of the alarm communication includes the location result of the location operation. The automatic airbag inflation system is activated to pull a drowning target to the surface.
[0017] To achieve the above objective, another aspect of this invention provides an intelligent drowning detection method, applied to the aforementioned system, the method comprising: the acoustic sensing module continuously detecting audio data in the target aquatic environment and then extracting the target audio through signal processing technology, wherein the target audio includes audio segments of consecutive time windows; the artificial intelligence processing module using a pre-integrated deep learning model to perform drowning analysis on the audio segments to obtain the drowning situation for each time window; the misjudgment handling module making a drowning judgment based on the drowning situations of the consecutive time windows and obtaining a drowning prediction result, wherein the drowning prediction result is either confirmed drowning or no drowning; and the alarm and rescue triggering module, in response to the drowning prediction result being confirmed drowning, triggering an alarm operation and activating the drowning prevention device.
[0018] To achieve the above objectives, another aspect of the present invention proposes a wearable device that integrates the aforementioned intelligent drowning detection system. The wearable device includes a housing for accommodating the intelligent drowning detection system, a microphone array connected to an acoustic sensing module, a microcontroller connected to an artificial intelligence processing module and an alarm and rescue triggering module, a buzzer for issuing an alarm, and manual operation contacts.
[0019] To achieve the above objectives, another aspect of the present invention provides an electronic device, which includes a memory and a processor. The memory stores a computer program, and the processor executes the computer program to implement the aforementioned method.
[0020] To achieve the above objectives, another aspect of the present invention provides a computer-readable storage medium storing a computer program that, when executed by a processor, implements the aforementioned method.
[0021] To achieve the above objectives, another aspect of the present invention provides a computer program product, including a computer program that, when executed by a processor, implements the aforementioned method.
[0022] The embodiments of this invention include at least the following beneficial effects: This invention provides an intelligent drowning detection system, method, wearable device, electronic device, storage medium, and program product. This solution continuously detects audio data in a target aquatic environment through an acoustic sensing module, and then extracts the target audio through signal processing technology. The target audio includes audio segments within continuous time windows. An artificial intelligence processing module uses a pre-integrated deep learning model to perform drowning analysis on the audio segments, obtaining the drowning situation for each time window. A misjudgment processing module makes a drowning judgment based on the drowning situation within the continuous time windows, obtaining a drowning prediction result. The drowning prediction result includes confirmed drowning and no drowning. An alarm and rescue triggering module responds to a confirmed drowning prediction result by triggering an alarm operation and activating the drowning prevention device. The embodiments of this invention construct a highly reliable drowning detection process through a modular architecture of "acoustic sensing - AI processing - misjudgment processing - alarm and rescue." First, acoustic sensing focuses on the unique audio signals in the early stages of drowning, ensuring timely detection. Second, intelligent analysis of the audio using an AI deep learning model significantly improves the accuracy of identification. Furthermore, by introducing an independent false alarm handling module to reconfirm the initial results of the AI, the number of false alarms can be effectively reduced. 
Ultimately, the alarm and rescue are only triggered after drowning is confirmed, effectively achieving end-to-end reliability from signal acquisition to final action, and solving the problem of insufficient overall performance caused by the single function and simple judgment logic of existing technical solutions.
Attached Figure Description
[0023] Figure 1 is a schematic diagram of an example structure of the intelligent drowning detection system provided in an embodiment of the present invention;
Figure 2 is a schematic diagram illustrating an implementation scenario of intelligent drowning detection provided in an embodiment of the present invention;
Figure 3 is a flowchart illustrating the intelligent drowning detection method provided in an embodiment of the present invention;
Figure 4 is a schematic diagram of the specific process of the intelligent drowning detection method provided in an embodiment of the present invention;
Figure 5 is a schematic diagram of the front side of the integrated circuit board of the intelligent drowning detection system provided in an embodiment of the present invention;
Figure 6 is a schematic diagram of the back side of the integrated circuit board of the intelligent drowning detection system provided in an embodiment of the present invention;
Figure 7 is a schematic diagram of the structure of an electronic device provided in an embodiment of the present invention.
Detailed Implementation
[0024] To make the objectives, technical solutions, and advantages of this invention clearer, the invention will be further described in detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention. In the following description, when referring to the accompanying drawings, unless otherwise indicated, the same numbers in different drawings represent the same or similar elements. The embodiments described in the following exemplary embodiments do not represent all embodiments consistent with the embodiments of this invention; they are merely examples of apparatuses and methods consistent with some aspects of the embodiments of this invention as detailed in the appended claims.
[0025] It is understood that the terms “first,” “second,” etc., as used in this invention may describe various concepts, but unless specifically stated otherwise, these concepts are not limited by these terms. These terms are used only to distinguish one concept from another. For example, first information may also be referred to as second information without departing from the scope of embodiments of the invention, and similarly, second information may also be referred to as first information. Depending on the context, the words “if,” “when,” or “in response to determination” as used herein may be interpreted as “at the time of…”, “when…”, or “in response to determining.”
[0026] As used in this invention, “at least one” includes one, two, or more than two; “multiple” includes two or more than two; “each” refers to every one of the corresponding multiple; and “any” refers to any one of the multiple.
[0027] Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention pertains. The terminology used herein is for the purpose of describing embodiments of the invention only and is not intended to limit the invention.
[0028] Existing technologies generally suffer from drawbacks such as detection lag, poor accuracy, high false alarm rates, limited application scenarios, poor user experience, or privacy violations. There is an urgent need for a personal drowning detection solution that can achieve early, accurate, reliable, and user-friendly operation.
[0029] In view of this, this invention provides an intelligent drowning detection system, method, and related equipment. This solution continuously detects audio data in the target aquatic environment using an acoustic sensing module, and then extracts the target audio using signal processing technology. The target audio includes audio segments within continuous time windows. An artificial intelligence processing module uses a pre-integrated deep learning model to perform drowning analysis on the audio segments, obtaining the drowning situation for each time window. A misjudgment handling module makes a drowning judgment based on the drowning situation within the continuous time windows, obtaining a drowning prediction result. The drowning prediction result includes confirmed drowning and no drowning. An alarm and rescue triggering module responds to a confirmed drowning prediction result by triggering an alarm and activating the drowning prevention device. This invention constructs a highly reliable drowning detection process through a modular architecture of "acoustic sensing - AI processing - misjudgment handling - alarm and rescue." First, acoustic sensing focuses on the unique audio signals in the early stages of drowning, ensuring timely detection. Second, intelligent analysis of the audio using an AI deep learning model significantly improves the accuracy of identification. Furthermore, by introducing an independent false alarm handling module to reconfirm the initial results of the AI, the number of false alarms can be effectively reduced. Ultimately, the alarm and rescue are only triggered after drowning is confirmed, effectively achieving end-to-end reliability from signal acquisition to final action, and solving the problem of insufficient overall performance caused by the single function and simple judgment logic of existing technical solutions.
[0030] Referring to Figure 1, which is an optional structural diagram of the intelligent drowning detection system provided in this embodiment of the invention, the intelligent drowning detection system may include, but is not limited to: the acoustic sensing module 100, used to continuously detect audio data in the target aquatic environment and then extract the target audio through signal processing technology, wherein the target audio includes audio segments of consecutive time windows; the artificial intelligence processing module 200, used to perform drowning analysis on the audio segments using a pre-integrated deep learning model to obtain the drowning situation for each time window; the misjudgment handling module 300, used to make a drowning judgment based on the drowning situations of the consecutive time windows and obtain a drowning prediction result, wherein the drowning prediction result is either confirmed drowning or no drowning; and the alarm and rescue triggering module 400, used to trigger alarm and rescue operations in response to the drowning prediction result being confirmed drowning.
[0031] For example, in some specific embodiments, taking the system integration onto a swimming cap as an example, the intelligent drowning detection system of this invention can be implemented as follows: Acoustic sensing modules, such as waterproof microelectromechanical microphones integrated into the swimming cap, continuously collect the wearer's underwater audio data. The built-in analog front-end and digital signal processor can initially filter out some noise, extracting the audio stream containing possible cries for help or sounds of choking on water. This audio stream is then segmented according to preset time windows (e.g., every 2 seconds).
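The segmentation step mentioned at the end of this paragraph can be illustrated as follows. This is a rough sketch only: the function name `split_into_windows`, the 16 kHz sample rate used in the usage note, and the choice to drop a trailing partial window are assumptions, since the text specifies only the 2-second example window.

```python
def split_into_windows(samples, sample_rate, window_seconds=2.0):
    """Split a sample sequence into consecutive, non-overlapping time windows.

    A trailing partial window is dropped (an assumption of this sketch).
    """
    window_len = int(sample_rate * window_seconds)
    if window_len <= 0:
        raise ValueError("window length must be at least one sample")
    n_full = len(samples) // window_len
    return [
        samples[i * window_len:(i + 1) * window_len]
        for i in range(n_full)
    ]
```

For instance, 5 seconds of audio at an assumed 16 kHz sample rate yields two full 2-second windows of 32,000 samples each, and the remaining second is held back until more audio arrives.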
[0032] An artificial intelligence processing module, such as a quantized multilayer perceptron model deployed on a low-power microcontroller inside the swimming cap, receives the Mel-frequency cepstral coefficient features of each audio window and outputs the "probability of drowning risk" corresponding to that window in real time. When the probability exceeds a preset threshold (e.g., 0.8), the window is marked as "suspected drowning".
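The per-window inference described here can be sketched as a plain multilayer perceptron forward pass followed by a threshold test. Everything specific in this sketch is an assumption for illustration: the 13-coefficient input size, the random placeholder weights (a real deployment would load trained, quantized weights), the ReLU and sigmoid activations, and the label strings. The hidden layer sizes 128, 64, and 32 are borrowed from the example given later in the description.

```python
import math
import random


def build_mlp(sizes, seed=0):
    """Create placeholder layers as (weights, biases) pairs; weights are random, not trained."""
    rng = random.Random(seed)
    layers = []
    for n_in, n_out in zip(sizes[:-1], sizes[1:]):
        weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
        layers.append((weights, [0.0] * n_out))
    return layers


def dense(x, layer):
    """Fully connected layer: one weighted sum plus bias per output neuron."""
    weights, biases = layer
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, biases)]


def drowning_probability(features, layers):
    """Forward pass: ReLU on hidden layers, sigmoid on the single output."""
    h = features
    for layer in layers[:-1]:
        h = [max(0.0, v) for v in dense(h, layer)]
    logit = dense(h, layers[-1])[0]
    return 1.0 / (1.0 + math.exp(-logit))


def label_window(probability, threshold=0.8):
    """Mark the window as suspected drowning when the probability exceeds the threshold."""
    return "suspected drowning" if probability > threshold else "non-drowning"


# Assumed shape: 13 MFCC features in, hidden layers of 128/64/32, one probability out.
layers = build_mlp([13, 128, 64, 32, 1])
probability = drowning_probability([0.1] * 13, layers)
```

Because the weights are random placeholders, the probability produced here carries no meaning; the sketch only shows the data flow from features to a thresholded label.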
[0033] The misjudgment (false alarm) handling module, implemented as logic code running on the microcontroller, continuously tracks the output of the artificial intelligence processing module. When it records multiple consecutive time windows marked as "suspected drowning" (e.g., 3 windows, i.e., 6 consecutive seconds), it switches the internal state of the system from "monitoring" to "pre-confirmed drowning".
[0034] The alarm and rescue trigger module can send an alarm signal to the lifeguard receiver on the shore via wireless communication protocols such as Bluetooth or LoRa when the system confirms drowning. It can also simultaneously trigger the buzzer and LED light on the swimming cap for an audible and visual alarm.
[0035] Specifically, this invention addresses the problems of existing technologies, such as detection lag (e.g., physiological parameter monitoring), poor accuracy and high false alarm rates (e.g., visual recognition), and insufficient reliability due to reliance on a single judgment step. This invention constructs a highly reliable drowning detection process through a modular architecture of "acoustic sensing - AI processing - false alarm handling - alarm and rescue." First, acoustic sensing focuses on the unique audio signals in the early stages of drowning, ensuring timely detection. Second, intelligent analysis of the audio using an AI deep learning model significantly improves recognition accuracy. Furthermore, the introduction of an independent false alarm handling module to reconfirm the initial AI results effectively reduces single false alarms. Finally, alarm and rescue are triggered only after drowning is confirmed, effectively achieving end-to-end reliability from signal acquisition to final action, and solving the problem of insufficient overall performance caused by the single function and simple judgment logic of existing technologies.
[0036] It should be noted that when the acoustic sensing module continuously detects audio data in the target aquatic environment and extracts the target audio through signal processing technology, it specifically performs the following operations: quantifies the sound wave delay between each microphone by applying an adaptive beamforming algorithm to the microphone array, and then dynamically generates a sharp beam pointing towards the sound source area of the target object; uses the sharp beam to spatially suppress noise in other directions of the environment and collects the target audio from the sound source area of the target object.
[0037] For example, in some specific implementations, it is assumed that the user's smart swimming goggles integrate a small array of four microphones. When the system is started, the digital signal processor of the acoustic sensing module can run a generalized sidelobe canceller algorithm. This algorithm first estimates the time differences with which the sound arrives at each microphone to calculate the approximate direction of the user's mouth. Then, the algorithm dynamically adjusts the weighting coefficients of the signals from each microphone to form a "main lobe" beam pointing in that direction, maximizing the acquisition of sound from the user's mouth. At the same time, the algorithm can adaptively place "nulls" in, or reduce the gain of, other directions (such as the back or sides of the pool) to suppress the sound of water flow or other swimmers in the environment, and finally extract clear target audio.
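The first two steps of this example, estimating inter-microphone time differences and then steering a beam, can be illustrated with a fixed delay-and-sum beamformer. Note that this is deliberately a simpler technique than the generalized sidelobe canceller named above: it aligns and averages the channels but does not adaptively place nulls. The function names, the integer-sample delays, and the synthetic test signal are all assumptions of this sketch.

```python
import random


def estimate_delay(reference, signal, max_lag):
    """Return the lag (in samples) by which `signal` trails `reference`,
    chosen as the lag with the highest cross-correlation."""
    n = len(reference)
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for i in range(n):
            j = i + lag
            if 0 <= j < n:
                score += reference[i] * signal[j]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def delay_and_sum(channels, max_lag=8):
    """Align every channel to channel 0 using the estimated lags, then average."""
    reference = channels[0]
    n = len(reference)
    lags = [0] + [estimate_delay(reference, ch, max_lag) for ch in channels[1:]]
    output = []
    for i in range(n):
        total, count = 0.0, 0
        for ch, lag in zip(channels, lags):
            j = i + lag
            if 0 <= j < n:
                total += ch[j]
                count += 1
        output.append(total / count)  # count >= 1 because channel 0 has lag 0
    return lags, output


def simulate_channels(delays, n=400, noise_std=0.5, seed=0):
    """Hypothetical test data: one shared source, delayed per microphone,
    with independent noise added to every channel."""
    rng = random.Random(seed)
    source = [rng.gauss(0.0, 1.0) for _ in range(n)]
    channels = [
        [(source[i - d] if i - d >= 0 else 0.0) + rng.gauss(0.0, noise_std)
         for i in range(n)]
        for d in delays
    ]
    return source, channels
```

Averaging the aligned channels keeps the common source while the independent noise partially cancels, which is the basic reason a beam pointed at the wearer's mouth improves the signal-to-noise ratio. An adaptive method such as the one named in the text would additionally shape the response toward interfering directions.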
[0038] Specifically, this invention addresses the problem that existing audio monitoring systems are susceptible to environmental noise interference and struggle to extract effective target audio from complex backgrounds. By employing an adaptive beamforming algorithm, this invention dynamically forms an "auditory probe" pointing towards the user (target object), effectively suppressing environmental noise from other directions (such as the voices of bystanders or the sound of waves). This feature ensures that even in noisy environments like swimming pools or beaches, the system can capture the user's faint voice signal with high fidelity, providing high-quality "raw materials" for subsequent accurate AI analysis. This fundamentally improves the system's anti-interference capability and robustness in real-world, complex environments.
[0039] It should be noted that the microphone array consists of a compact array of four omnidirectional microphones, with three omnidirectional microphones forming a triangular layout and the fourth omnidirectional microphone located slightly below the center of gravity of the triangular layout.
[0040] For example, in some specific implementations, a flexible circuit board is embedded at the top front of the system-integrated swimming cap, on which four MEMS microphones are integrated. Three of the microphones are located at the three vertices of an equilateral triangle with a side length of approximately 2.5 cm, while the fourth microphone is positioned at the center of the triangle, slightly offset about 0.5 cm towards the user's forehead (i.e., towards the mouth). This arrangement allows the acoustic center of the entire array to be closer to the user's sound source, thereby achieving better near-field sound pickup.
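The layout in this example can be written out as coordinates, which is useful when computing expected inter-microphone delays. This sketch assumes a flat 2D layout centred on the triangle's centroid, with the negative y axis taken as the direction of the user's forehead; the function name `microphone_positions` and the axis convention are hypothetical, while the 2.5 cm side length and 0.5 cm offset come from the example.

```python
import math


def microphone_positions(side_cm=2.5, offset_cm=0.5):
    """Return (x, y) positions in cm for the four microphones.

    The first three entries are the vertices of an equilateral triangle centred
    on the origin; the fourth is the centroid shifted by `offset_cm` along -y.
    """
    radius = side_cm / math.sqrt(3)  # circumradius of an equilateral triangle
    triangle = [
        (radius * math.cos(math.radians(angle)), radius * math.sin(math.radians(angle)))
        for angle in (90, 210, 330)
    ]
    fourth = (0.0, -offset_cm)
    return triangle + [fourth]


mics = microphone_positions()
```

With a 2.5 cm side, the distance from the centroid to each edge is about 0.72 cm, so a 0.5 cm offset keeps the fourth microphone inside the triangle.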
[0041] Specifically, this invention addresses the challenges of deploying general-purpose microphone arrays in small wearable devices, including poor directivity and difficulty in accurately focusing on the user's mouth as a specific sound source. By arranging four omnidirectional microphones in a specific geometry, namely a triangular layout with a fourth microphone slightly below its center of gravity (centroid), the array's acoustic response can be optimized for the user's mouth area. This optimized physical layout, combined with beamforming algorithms, enables the system to focus more accurately on the wearer's own sound source, further improving the clarity and signal-to-noise ratio of the target audio, making it particularly suitable for wearable scenarios with strict requirements on size and shape, such as swimming goggles and swim caps.
[0042] It should be noted that when the deep learning model uses a contrastive language-audio pre-trained model, the artificial intelligence processing module performs the following operations when using the pre-integrated deep learning model to analyze audio segments for drowning and obtain the drowning situation for each time window: converting the audio segments into time-frequency representations; wherein the time-frequency representations include log-Mel spectrograms; encoding the time-frequency representations using the contrastive language-audio pre-trained model to generate audio embeddings; comparing the audio embeddings with predefined text embeddings, and determining the drowning situation for the corresponding time window based on the comparison results; wherein the contrastive language-audio pre-trained model is trained on a target dataset, and the target dataset is obtained by fusing environmental noise audio with several basic datasets of drowning-related positive audio and performing data expansion and overlap processing; the predefined text embeddings include text embeddings of at least one sound type of drowning event.
[0043] For example, in some specific implementations, assume the system captures a 2-second audio window. First, the audio processing unit converts the audio segment into a 64-dimensional log-Mel spectrogram. Then, this spectrogram is input into the audio encoder of a pre-trained CLAP (Contrastive Language-Audio Pretraining) model to generate a 512-dimensional normalized audio embedding vector. The system's storage unit also pre-stores text embedding vectors generated by the CLAP text encoder, such as those corresponding to the text descriptions "the sound of choking while drowning" and "the sound of shouting for help." The AI module can then calculate the cosine similarity between the audio embedding vector and each of these two pre-stored text embedding vectors. If either similarity score exceeds a preset threshold (e.g., 0.85), the corresponding time window is determined to be of the "drowning type." Furthermore, the CLAP model in this embodiment can be pre-trained and fine-tuned on a diverse dataset containing thousands of labeled audio samples mixed from real drowning simulations and various environmental noise audio.
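The comparison step in this example reduces to a cosine similarity test against each stored text embedding. The sketch below assumes the embeddings have already been produced by the audio and text encoders; the function names and the dictionary of labeled text embeddings are hypothetical, and the tiny 3-dimensional vectors in the usage note stand in for the 512-dimensional vectors of the example.

```python
import math


def normalize(vector):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vector]


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors of equal length."""
    a, b = normalize(a), normalize(b)
    return sum(x * y for x, y in zip(a, b))


def classify_window(audio_embedding, text_embeddings, threshold=0.85):
    """Return ("drowning" | "non-drowning", per-description similarity scores).

    The window is a drowning type if ANY description scores above the threshold.
    """
    scores = {
        description: cosine_similarity(audio_embedding, embedding)
        for description, embedding in text_embeddings.items()
    }
    is_drowning = any(score > threshold for score in scores.values())
    return ("drowning" if is_drowning else "non-drowning"), scores


# Toy stand-ins for the pre-stored text embeddings (real ones would be 512-dimensional).
text_embeddings = {
    "the sound of choking while drowning": [1.0, 0.0, 0.0],
    "the sound of shouting for help": [0.0, 1.0, 0.0],
}
```

With these toy vectors, an audio embedding of `[0.9, 0.1, 0.0]` lies close to the first description and is classified as a drowning type, while `[0.5, 0.5, 0.7]` is not close enough to either description.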
[0044] Specifically, traditional audio recognition models (such as simple keyword recognition) have weak generalization capabilities and struggle to handle the significant individual differences and complex variations in the manifestations of sounds like "choking on water" and "calling for help." This invention employs a multimodal pre-trained model like CLAP. By mapping audio signals to the same semantic space as text descriptions (such as "drowning and choking sounds") for comparison, the model learns high-level semantic features of the sounds, rather than fixed acoustic templates. This allows the system to more accurately identify the inherent patterns of various "drowning-related" sounds across users of different ages, genders, and voices, thereby greatly improving the model's generalization ability and recognition accuracy. Furthermore, data augmentation techniques can be used to train the model, making it more adaptable to various noises in real-world environments.
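One common way to make a model robust to environmental noise is to overlay noise recordings on the positive samples at controlled signal-to-noise ratios. The description does not state how its noise fusion is performed, so the following is only a sketch of that general technique; the function `mix_at_snr` and the choice of SNR values are assumptions.

```python
import math


def rms(samples):
    """Root-mean-square amplitude of a sample sequence."""
    return math.sqrt(sum(v * v for v in samples) / len(samples))


def mix_at_snr(target, noise, snr_db):
    """Overlay `noise` on `target`, scaled so the target-to-noise ratio equals `snr_db`."""
    if len(noise) < len(target):
        raise ValueError("noise must be at least as long as the target")
    noise = noise[:len(target)]
    noise_rms = rms(noise)
    if noise_rms == 0:
        raise ValueError("noise is silent")
    # Gain that brings the noise to the level implied by the requested SNR.
    gain = rms(target) / (noise_rms * 10 ** (snr_db / 20))
    return [t + gain * n for t, n in zip(target, noise)]
```

Mixing the same positive sample at several SNR values (for example 20, 10, and 0 dB) is one way to expand a small dataset toward progressively harder listening conditions.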
[0045] It should be noted that when the deep learning model uses a multilayer perceptron, before the artificial intelligence processing module performs drowning analysis on the audio segment using the pre-integrated deep learning model to obtain the drowning situation for each time window, it also performs the following operations: taking the detection performance of drowning events as a hard condition and reducing the computational and storage overhead of the model as a constraint; based on the hard condition and the constraint, optimizing the multilayer perceptron through model quantization, and then deploying and integrating the optimized multilayer perceptron.
[0046] For example, in some specific implementations, an audio dataset containing a large number of positive and negative samples is first collected during the model development phase. An MLP model with three hidden layers (128, 64, and 32 neurons respectively) is then trained, with recall and precision as hard criteria. To deploy it on wearable devices with limited computing power, the trained 32-bit floating-point MLP model can be quantized using TensorFlow Lite, converting the model weights and activations to 8-bit integers. Quantization compresses the model size and reduces per-inference time and power consumption, enabling efficient integration into the microcontroller of a wearable device while maintaining performance metrics such as accuracy.
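To make the idea of 8-bit quantization concrete, the following minimal sketch applies symmetric per-tensor quantization to a short list of weights. It is an illustrative assumption and not the procedure TensorFlow Lite performs internally; the function names and the sample weights are hypothetical.

```python
def quantize_symmetric_int8(weights):
    """Map float weights to integers in [-127, 127] with one shared scale."""
    max_abs = max((abs(w) for w in weights), default=0.0)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate float weights from the int8 values."""
    return [q * scale for q in quantized]


if __name__ == "__main__":
    weights = [0.5, -1.27, 0.0, 1.0]  # hypothetical float32 weights
    q, scale = quantize_symmetric_int8(weights)
    restored = dequantize(q, scale)
    # Worst-case rounding error is about half of one quantization step.
    print(q, max(abs(w - r) for w, r in zip(weights, restored)))
```

Each weight is stored in one byte instead of four, at the cost of a rounding error bounded by roughly half the scale; whether that error is acceptable is exactly what the hard condition on detection performance is meant to check.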
[0047] Specifically, this invention addresses the problem that complex deep learning models (such as CNNs) are computationally intensive and resource-intensive, making them difficult to run in real-time on edge wearable devices with limited computing power and battery capacity, such as swimming goggles and swimming caps. By selecting a lightweight MLP model and optimizing it using model quantization techniques, this invention significantly reduces the model's computational complexity, memory usage, and power consumption while ensuring core detection performance meets requirements. This allows for the effective integration of AI drowning detection capabilities into miniature, battery-powered personal wearable devices, achieving an engineered balance between high performance and low power consumption, thus ensuring product practicality and battery life.
[0048] It should be noted that drowning situations include drowning type and non-drowning type. When the misjudgment handling module makes a drowning judgment based on the drowning situation in a continuous time window and obtains a drowning prediction result, it performs the following operations: initialize the consecutive count to 0 and initialize the drowning prediction result to no drowning; take the drowning situation in the current time window as the judgment situation; determine the drowning type of the judgment situation; if the judgment situation is drowning type, increment the consecutive count; if the judgment situation is non-drowning type, set the consecutive count to 0; take the drowning situation in the next time window as the judgment situation, return to perform the drowning type judgment operation on the judgment situation, until the consecutive count reaches the preset count threshold, and switch the drowning prediction result to confirmed drowning.
[0049] For example, in some specific implementations: during system initialization, the variable count is initialized to 0, and the final state confirm is set to FALSE.
[0050] In the first 2-second window, the AI judges the result as "drowning type". The misjudgment module executes: count++, at which point count=1.
[0051] Processing the second 2-second window, the AI still judges the result as "drowning type", and the module continues to execute: count++, at this time count=2.
[0052] If the AI still determines the third 2-second window as "drowning type", then the module executes: count++, at which point count=3.
[0053] When count reaches the preset threshold of 3, the module immediately sets confirm to TRUE, indicating "confirmed drowning". If the AI determines any window to be "non-drowning type" during this period, count is immediately reset to 0.
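The counting logic of paragraphs [0048] to [0053] can be sketched as a small state machine. The class and method names below are assumptions chosen for illustration; only the variables count and confirm and the threshold of 3 come from the example above.

```python
class ConsecutiveDrowningConfirmer:
    """Confirms drowning after N consecutive "drowning type" windows."""

    def __init__(self, count_threshold=3):
        self.count_threshold = count_threshold
        self.count = 0        # consecutive "drowning type" windows so far
        self.confirm = False  # FALSE = no drowning, TRUE = confirmed drowning

    def update(self, is_drowning_window):
        """Feed one window result; return the current confirm state."""
        if self.confirm:
            return True  # already confirmed; later handling is out of scope
        if is_drowning_window:
            self.count += 1
        else:
            self.count = 0  # any non-drowning window resets the run
        if self.count >= self.count_threshold:
            self.confirm = True
        return self.confirm


if __name__ == "__main__":
    confirmer = ConsecutiveDrowningConfirmer()
    windows = [True, True, False, True, True, True]
    print([confirmer.update(w) for w in windows])
    # prints [False, False, False, False, False, True]
```

In this run the third window breaks the first streak, so confirmation only occurs once three uninterrupted "drowning type" windows have been observed.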
[0054] Specifically, this invention addresses the high randomness and false alarm rate of single detection results, which can lead to frequent false triggers, causing "alarm fatigue" or user distrust. By introducing a time-series judgment logic based on "continuously detecting drowning signals," the system no longer acts on a single instantaneous result, but requires the AI to make consecutive drowning judgments within a short period. This cumulative triggering mechanism filters out isolated false alarms caused by transient noise, accidental coughing (not drowning), and similar events, thereby improving the confidence and stability of the judgment results and ensuring that alarms are triggered only when warranted.
[0055] It should be noted that when the drowning prediction result is confirmed drowning, the misjudgment handling module is also used to perform the following operations: take the trigger time of the drowning prediction result being confirmed drowning as the initial time; generate a misjudgment time window based on the initial time and a preset duration; within the misjudgment time window, if a deactivation command is received from the target object, switch the drowning prediction result to not drowning; wherein, the misjudgment handling module is configured with a manual operation touch point, and the deactivation command is triggered based on the target object's click operation on the manual operation touch point.
[0056] For example, in some implementations, when the system triggers the "confirmed drowning" state, it starts a 5-second software timer. During these 5 seconds, the system continuously listens for interrupt signals from user-manually operated touchpoints. These touchpoints can be designed as metal pieces within a groove on the side of the smart bracelet, supporting double-click detection. If the user quickly double-clicks the touchpoint within 5 seconds, the bracelet's touch IC decodes a "cancel" command. Upon receiving this command, the system immediately rolls back the "confirmed drowning" state to "not drowning" and terminates any upcoming or already initiated alarm triggering processes, resuming normal monitoring. If no double-click operation is detected within 5 seconds, the timer expires, and the system ultimately locks the "confirmed drowning" state.
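The cancellation window can be illustrated with a simplified, timestamp-based sketch. It does not model the touch IC or a real hardware timer; the function name, the representation of taps as timestamps in seconds, and the 0.5-second maximum gap between the two taps of a double-click are assumptions, since the embodiment only states that the user "quickly" double-clicks within 5 seconds.

```python
def resolve_confirmation(trigger_time, tap_times, window_s=5.0, max_tap_gap_s=0.5):
    """Decide the final state after a "confirmed drowning" trigger.

    trigger_time : time (s) at which "confirmed drowning" was triggered
    tap_times    : times (s) at which the touch point was tapped
    Returns "not drowning" if a double-click falls inside the window,
    otherwise "confirmed drowning" (the state is locked on expiry).
    """
    deadline = trigger_time + window_s
    taps = sorted(t for t in tap_times if trigger_time <= t <= deadline)
    for first, second in zip(taps, taps[1:]):
        if second - first <= max_tap_gap_s:
            return "not drowning"  # cancel command decoded in time
    return "confirmed drowning"


if __name__ == "__main__":
    print(resolve_confirmation(10.0, [11.0, 11.3]))  # double-click in window
    print(resolve_confirmation(10.0, [11.0]))        # single tap only
    print(resolve_confirmation(10.0, [15.5, 15.7]))  # double-click too late
```

Requiring two closely spaced taps rather than one reflects the anti-accidental-touch intent: a single stray contact during the window leaves the "confirmed drowning" state in place.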
[0057] Specifically, even the most accurate AI judgments cannot be 100% error-free. There is a risk of misclassifying normal but unusual user behavior (such as choking on water while laughing, or calling for help in play) as drowning, leading to unnecessary panic and rescue delays. This invention addresses this by providing a "false alarm cancellation window" in which the user can intervene, returning the final decision-making power to the user. For example, when a false alarm occurs, a conscious user can immediately cancel the alarm with a simple double-click. This design balances the convenience of automated detection against the accuracy of manual verification, preserving the system's automatic triggering capability while providing a simple and effective error correction mechanism. It thereby mitigates the problem of false alarms and improves the user experience.
[0058] It should be noted that the alarm and rescue trigger module is equipped with an alarm device and an automatic airbag inflation system. When the alarm and rescue operation is triggered, the alarm and rescue trigger module performs the following operations: releases the floating alarm device to sound a buzzer and perform a positioning operation, and sends alarm communication to the nearby area through the alarm device; the communication data of the alarm communication includes the positioning result of the positioning operation; and activates the automatic inflation system to automatically inflate the airbag to pull the drowning target to the surface.
[0059] For example, in some specific implementations, once the system finally confirms "confirmed drowning," the alarm module first sends an "SOS" text message containing GPS location information and an app push notification to pre-bound emergency contacts and a receiver in the pool lifeguard's duty room via its built-in 4G Cat.1 communication module. Simultaneously, the module drives a miniature motor to release a flexible airbag compressed within a wearable device. For instance, this airbag is rapidly inflated within one second from a miniature carbon dioxide cylinder to generate buoyancy, thereby lifting the drowning user to the surface. As the airbag inflates and floats to the surface, the wearable device can also integrate waterproof LED lights that flash at high frequency and a buzzer that emits a sharp blare, providing significant audio-visual location guidance for rescuers.
[0060] Specifically, this invention addresses the problem that triggering only audible and visual alarms results in low rescue efficiency and success rates when the user is unconscious, lifeguards fail to respond promptly, or visibility is poor at night. This invention integrates "audible and visual alarm positioning" and "automatic airbag inflation" to create a dual-insurance rescue mechanism. On one hand, the audible and visual alarms and location information transmission quickly guide rescuers to the scene; on the other hand, the automatic airbag inflates automatically upon confirmation of drowning, actively pulling the user to the surface and preventing them from sinking due to drowning, thus buying valuable time and opportunity for subsequent rescue. This invention, through a combined active and passive rescue solution, significantly improves the success rate of drowning rescue, thereby maximizing the protection of the user's life.
[0061] This invention also provides an intelligent drowning detection method applied to the aforementioned system. It is understood that the content of the method embodiments of this invention is applicable to the aforementioned system embodiments, and the specific functions implemented by the method embodiments of this invention are the same as those achieved by the system embodiments of this invention, and the beneficial effects achieved are also the same as those achieved by the aforementioned system embodiments.
[0062] It is understood that the intelligent drowning detection method provided by this invention can also be applied to any computer device with data processing and computing capabilities, and this computer device can be various terminals or servers. When the computer device in the embodiment is a server, the server is an independent physical server, or a server cluster or distributed system composed of multiple physical servers, or a cloud server that provides basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, CDN (Content Delivery Network), and big data and artificial intelligence platforms. Optionally, the terminal can be a smartphone, tablet computer, laptop computer, or desktop computer, but it is not limited to these.
[0063] Figure 2 is a schematic diagram of an implementation environment provided by an embodiment of the present invention. Referring to Figure 2, the implementation environment includes at least one terminal 102 and a server 101. The terminal 102 and the server 101 can be connected via a wired or wireless network to complete data transmission and exchange.
[0064] Server 101 can be a standalone physical server, a server cluster or distributed system consisting of multiple physical servers, or a cloud server that provides basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, CDN (Content Delivery Network), and big data and artificial intelligence platforms.
[0065] Additionally, server 101 can also be a node server in a blockchain network. Blockchain is a novel application model of computer technologies such as distributed data storage, peer-to-peer transmission, consensus mechanisms, and encryption algorithms.
[0066] Terminal 102 can be a smartphone, tablet computer, laptop computer, desktop computer, smart speaker, smartwatch, etc., but is not limited to these. Terminal 102 and server 101 can be directly or indirectly connected via wired or wireless communication, and this embodiment of the invention does not impose any limitations.
[0067] For example, based on the implementation environment shown in Figure 2, an embodiment of the present invention provides an intelligent drowning detection method. The following description uses the application of this method in server 101 as an example. It can be understood that the method can also be applied in terminal 102.
[0068] Refer to Figure 3, which is an optional flowchart of the intelligent drowning detection method provided in the embodiments of the present invention. The executing entity of the method can be any of the aforementioned computer devices (including servers or terminals). As shown in Figure 3, the method may include, but is not limited to, steps S100 to S400.
[0069] Step S100: The acoustic sensing module continuously detects the audio data in the target aquatic environment, and then extracts the target audio through signal processing technology; wherein the target audio includes audio segments of continuous time windows.
Step S200: The artificial intelligence processing module uses a pre-integrated deep learning model to perform drowning analysis on the audio segments to obtain the drowning situation for each time window.
Step S300: The misjudgment processing module performs a drowning judgment based on the drowning situations of consecutive time windows to obtain a drowning prediction result; wherein the drowning prediction result includes confirmed drowning and no drowning.
Step S400: The alarm and rescue trigger module, in response to the drowning prediction result being confirmed drowning, triggers the alarm operation and activates the drowning prevention device.
[0070] It should be noted that in some embodiments, when the drowning prediction result is confirmed drowning, the method may further include the following steps: taking the trigger time of the drowning prediction result being confirmed drowning as the initial time; generating a misjudgment time window based on the initial time and a preset duration; within the misjudgment time window, if a deactivation command is received from the target object, switching the drowning prediction result to not drowning; wherein, the misjudgment processing module is configured with a manual operation touch point, and the deactivation command is triggered based on the target object's click operation on the manual operation touch point.
[0071] This invention also provides a wearable device that integrates the aforementioned intelligent drowning detection system. The wearable device includes a housing that accommodates the intelligent drowning detection system, a microphone array connected to the acoustic sensing module, a microcontroller connected to the artificial intelligence processing module and the alarm and rescue triggering module, a buzzer for issuing an alarm, and a manual operation touch point.
[0072] To explain in detail the principle of the technical solution of the present invention, the overall process of the present invention will be described below with reference to some specific embodiments. It is easy to understand that the following is an explanation of the technical principle of the present invention and should not be regarded as a limitation of the present invention.
[0073] To address the shortcomings of existing technologies, this invention provides an AI-driven acoustic sensing system for real-time drowning detection. This system innovatively combines acoustic sensing technology with artificial intelligence processing, which can significantly improve the accuracy of real-time detection in aquatic environments.
[0074] At least one embodiment will now be discussed in conjunction with the accompanying drawings, which are not drawn to scale. When technical features, detailed descriptions, or any claims in the drawings are followed by reference numerals, these reference numerals are added solely to enhance the comprehensibility of the drawings, detailed descriptions, and claims. Therefore, the presence or absence of reference numerals is not intended to limit the scope of any claim element. In the various drawings, identical or nearly identical components shown in each figure are represented by the same numbers. For clarity, not every component is labeled in every figure. The drawings are for illustrative and explanatory purposes and are not intended to limit the scope of this disclosure.
[0075] Figure 4 illustrates a possible flow of sound detection, drowning assessment, false positive handling, and signal confirmation in a drowning detection system. The anti-drowning smart wearable system of this embodiment uses a two-second sliding window to capture the user's acoustic information, enabling real-time detection of the initial stages of drowning. Specifically, the system acquires an audio window every two seconds in a continuous, non-overlapping manner. This design keeps the detections independent and avoids counting the same event twice because of window overlap, while the cumulative triggering mechanism of three consecutive windows improves the robustness of the judgment. The acoustic data is processed by the contrastive language-audio pre-trained (CLAP) model of this embodiment. The model evaluates the input signal and issues a drowning assessment only when three consecutive drowning signals are detected, which ensures the consistency and confidence of the detection. The assessment relies on the acoustic modality alone, independent of other sensor data, which simplifies the system architecture and reduces power consumption. If the signal does not reach this threshold, the system continues to analyze the acoustic input without making a judgment. Once the CLAP mechanism confirms drowning, the user is initially identified as being in danger. However, in the event of a false alarm, whether caused by excessive system sensitivity or by confusing user actions, the user can manually cancel the drowning status within five seconds by interacting with a designated touch point on the wearable device. This touch point is designed as a raised button on the upper side of the device casing, allowing it to be located quickly by touch in an emergency.
The anti-accidental touch mechanism uses a double-click confirmation method; the user can cancel the drowning status by pressing the button twice consecutively within five seconds, effectively preventing accidental cancellation due to a single accidental touch. After cancellation, the system immediately resumes real-time CLAP analysis without a silent period to ensure that subsequent alarms are not missed due to delays in actual drowning situations. If the status is canceled, the system will resume normal monitoring; conversely, if no cancellation operation is performed, drowning is confirmed, triggering an alarm, potentially notifying local rescue forces and activating an emergency alarm to call for nearby first aid. The printed circuit board of this invention can achieve an extremely small size, allowing seamless integration into various water sports wearable devices, such as swimming goggles, swimming caps, and swimsuits. In scenarios where multiple people swim simultaneously, to avoid cross-interference between users' voices, the system uses a special microphone matrix arrangement to identify the main user's sound source. This matrix employs a specific array of multiple miniature microphones, combined with beamforming technology to focus on the area near the wearer's mouth. Simultaneously, it utilizes near-field acoustic characteristics to suppress distant interfering sound sources, thereby effectively improving the signal-to-noise ratio. This design eliminates the need for voiceprint recognition or device ID binding, accurately capturing the user's own drowning-related acoustic signals in noisy environments, ensuring reliable detection.
[0076] Figure 5 is a front view of a circuit board integrated into a compact electronic device according to an embodiment of the present invention. The circuit layout can be set according to actual needs; the layout of this embodiment is shown only as an example, focuses on the placement of key components, and does not limit the technical solution of the present invention. The key components highlighted in Figure 5 include a buzzer 1, located at the top of the board, which serves as an auditory alarm mechanism so that alarms or notifications can be heard in various environments. Adjacent to the buzzer on the right is microphone 1 (in some optional implementations, more microphones can be integrated to form a microphone array), which is designed to capture ambient sound. This placement allows it to pick up a wide range of audio inputs, making it suitable for applications requiring environmental awareness. At the center of the board is a microcontroller unit (MCU) 3, which acts as the brain of the device, processing data from the microphones and controlling the buzzer. The central position of the MCU facilitates efficient connections with the other components.
[0077] Figure 6 is a schematic diagram of the back of the circuit board. As shown, microphone 4 can, for example, be placed at the bend on the back. This microphone is designed to capture the user's audio input, ensuring clear recording of the user's voice. The placement of microphone 4 supports user interaction, making it an integral part of the device's functionality.
[0078] Together, Figure 5 and Figure 6 illustrate an example layout of key components on the circuit board, indicating the location and function of the buzzer, the two microphones (one for ambient sound and the other for user input), and the microcontroller unit. These components are fundamental to the device's operation. The illustrations convey the layout and connections of the internal elements, which aids in understanding the design and provides a reference for future development and troubleshooting.
[0079] In some specific embodiments, the intelligent drowning detection system of the present invention can be implemented as follows: Acoustic Monitoring: This invention utilizes advanced acoustic sensors, particularly a microphone array, to detect distress calls and coughing sounds in aquatic environments. By employing a microphone array, the system effectively suppresses environmental noise, thereby more accurately extracting the user's voice. In the small wearable device of this invention, a compact array consisting of four omnidirectional microphones is integrated onto the device surface. Specifically, three microphones are located at the center of the upper edge of a rectangle, the lower left corner, and the lower right corner, forming a non-equilateral triangle layout with a base width of approximately 3 cm. The fourth microphone is located slightly below the centroid of this triangle, closer to the user's mouth. This array uses an adaptive beamforming algorithm to calculate the sound wave delay between each microphone in real time, dynamically generating a sharp beam pointing towards the user's mouth area, while simultaneously spatially suppressing noise from other directions in the environment (such as water waves, wind, and background human voices). Combined with subsequent deep learning-based audio event detection, the system can effectively capture and identify distress calls and coughing sounds in low signal-to-noise ratio environments, providing accurate wearable acoustic monitoring for aquatic safety. This improvement in clarity is achieved through sophisticated signal processing techniques, such as beamforming, which focus on specific sound sources while minimizing interference from ambient noise. These sensors are carefully designed to capture subtle audio cues that may indicate a person is in distress, thus providing early warning of potential drowning situations.
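The principle of steering a microphone array toward a sound source can be illustrated with a fixed delay-and-sum beamformer, which is simpler than the adaptive beamforming algorithm named in this embodiment and is shown here only as a stand-in. The function name, the use of whole-sample delays, and the toy signals are assumptions; an adaptive implementation would estimate the delays in real time rather than take them as input.

```python
def delay_and_sum(channels, delays):
    """Align and average microphone channels.

    channels : list of equal-length sample lists, one per microphone
    delays   : per-channel arrival delay in whole samples, relative to
               the microphone the target sound reaches first
    A source arriving with exactly these delays adds coherently, while
    sound from other directions is misaligned and partly averages out.
    """
    n = len(channels[0])
    output = []
    for i in range(n):
        acc = 0.0
        for samples, delay in zip(channels, delays):
            j = i + delay  # advance the later channels to re-align them
            if 0 <= j < n:
                acc += samples[j]
        output.append(acc / len(channels))
    return output


if __name__ == "__main__":
    source = [0, 1, 0, -1, 0, 0]
    mic_a = source                 # reached first
    mic_b = [0, 0, 1, 0, -1, 0]    # same sound, one sample later
    mic_c = [0, 0, 0, 1, 0, -1]    # same sound, two samples later
    print(delay_and_sum([mic_a, mic_b, mic_c], [0, 1, 2]))
```

With the correct delays the three copies line up and the output reproduces the source waveform; with mismatched delays the peaks no longer coincide and the averaged amplitude drops, which is the spatial suppression effect the paragraph describes.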
[0080] Improving Accuracy Through Deep Learning: This system significantly improves the accuracy of drowning detection by integrating state-of-the-art deep learning models, particularly convolutional neural networks and recurrent neural networks. These models analyze real-time data from acoustic sensors, enabling the system to effectively distinguish between normal swimming behavior and potential drowning scenarios. Unlike traditional methods, the introduction of deep learning allows the system to learn from various scenarios, continuously enhancing its detection capabilities over time. This adaptability ensures the system remains effective in diverse conditions and environments.
[0081] In one embodiment, a method for real-time detection of drowning events is provided. The method may include using an audio classifier configured to identify choking or drowning-related sounds. For illustrative purposes, a contrastive language-audio pre-trained model is used as a proof-of-concept to evaluate its ability to identify choking or drowning sounds. Specifically, the system applies a two-second sliding window with a step size of one second to both choking and non-choking audio segments. Each audio segment is converted to a time-frequency representation, such as a log-Mel spectrogram, and then encoded by a pre-trained CLAP model to generate multimodal, semantically aligned, and normalized audio embeddings. The method then involves comparing these embeddings with one or more predefined text embeddings associated with sound type to determine whether a given audio segment corresponds to a drowning event.
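The windowing step described above can be sketched as follows. The function name and the toy sample rate in the usage lines are assumptions for illustration; the window length of two seconds and the step size of one second are taken from this paragraph, and setting the step equal to the window length yields non-overlapping windows.

```python
def sliding_windows(samples, sample_rate, window_s=2.0, step_s=1.0):
    """Split a sample sequence into fixed-length windows.

    Only complete windows are returned; a trailing remainder shorter
    than one window is dropped.
    """
    window_len = int(window_s * sample_rate)
    step_len = int(step_s * sample_rate)
    if window_len <= 0 or step_len <= 0:
        raise ValueError("window and step must span at least one sample")
    last_start = len(samples) - window_len
    return [
        samples[start:start + window_len]
        for start in range(0, last_start + 1, step_len)
    ]


if __name__ == "__main__":
    audio = list(range(10))  # 5 s of toy audio at 2 samples per second
    print(len(sliding_windows(audio, sample_rate=2)))              # 4
    print(len(sliding_windows(audio, sample_rate=2, step_s=2.0)))  # 2
```

With a one-second step, consecutive windows share half of their samples, so a single sound event can appear in two windows; this is worth keeping in mind when the window results feed a consecutive-detection counter.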
[0082] This solution builds on the CLAP model by adding a personal microphone array and beamforming technology to suppress environmental noise efficiently. Relying on the core feature of text-audio semantic alignment, it significantly improves the generalization of drowning-related acoustic feature recognition. It is also combined with a two-second sliding window and a threshold mechanism of three consecutive detections to achieve real-time, accurate monitoring in the early stage of drowning. For model training, 400 drowning-related positive audio samples are used as the base dataset. These are augmented by overlaying crowd audio and various water-environment noise samples, expanding the training set to thousands of samples and improving the model's scene adaptability and robustness. To address the limited hardware resources of edge devices, the computationally complex CLAP model is not used for on-device deployment; a lightweight MLP model is selected instead. Model quantization is used as the core optimization method to significantly reduce the model's computation and storage overhead while preserving detection performance, meeting the deployment requirements of wearable water equipment such as swimming goggles and swimming caps.
[0083] As shown in Table 1 below, empirical tests on a validation dataset containing 400 audio clips demonstrate that CLAP achieves an overall prediction accuracy of approximately 82%.
[0084] Table 1
[0085] It should be noted that resource-constrained edge devices may not be able to support the computational complexity required by the CLAP model.
[0086] Because audio data is unstructured, applying traditional machine learning models such as linear regression or classification trees is challenging. Among deep learning models, multilayer perceptrons (MLPs) are best suited for edge deployment compared to CLAP, CNNs, and RNNs because they require fewer parameters, have lower computational complexity, achieve faster inference speeds, and consume less power. Therefore, a lightweight MLP-based classifier can be developed for on-device deployment to achieve efficient real-time drowning detection while minimizing computational requirements. Furthermore, the size of the MLP model can be further reduced through quantization, optimizing storage and computational efficiency while maintaining performance.
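A short worked calculation shows why such an MLP is small enough for a microcontroller. The hidden layer sizes (128, 64, 32) are those given in paragraph [0046]; the input size of 64 features and the single output neuron are assumptions made only to complete the arithmetic, since the embodiments do not state them.

```python
def mlp_parameter_count(layer_sizes):
    """Weights plus biases of a fully connected network."""
    return sum(
        n_in * n_out + n_out
        for n_in, n_out in zip(layer_sizes, layer_sizes[1:])
    )


if __name__ == "__main__":
    # 64 inputs and 1 output are assumed; 128/64/32 come from [0046].
    params = mlp_parameter_count([64, 128, 64, 32, 1])
    print(params)      # 18689 parameters
    print(params * 4)  # 74756 bytes as 32-bit floats
    print(params * 1)  # 18689 bytes as 8-bit integers
```

Under these assumptions the quantized weights occupy roughly 18 KB instead of roughly 73 KB, a fourfold reduction before counting the small amount of scale metadata that quantized formats also store.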
[0087] A two-layer false alarm reduction system with acoustic classification and manual cancellation: a two-layer structure is employed to minimize false alarms while ensuring timely detection. The first layer consists of an acoustic classifier model used to determine whether the audio is related to drowning. The second layer provides a manual cancellation function, allowing the wearer to cancel the alarm when a false alarm occurs.
[0088] Wider Application Coverage: The design of this invention facilitates simultaneous monitoring of multiple areas, making it suitable for various environments, including public swimming pools, water parks, beaches, and private pools. This broad applicability overcomes the limitations of current systems that may only operate effectively in specific environments or require extensive human supervision. The system's inherent flexibility allows for seamless integration into different aquatic environments, providing comprehensive safety without continuous human monitoring. This not only enhances safety but also improves the user experience by minimizing disruption while ensuring the safety of the aquatic environment.
[0089] In summary, this innovative acoustic monitoring and drowning detection system combines advanced sensor technology with deep learning capabilities, creating an effective, responsive, and adaptable solution for improving the safety of aquatic environments.
[0090] In summary, the core principles of the embodiments of the present invention include: a) The system employs acoustic sensing capabilities.
[0091] b) It integrates advanced AI processing through deep learning models that analyze data from multiple sensors to accurately detect drowning scenarios.
[0092] c) The AI-driven real-time drowning detection acoustic sensing system of this invention is designed to be compatible with and enhance the functionality of various rescue systems, thereby improving the effectiveness of drowning prevention.
[0093] In some alternative implementations, a variety of downstream lifesaving equipment options are available: an automatic airbag inflation system with manual cancellation that detects distress and automatically inflates to keep the user's head above water, while allowing cancellation when there is no real danger; and an alarm device with manual cancellation that triggers an alarm to notify lifeguards and those nearby when distress is detected, and releases a floating alarm ball for location, which can be manually stopped if necessary. The manual cancellation function of both systems addresses potential false alarms, enhances user control, reduces unnecessary interference and alarm fatigue, and provides customized solutions for different age groups to ensure enhanced safety and user experience in various aquatic environments.
[0094] Compared with existing technologies, the present invention achieves at least the following beneficial effects: advanced AI processing enables high-precision drowning detection; rapid response in the early stages of drowning; a dual-layer structure prevents false triggering; wider coverage of application scenarios; and a personal wearable design improves monitoring accuracy.
[0095] This invention also provides an electronic device, which includes a memory and a processor. The memory stores a computer program, and the processor executes the computer program to implement the method described above. This electronic device can be any smart terminal, including tablet computers, in-vehicle computers, etc.
[0096] It is understood that the content of the above method embodiments is applicable to this device embodiment. The specific functions implemented by this device embodiment are the same as those of the above method embodiments, and the beneficial effects achieved are also the same as those achieved by the above method embodiments.
[0097] As shown in Figure 7, which illustrates the hardware structure of an electronic device 1000 according to another embodiment, the electronic device 1000 includes: a processor 1001, which can be implemented using a general-purpose CPU (Central Processing Unit), a microprocessor, an application-specific integrated circuit (ASIC), or one or more integrated circuits, and is used to execute relevant programs to implement the technical solutions provided in the embodiments of the present invention; a memory 1002, which can be implemented as a read-only memory (ROM), a static storage device, a dynamic storage device, or a random access memory (RAM), and which can store the operating system and other application programs; when the technical solutions provided in the embodiments of this specification are implemented through software or firmware, the relevant program code is stored in the memory 1002 and is called and executed by the processor 1001; an input/output interface 1003, used to implement information input and output; a communication interface 1004, used to enable communication and interaction between this device and other devices, where communication can be achieved through wired means (such as USB or a network cable) or wireless means (such as a mobile network, Wi-Fi, or Bluetooth); and a bus 1005, which transmits information between the components of the device (e.g., the processor 1001, memory 1002, input/output interface 1003, and communication interface 1004). The processor 1001, memory 1002, input/output interface 1003, and communication interface 1004 are connected to each other within the device via the bus 1005.
[0098] The electronic device embodiments described above are merely illustrative. The units described as separate components may or may not be physically separate; that is, they may be located in one place or distributed across multiple network units. Some or all of the modules can be selected to achieve the purpose of this embodiment according to actual needs.
[0099] This invention also provides a computer-readable storage medium storing a computer program that, when executed by a processor, implements the above-described method.
[0100] It is understood that the content of the above method embodiments is applicable to this storage medium embodiment. The specific functions implemented in this storage medium embodiment are the same as those in the above method embodiments, and the beneficial effects achieved are also the same as those achieved in the above method embodiments.
[0101] This invention also provides a computer program product, including a computer program that, when executed by a processor, implements the above-described method.
[0102] It is understood that the content of the above method embodiments is applicable to the embodiments of this program product. The specific functions implemented by the embodiments of this program product are the same as those of the above method embodiments, and the beneficial effects achieved are also the same as those achieved by the above method embodiments.
[0103] Memory, as a non-transitory computer-readable storage medium, can be used to store non-transitory software programs and non-transitory computer-executable programs. Furthermore, memory may include high-speed random access memory, and may also include non-transitory memory, such as at least one disk storage device, flash memory device, or other non-transitory solid-state storage device. In some embodiments, memory may optionally include memory remotely located relative to the processor, and these remote memories can be connected to the processor via a network. Examples of such networks include, but are not limited to, the Internet, intranets, local area networks, mobile communication networks, and combinations thereof.
[0104] The intelligent drowning detection system, method, wearable device, electronic device, storage medium, and program product provided in this invention continuously detect audio data in the target aquatic environment through an acoustic sensing module, and then extract the target audio through signal processing technology. The target audio includes audio segments within continuous time windows. An artificial intelligence processing module uses a pre-integrated deep learning model to perform drowning analysis on the audio segments, obtaining the drowning situation for each time window. A misjudgment handling module makes a drowning judgment based on the drowning situation within the continuous time windows, obtaining a drowning prediction result. The drowning prediction result is either confirmed drowning or no drowning. An alarm and rescue triggering module responds to a confirmed drowning prediction result by triggering an alarm operation and activating the drowning prevention device. This invention constructs a highly reliable drowning detection process through a modular architecture of "acoustic sensing - AI processing - misjudgment handling - alarm and rescue." First, the acoustic sensing focuses on the unique audio signals in the early stages of drowning, ensuring timely detection. Second, the intelligent analysis of the audio using an AI deep learning model significantly improves the accuracy of identification. Furthermore, by introducing an independent misjudgment handling module to reconfirm the initial results of the AI, the number of false alarms can be effectively reduced. Finally, the alarm and rescue are triggered only after drowning is confirmed, achieving end-to-end reliability from signal acquisition to final action and solving the problem of insufficient overall performance caused by the single function and simple judgment logic of existing technical solutions.
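The reconfirmation step performed by the misjudgment handling module can be sketched as a counter over consecutive time windows, following the logic set out in the claims: the count increases on each drowning-type window, resets on a non-drowning window, and the result switches to confirmed drowning once a preset threshold is reached. This is a minimal sketch under stated assumptions; the function name `confirm_drowning`, the boolean per-window labels, and the default threshold of 3 are hypothetical and not taken from the source.

```python
from typing import Iterable


def confirm_drowning(window_labels: Iterable[bool], threshold: int = 3) -> bool:
    """Return True once `threshold` consecutive windows are labelled as drowning.

    `window_labels` yields one boolean per time window (True = drowning type),
    standing in for the per-window output of the deep learning model.
    """
    consecutive = 0  # consecutive drowning-type windows seen so far
    for is_drowning in window_labels:
        if is_drowning:
            consecutive += 1
            if consecutive >= threshold:
                return True   # switch the prediction to confirmed drowning
        else:
            consecutive = 0   # any non-drowning window resets the count
    return False              # threshold never reached: no drowning


# An isolated positive window does not trigger; a sustained run does.
print(confirm_drowning([True, False, True, True, False]))
print(confirm_drowning([True, True, False, True, True, True]))
```

Requiring a run of consecutive positives trades a short confirmation delay (threshold times the window length) for robustness against single-window misclassifications.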
[0105] The embodiments described in this invention are for the purpose of more clearly illustrating the technical solutions of the embodiments of this invention, and do not constitute a limitation on the technical solutions provided by the embodiments of this invention. As those skilled in the art will know, with the evolution of technology and the emergence of new application scenarios, the technical solutions provided by the embodiments of this invention are also applicable to similar technical problems.
[0106] Those skilled in the art will understand that the technical solutions shown in the figures do not constitute a limitation on the embodiments of the present invention, which may include more or fewer steps than shown, combine certain steps, or include different steps.
[0107] The device embodiments described above are merely illustrative. The units described as separate components may or may not be physically separate; that is, they may be located in one place or distributed across multiple network units. Some or all of the modules can be selected to achieve the purpose of this embodiment according to actual needs.
[0108] Those skilled in the art will understand that all or some of the steps in the methods disclosed above, as well as the functional modules / units in the systems and devices, can be implemented as software, firmware, hardware, or suitable combinations thereof.
[0109] The preferred embodiments of the present invention have been described above with reference to the accompanying drawings, but this does not limit the scope of the claims of the present invention. Any modifications, equivalent substitutions, and improvements made by those skilled in the art without departing from the scope and spirit of the present invention should be within the scope of the claims of the present invention.
Claims
1. An intelligent drowning detection system, characterized in that, The system includes: An acoustic sensing module is used to continuously detect audio data in the target aquatic environment, and then extract the target audio through signal processing technology; wherein, the target audio includes audio segments of a continuous time window; An artificial intelligence processing module is used to perform drowning analysis on the audio segment using a pre-integrated deep learning model to obtain the drowning situation for each time window; The misjudgment handling module is used to make a drowning judgment based on the drowning situation in the continuous time window and obtain a drowning prediction result; wherein, the drowning prediction result includes confirmed drowning and no drowning; The alarm and rescue triggering module is used to trigger alarm and rescue operations in response to the drowning prediction result being confirmed as drowning.
2. The system according to claim 1, characterized in that, When the acoustic sensing module continuously detects audio data in the target aquatic environment and extracts the target audio through signal processing technology, it specifically performs the following operations: The acoustic delay between each microphone is quantified by applying an adaptive beamforming algorithm to the microphone array, thereby dynamically generating a sharp beam pointing towards the sound source region of the target object. The sharp beam is used to spatially suppress noise from other directions in the environment and to collect the target audio from the sound source region of the target object.
3. The system according to claim 2, characterized in that, The microphone array includes a compact array of four omnidirectional microphones, three of which form a triangular layout, and a fourth omnidirectional microphone located slightly below the center of gravity of the triangular layout.
4. The system according to claim 1, characterized in that, When the deep learning model employs a contrastive language-audio pre-trained model, and the artificial intelligence processing module performs drowning analysis on the audio segment using the pre-integrated deep learning model to obtain the drowning situation for each time window, the following specific operations are performed: The audio segment is converted into a time-frequency representation; wherein the time-frequency representation includes a log-Mel spectrum. The time-frequency representation is encoded using the contrastive language-audio pre-trained model to generate an audio embedding; The audio embedding is compared with the predefined text embedding, and the drowning situation corresponding to the time window is determined based on the comparison result; The contrastive language-audio pre-training model is trained on a target dataset, which is obtained by fusing environmental noise audio with several basic datasets of drowning-related positive audio to perform data expansion and overlap processing; the predefined text embedding includes text embedding of at least one sound type of drowning event.
5. The system according to claim 1, characterized in that, When the deep learning model employs a multilayer perceptron, before the artificial intelligence processing module performs drowning analysis on the audio segment using the pre-integrated deep learning model to obtain the drowning situation for each time window, it further performs the following operations: The performance of drowning event detection is taken as a hard requirement, and the reduction of the model's computational and storage overhead is taken as a constraint; Based on the aforementioned hard requirement and constraint, the multilayer perceptron is optimized through model quantization, and the optimized multilayer perceptron is then deployed and integrated.
6. The system according to claim 1, characterized in that, The drowning situation includes a drowning type and a non-drowning type. When the misjudgment handling module makes a drowning judgment based on the drowning situation in the continuous time window and obtains the drowning prediction result, it specifically performs the following operations: A consecutive count is initialized to 0, and the drowning prediction result is initialized to no drowning; The drowning situation of the current time window is taken as the situation to be judged; The type of the situation to be judged is determined; If the situation to be judged is the drowning type, the consecutive count is incremented; if it is the non-drowning type, the consecutive count is reset to 0; The drowning situation of the next time window is then taken as the situation to be judged, and the type determination operation is repeated until the consecutive count reaches a preset count threshold, whereupon the drowning prediction result is switched to confirmed drowning.
7. The system according to claim 1, characterized in that, When the drowning prediction result is confirmed drowning, the misjudgment handling module is also used to perform the following operations: The time at which the drowning prediction result was switched to confirmed drowning is taken as the initial time; A misjudgment time window is generated based on the initial time and a preset duration; If a deactivation command from the target object is received within the misjudgment time window, the drowning prediction result is switched to no drowning; wherein the misjudgment handling module is configured with a manual operation touch point, and the deactivation command is triggered by the target object's click operation on the manual operation touch point.
8. The system according to claim 1, characterized in that, The alarm and rescue triggering module is equipped with an alarm device and an automatic airbag inflation system. When triggering alarm and rescue operations, the alarm and rescue triggering module performs the following operations: The alarm device releases a floating alarm to sound a buzzer and perform a location operation, and sends alarm communication to nearby areas; wherein the communication data of the alarm communication includes the location result of the location operation; The automatic airbag inflation system is activated to pull the drowning target to the surface.
9. An intelligent drowning detection method, characterized in that, The method, applied to the intelligent drowning detection system according to any one of claims 1 to 7, comprises the following steps: The acoustic sensing module continuously detects audio data in the target aquatic environment, and then extracts the target audio through signal processing technology; wherein, the target audio includes audio segments of a continuous time window; The artificial intelligence processing module performs drowning analysis on the audio segment using a pre-integrated deep learning model to obtain the drowning situation for each time window; The misjudgment handling module performs a drowning judgment based on the drowning situation within the continuous time window to obtain a drowning prediction result; wherein, the drowning prediction result includes confirmed drowning and no drowning; In response to the drowning prediction result being confirmed drowning, the alarm and rescue triggering module triggers an alarm operation and activates the drowning prevention device.
10. A wearable device, characterized in that, The wearable device integrates the intelligent drowning detection system according to any one of claims 1 to 7, comprising a housing for accommodating the intelligent drowning detection system, a microphone array connected to the acoustic sensing module, a microcontroller connected to the artificial intelligence processing module and the alarm and rescue triggering module, a buzzer for issuing an alarm, and manual operation contacts.
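As an illustration of the comparison step recited in claim 4, in which an audio embedding is compared with predefined text embeddings, the following minimal sketch scores one window by cosine similarity and selects the closest text label. It rests on several assumptions: the embeddings are toy 3-dimensional vectors rather than outputs of a contrastive language-audio pre-trained model, and the label strings, the function names `cosine` and `classify_window`, and the choice of cosine similarity as the comparison metric are hypothetical.

```python
import math
from typing import Dict, Sequence, Set, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def classify_window(
    audio_emb: Sequence[float],
    text_embs: Dict[str, Sequence[float]],
    drowning_labels: Set[str],
) -> Tuple[str, bool]:
    """Pick the text label closest to the audio embedding.

    Returns (best_label, is_drowning), where is_drowning is True when the
    best-matching label is one of the drowning-related sound types.
    """
    scores = {label: cosine(audio_emb, emb) for label, emb in text_embs.items()}
    best = max(scores, key=scores.get)
    return best, best in drowning_labels


# Toy embeddings standing in for the real model outputs (illustrative only).
text_embs = {
    "gasping and choking": [0.9, 0.1, 0.0],
    "normal swimming splashes": [0.1, 0.9, 0.1],
    "background pool noise": [0.0, 0.2, 0.9],
}
drowning_labels = {"gasping and choking"}

label, is_drowning = classify_window([0.8, 0.3, 0.1], text_embs, drowning_labels)
print(label, is_drowning)
```

The per-window boolean produced this way is the kind of drowning situation that the misjudgment handling module of claim 6 would then count across consecutive windows.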