Inhalation medication compliance monitoring method and system based on deep neural network
By integrating sensors and deep neural networks into the inhaler, fine-grained monitoring and real-time feedback of inhaler operation are achieved, solving the problems of coarse granularity of action recognition and lag in evaluation in existing technologies, and improving the accuracy and real-time performance of medication adherence.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- THE FOURTH AFFILIATED HOSPITAL OF GUANGZHOU MEDICAL UNIV (GUANGZHOU ZENGCHENG DISTRICT PEOPLES HOSPITAL)
- Filing Date
- 2026-03-20
- Publication Date
- 2026-06-12
AI Technical Summary
Existing inhaler compliance monitoring technologies have coarse-grained action recognition, lack deep learning applications, have limited assessment dimensions, and provide delayed feedback guidance, making it impossible to provide real-time operational correction guidance.
Data on the medication administration process is collected by a triaxial accelerometer and a microelectromechanical (MEMS) airflow sensor integrated into the inhaler. Feature extraction and temporal modeling are performed using a one-dimensional convolutional neural network and a bidirectional long short-term memory network. A multi-task deep neural network then performs action recognition and quality assessment, and real-time feedback guidance is transmitted via Bluetooth.
It enables refined assessment of inhaler operation, improves the accuracy of action recognition, provides real-time corrective guidance, and enhances the accuracy and real-time nature of medication adherence monitoring.
Figure CN122201607A_ABST
Abstract
Description
Technical Field
[0001] This invention relates to the field of medical and health monitoring technology, and in particular to a method and system for monitoring inhaled medication adherence based on deep neural networks.
Background Technology
[0002] Inhalation therapy is a core treatment for respiratory diseases such as asthma and chronic obstructive pulmonary disease. Inhalers deliver medication directly to the respiratory tract to exert a local therapeutic effect, offering significant advantages over oral administration, including faster onset of action, smaller dosage, and fewer systemic side effects. However, clinical studies have shown that inhaler medication adherence is generally poor, with patient compliance rates typically below 50%, which has become a key bottleneck restricting the clinical efficacy of inhalation therapy.
[0003] Poor inhaler medication adherence has two main components. The first is frequency adherence: patients fail to use the inhaler at the prescribed times and frequencies. The second is technique adherence: patients use the inhaler, but their operating technique is incorrect, so the drug is not delivered effectively to the lower respiratory tract. A standard inhaler operating procedure typically includes several steps: shaking to mix the medication, loading the medication, exhaling to empty the lungs of residual air, inhaling the medication with a deep breath, and holding the breath. Missing or incorrectly performing any of these steps can significantly reduce medication delivery efficiency.
[0004] Chinese patent CN118507077A discloses a method and system for rational use of medications based on diagnostic and drug item reasoning. This technical solution addresses the rational use of oral medications by employing natural language processing algorithms to extract basic patient information, performing feature extraction through convolutional neural networks and recurrent neural networks, generating personalized medication plans using Bayesian reasoning algorithms, and optimizing medication decisions using decision tree algorithms and genetic algorithms. However, this technical solution primarily focuses on the selection of oral medications and dosage reasoning, without addressing the monitoring and evaluation of inhaler medication adherence.
[0005] Existing inhaler compliance monitoring technologies mainly include electronic monitoring devices and smart inhalers. Electronic monitoring devices calculate medication frequency compliance by recording the timestamps of inhaler activation, but they cannot assess the technical standardization of medication administration. Some smart inhalers are equipped with simple sensors to detect inhalation events, but they can only provide binary compliance assessment results and lack the ability to finely evaluate the quality of standard operating procedures such as shaking, medication administration, exhalation, inhalation, and breath-holding.
[0006] Existing inhaler compliance monitoring technologies suffer from the following technical shortcomings. First, the granularity of action recognition is coarse; current solutions primarily rely on timestamp recordings or simple event detection, failing to distinguish and identify individual action units within the inhaler operation. Second, the application of deep learning technology is lacking; existing solutions often employ rule matching or simple threshold judgments, making it difficult to accurately identify complex and varied user behaviors. Third, the assessment dimensions are limited; current solutions only focus on whether medication has been taken, lacking quantitative assessments of key technical parameters such as peak inspiratory flow rate and breath-hold duration. Finally, feedback and guidance are delayed; existing solutions typically provide assessment results only after medication administration, failing to provide real-time guidance for corrective actions.
Summary of the Invention
[0007] To address the shortcomings of existing technologies, this invention provides a method for monitoring inhaled medication adherence based on deep neural networks, comprising the following steps:
[0008] Step S1: Sensor data acquisition. Using a triaxial accelerometer and a MEMS airflow sensor integrated on the inhaler, motion posture data and inspiratory flow curves during medication administration are acquired in real time. The acquired sensor data is preprocessed by sliding window framing to generate standardized multi-channel time-series input data.
[0009] Step S2: One-dimensional convolutional feature extraction. The standardized multi-channel time-series input data obtained in step S1 is input into a one-dimensional convolutional neural network. Local features of the sensor time-series signals are extracted through multiple cascaded one-dimensional convolutional layers to identify the signal pattern features of five action units (shaking, medication loading, exhalation, inhalation, and breath-holding) and to generate a temporal feature vector.
[0010] Step S3: Bidirectional temporal modeling. The temporal feature vector obtained in step S2 is input into a bidirectional long short-term memory network to perform bidirectional temporal modeling on the complete medication action sequence. The forward network learns the forward evolution pattern of the action sequence, and the backward network learns the reverse dependency relationship of the action sequence. The forward hidden state and the backward hidden state are concatenated and fused to learn the temporal pattern differences between correct and incorrect operations and generate temporal pattern features.
[0011] Step S4: Multi-task joint prediction. The temporal pattern features obtained in step S3 are input into a multi-task deep neural network. The computational overhead is reduced by sharing the underlying feature extractor. At the same time, three objectives are predicted: action type classification, inspiratory peak flow regression, and breath-hold duration estimation. Based on the output results of the three prediction objectives, the normative score of each action unit and the overall medication quality rating are calculated.
[0012] Step S5: Medication guidance feedback. The standardization score and overall medication quality rating obtained in step S4 are transmitted to the mobile application via Bluetooth. The application generates real-time voice correction guidance and a visual medication report based on the standardization score. The data is synchronized to the doctor's cloud platform to realize remote and precise supervision and management of medication adherence.
[0013] The present invention also provides an inhaled medication adherence monitoring system based on deep neural networks for implementing the above method, including: a sensor data acquisition module, a one-dimensional convolutional feature extraction module, a bidirectional temporal modeling module, a multi-task prediction module, and a medication guidance feedback module.
[0014] The beneficial effects of this invention are as follows. By integrating a triaxial accelerometer and a MEMS airflow sensor into the inhaler, this invention achieves comprehensive sensing of the medication process. By employing a one-dimensional convolutional neural network to extract features from the sensor time-series signals, it can accurately identify five standard action units: shaking, medication loading, exhalation, inhalation, and breath-holding. By constructing a bidirectional long short-term memory network to perform temporal modeling of the complete medication action sequence, it effectively learns the temporal pattern differences between correct and incorrect operations. By designing a multi-task deep neural network architecture that simultaneously predicts three objectives (action type classification, inspiratory peak flow regression, and breath-holding duration estimation), it shares the underlying feature extractor to reduce computational overhead. By transmitting the recognition results to a mobile application via Bluetooth, it enables real-time voice correction guidance and visualized medication reports, with data synchronized to a cloud platform to form a closed loop of remote, precise supervision and management.
Attached Figure Description
[0015] Figure 1 is a flowchart of the inhaled medication compliance monitoring method based on deep neural networks provided in an embodiment of the present invention.
[0016] Figure 2 is an architecture diagram of the inhaled medication compliance monitoring system based on deep neural networks provided in an embodiment of the present invention.
Detailed Implementation
[0017] The present invention will now be described in further detail with reference to the accompanying drawings and specific embodiments. It should be noted that, unless otherwise specified, the embodiments and features described in the embodiments of the present invention can be combined with each other.
[0018] Referring to Figure 1, this invention provides a method for monitoring inhaled medication adherence based on deep neural networks. The method achieves accurate monitoring and intelligent assessment of inhaler medication adherence through multi-sensor fusion sensing, deep neural network feature extraction, and multi-task joint learning. In one embodiment of this invention, the overall method flow includes five core steps: sensor data acquisition, one-dimensional convolutional feature extraction, bidirectional temporal modeling, multi-task joint prediction, and medication guidance feedback. These steps form a tightly coupled, closed data processing loop.
[0019] Step S1: Sensor Data Acquisition. In one embodiment of the present invention, the sensor data acquisition step acquires multimodal signal data in real time during medication administration using multiple types of sensors integrated on the inhaler. Specifically, the present invention employs a dual-sensor fusion scheme consisting of a triaxial accelerometer and a MEMS airflow sensor, which capture the user's motion posture information and respiratory airflow information, respectively, while the inhaler is operated.
[0020] A triaxial accelerometer is used to collect acceleration changes of the inhaler in three-dimensional space. Preferably, the present invention uses a microelectromechanical system (MEMS) accelerometer, which is small in size and easy to integrate into the inhaler housing. In one embodiment of the present invention, the sampling frequency of the triaxial accelerometer is configured to 100 Hz. This sampling frequency captures the details of the user's movements when operating the inhaler while avoiding the data redundancy and increased power consumption caused by excessively high sampling rates. The range of the triaxial accelerometer is configured to ±16 g. This range covers the large acceleration values generated when the user shakes the inhaler, avoiding signal saturation and clipping. The triaxial accelerometer outputs acceleration component signals in three channels, denoted $a_x(t)$, $a_y(t)$, and $a_z(t)$, where $t$ is the sampling time and the three components correspond to the X-axis, Y-axis, and Z-axis directions of the inhaler's body coordinate system, respectively.
[0021] A MEMS airflow sensor is used to collect the airflow signals generated when a user breathes through the inhaler mouthpiece. In one embodiment of the invention, the MEMS airflow sensor employs a differential pressure measurement principle, calculating the instantaneous airflow rate by detecting changes in the pressure difference at the mouthpiece. Preferably, the sampling frequency of the MEMS airflow sensor is also configured to 100 Hz, keeping it synchronized with the accelerometer to facilitate subsequent multimodal data fusion. The measurement range of the MEMS airflow sensor is configured to 0 to 300 L/min, which covers the peak inspiratory flow rate range of a normal adult and ensures that no valid inspiratory signals are missed. The MEMS airflow sensor outputs a single-channel instantaneous airflow signal, denoted $q(t)$. Positive values indicate airflow in the direction of inhalation, while negative values indicate airflow in the direction of exhalation.
[0022] In one embodiment of the present invention, after the raw sensor data is acquired, it is preprocessed by sliding window framing to generate standardized input data suitable for deep neural network processing. Sliding window framing divides the continuous time-series signals into data frames of fixed length, and each data frame is fed into the neural network as an independent input sample. Preferably, the present invention configures the sliding window length $L$ to 256 sampling points, which at a sampling frequency of 100 Hz corresponds to a time span of 2.56 seconds. This time span covers the typical duration of short action units such as shaking, medication loading, and exhalation. The window sliding step size $S$ is configured to 64 sampling points, corresponding to a time interval of 0.64 s, so adjacent windows overlap by 75%. This step size provides sufficient time resolution while avoiding the computational redundancy of even greater overlap between adjacent frames. For long actions such as breath-holding, which last longer than a single window, the overlap means that a breath-hold lasting 5 seconds is covered by at least 8 consecutive overlapping windows. The complete features of the long action are then restored through the cross-window temporal fusion modeling of the bidirectional long short-term memory network in step S3.
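The framing described above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the patent's implementation: the function names `frame_signal` and `windows_overlapping`, the (channels, samples) array layout, and the 20-second zero-filled recording are all hypothetical. The second helper counts how many frames share at least one sample with a 5-second breath-hold, which is consistent with the "at least 8" lower bound quoted in the text.

```python
import numpy as np

WINDOW = 256  # samples per frame (2.56 s at 100 Hz)
STEP = 64     # hop between frame starts (0.64 s), i.e. 75% overlap


def frame_signal(signal: np.ndarray, window: int = WINDOW, step: int = STEP) -> np.ndarray:
    """Split a (channels, samples) array into frames of shape (n_frames, channels, window)."""
    channels, n_samples = signal.shape
    if n_samples < window:
        return np.empty((0, channels, window), dtype=signal.dtype)
    starts = range(0, n_samples - window + 1, step)
    return np.stack([signal[:, s:s + window] for s in starts])


def windows_overlapping(event_start: int, event_len: int, n_samples: int,
                        window: int = WINDOW, step: int = STEP) -> int:
    """Count frames sharing at least one sample with [event_start, event_start + event_len)."""
    count = 0
    for s in range(0, n_samples - window + 1, step):
        if s < event_start + event_len and s + window > event_start:
            count += 1
    return count


# Hypothetical 20 s recording: 3 accelerometer axes + 1 airflow channel at 100 Hz.
recording = np.zeros((4, 2000))
frames = frame_signal(recording)
print(frames.shape)  # (28, 4, 256)

# A 5 s breath-hold starting at sample 700 (7.0 s into the recording).
print(windows_overlapping(700, 500, 2000))  # 12
```

Only three or four of those frames lie entirely inside the breath-hold; the rest straddle its start or end, which is why the later cross-window modeling matters.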
[0023] After sliding window framing, each data frame contains four channels: the X, Y, and Z components of the triaxial acceleration and the airflow signal. Each channel contains 256 sampling points. The tensor shape of a single frame of input data is therefore $4 \times 256$, where 4 is the number of input channels ($C_{in} = 4$) and 256 is the time step length. In one embodiment of the present invention, the data of each channel is also standardized to zero mean and unit variance to eliminate the differences in units and numerical ranges between sensors, thereby improving the training stability and convergence speed of the neural network.
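A per-channel standardization step might look like the following sketch. The text does not say whether the mean and variance are computed per frame or taken from the training set; computing them per frame, as done here, is an assumption, and `standardize_frame` and the synthetic data are hypothetical.

```python
import numpy as np


def standardize_frame(frame: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Z-score each channel of a (channels, time) frame independently.

    eps guards against division by zero for a constant channel.
    """
    mean = frame.mean(axis=1, keepdims=True)
    std = frame.std(axis=1, keepdims=True)
    return (frame - mean) / (std + eps)


rng = np.random.default_rng(0)
frame = np.vstack([
    rng.normal(0.0, 2.0, size=(3, 256)),    # synthetic accelerometer axes, in g
    rng.normal(40.0, 25.0, size=(1, 256)),  # synthetic airflow, in L/min
])
z = standardize_frame(frame)
print(z.shape)  # (4, 256)
```

After this step the acceleration channels (a few g) and the airflow channel (tens of L/min) are on a comparable scale.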
[0024] Step S2: One-dimensional convolutional feature extraction. In one embodiment of the present invention, the one-dimensional convolutional feature extraction step inputs the standardized multi-channel time-series input data obtained in step S1 into a one-dimensional convolutional neural network, and performs local feature extraction on the sensor time-series signals through multiple cascaded one-dimensional convolutional layers to identify the signal pattern features of five action units: shaking, medication loading, exhalation, inhalation, and breath-holding.
[0025] One-dimensional convolutional neural networks (CNNs) are deep learning models specifically designed for one-dimensional temporal signals, with one-dimensional convolution operations as their core operation. Unlike traditional two-dimensional convolution operations that process image data, one-dimensional convolution operations only slide the convolution kernel along the time axis to extract local temporal features. The mathematical expression for a one-dimensional convolution operation is:
[0026] $y_j(t) = \sum_{i=1}^{C_{in}} \sum_{k=1}^{K} w_{i,j}(k)\, x_i(t + k - 1) + b_j$,
[0027] where: $y_j(t)$ is the value of the $j$-th channel of the output feature map at time $t$; $x_i(t)$ is the value of the $i$-th channel of the input feature map at time $t$; $w_{i,j}(k)$ is the weight at the $k$-th position of the convolution kernel connecting input channel $i$ to output channel $j$; $b_j$ is the bias parameter of the $j$-th output channel; $C_{in}$ is the number of input channels, with $C_{in} = 4$ for the initial input in this invention; $K$ is the kernel size, measured in sampling points; and the subscript $k$ is the position index inside the convolution kernel, ranging from 1 to $K$.
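To make the one-dimensional convolution concrete, the sketch below computes each output channel as a sum over input channels and kernel positions plus a bias, sliding only along the time axis. It is an unpadded ("valid") convolution with 0-based indexing; the function name `conv1d`, the array layouts, and the toy kernel are assumptions for illustration.

```python
import numpy as np


def conv1d(x: np.ndarray, w: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Unpadded 1D convolution along time.

    x: (C_in, T) input, w: (C_out, C_in, K) kernels, b: (C_out,) biases.
    Returns an array of shape (C_out, T - K + 1).
    """
    c_out, c_in, k = w.shape
    t_out = x.shape[1] - k + 1
    y = np.empty((c_out, t_out))
    for t in range(t_out):
        patch = x[:, t:t + k]  # (C_in, K) slice starting at time t
        # Sum over input channels and kernel positions for every output channel.
        y[:, t] = np.tensordot(w, patch, axes=([1, 2], [0, 1])) + b
    return y


# Toy example: one channel, a difference kernel of width 3 (arbitrary choice).
x = np.array([[1.0, 2.0, 3.0, 4.0, 5.0]])
w = np.array([[[1.0, 0.0, -1.0]]])
b = np.array([0.5])
print(conv1d(x, w, b))  # [[-1.5 -1.5 -1.5]]
```

In practice a deep learning framework's built-in 1D convolution layer would replace this loop; the explicit version is only meant to mirror the summation.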
[0028] Preferably, in one embodiment of the present invention, a three-layer cascaded one-dimensional convolutional structure is designed to achieve layered feature extraction from shallow to deep. The first one-dimensional convolutional layer uses the smallest kernel size, with 4 input channels and 64 output channels. A smaller kernel is suited to capturing fine-grained local temporal variations, such as short pulses in the acceleration signals and rapid fluctuations in the airflow signal. The second one-dimensional convolutional layer uses a medium kernel size, with 64 input channels and 128 output channels. Medium-sized kernels capture mesoscale temporal patterns spanning longer periods, such as the periodic features of the shaking motion. The third one-dimensional convolutional layer uses the largest kernel size, with 128 input channels and 256 output channels. Larger kernels are suited to capturing macroscopic temporal structures with longer time spans, such as the complete waveform envelope of an inhalation from start to peak to end.
[0029] In one embodiment of the present invention, each one-dimensional convolutional layer is followed in sequence by a batch normalization layer, a ReLU activation function layer, and a max pooling layer. The batch normalization layer normalizes each feature channel to zero mean and unit variance, which alleviates the internal covariate shift problem in deep network training, accelerates network convergence, and improves generalization ability. The ReLU activation function introduces non-linear transformation capability; its mathematical expression is $\mathrm{ReLU}(x) = \max(0, x)$, which suppresses negative noise signals while preserving positive activation signals. The max pooling layer downsamples features by taking the maximum value within a local time window. This invention configures the pooling window size to 2 and the pooling stride to 2, so the time dimension is halved after each pooling layer, which reduces subsequent computation and enhances the translation invariance of the features.
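The three post-convolution operations can be sketched as follows. This is a simplified illustration: the batch normalization here uses only batch statistics and omits the learnable scale and shift and the running averages that a full implementation keeps, and the function names and (batch, channels, time) layout are assumptions.

```python
import numpy as np


def batch_norm(x: np.ndarray, eps: float = 1e-5) -> np.ndarray:
    """Normalize each channel of a (batch, channels, time) tensor over batch and time."""
    mean = x.mean(axis=(0, 2), keepdims=True)
    var = x.var(axis=(0, 2), keepdims=True)
    return (x - mean) / np.sqrt(var + eps)


def relu(x: np.ndarray) -> np.ndarray:
    """ReLU(x) = max(0, x), applied element-wise."""
    return np.maximum(x, 0.0)


def max_pool1d(x: np.ndarray, size: int = 2) -> np.ndarray:
    """Max over non-overlapping windows of `size` along the last axis (stride == size)."""
    t = (x.shape[-1] // size) * size  # drop a trailing remainder, if any
    return x[..., :t].reshape(*x.shape[:-1], t // size, size).max(axis=-1)


x = np.array([[[1.0, -2.0, 3.0, 0.0, -5.0, -1.0]]])  # (batch=1, channels=1, time=6)
print(max_pool1d(relu(x)))  # [[[1. 3. 0.]]]

features = np.random.default_rng(0).normal(size=(8, 64, 256))
out = max_pool1d(relu(batch_norm(features)))
print(out.shape)  # (8, 64, 128)
```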
[0030] After processing by the three-layer cascaded one-dimensional convolutional structure, the original input data is transformed step by step from a tensor of shape $4 \times 256$ into a temporal feature vector of shape $256 \times 32$, where 256 is the feature channel dimension and 32 is the number of time steps remaining after three pooling operations ($256 / 2^3 = 32$). In one embodiment of the present invention, this temporal feature vector effectively encodes discriminative features related to action recognition in the acceleration and airflow signals, laying a feature foundation for subsequent temporal modeling.
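The shape bookkeeping can be checked with a few lines of plain Python. The sketch assumes each convolution is padded so that it preserves the time length (the text does not state the padding), which is what makes three halvings of 256 land on exactly 32; `trace_shapes` is a hypothetical helper.

```python
def trace_shapes(channels_in=4, time_steps=256, out_channels=(64, 128, 256), pool=2):
    """Return the (channels, time) shape after each conv + pool stage."""
    shapes = [(channels_in, time_steps)]
    t = time_steps
    for c in out_channels:
        # Assumption: the convolution keeps the time length; pooling divides it by `pool`.
        t = t // pool
        shapes.append((c, t))
    return shapes


print(trace_shapes())  # [(4, 256), (64, 128), (128, 64), (256, 32)]
```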
[0031] For the signal pattern characteristics of the five action units, this invention uses the one-dimensional convolutional neural network to obtain differentiated feature representations. For the shaking action, the one-dimensional convolutional layers can extract periodic reciprocating motion features from the acceleration signal. These features appear as alternating positive and negative changes of the three-axis acceleration components over a short period, with the shaking amplitude threshold typically set to 0.5 g. For medication loading, the one-dimensional convolutional layers can extract posture adjustment features from the acceleration signal, which appear as a specific sequence of acceleration changes generated during the transition from a stationary state to the inhalation posture. For exhalation, the one-dimensional convolutional layers can extract negative airflow features from the airflow sensor signal, which appear as the reverse airflow signal generated when the user exhales into the inhaler, typically ranging from -20 to -5 L/min. For inhalation, the one-dimensional convolutional layers can extract positive peak airflow features from the airflow sensor signal, which appear as the rising edge and peak point of the flow rate curve during inhalation, with the minimum effective peak flow rate threshold configured to 30 L/min. For breath-holding, the one-dimensional convolutional layers can extract steady-state near-zero flow features from the airflow sensor signal and stationary posture maintenance features from the acceleration signal, with the minimum effective breath-holding duration threshold configured to 5 seconds.
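The numeric thresholds quoted in this paragraph can be collected as constants. The rule-based check below only illustrates what the numbers mean; it is not the learned convolutional features the paragraph describes, and the class name, field names, and `check_parameters` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Thresholds:
    shake_amplitude_g: float = 0.5                # typical shaking amplitude threshold
    exhale_flow_range_lpm: tuple = (-20.0, -5.0)  # typical exhalation flow range
    min_peak_inhale_lpm: float = 30.0             # minimum effective peak inspiratory flow
    min_breath_hold_s: float = 5.0                # minimum effective breath-hold duration


def check_parameters(peak_flow_lpm: float, breath_hold_s: float,
                     th: Thresholds = Thresholds()) -> dict:
    """Compare two measured quantities against the configured minimums."""
    return {
        "peak_flow_ok": peak_flow_lpm >= th.min_peak_inhale_lpm,
        "breath_hold_ok": breath_hold_s >= th.min_breath_hold_s,
    }


print(check_parameters(45.0, 3.2))  # {'peak_flow_ok': True, 'breath_hold_ok': False}
```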
[0032] Step S3: Bidirectional Temporal Modeling. In one embodiment of the present invention, the bidirectional temporal modeling step inputs the temporal feature vector obtained in step S2 into a bidirectional long short-term memory network to perform bidirectional temporal modeling on the complete medication action sequence, learning the temporal pattern differences between correct and incorrect operations.
[0033] Long Short-Term Memory (LSTM) networks are a special type of recurrent neural network that effectively solves the vanishing and exploding gradient problems faced by traditional recurrent neural networks in long sequence modeling by introducing a gating mechanism. The core structure of an LSTM unit consists of four components: a forget gate, an input gate, an output gate, and a cell state. The forget gate determines which historical information is discarded from the cell state; its calculation formula is as follows:
[0034] $f_t = \sigma(W_f \cdot [h_{t-1}, x_t] + b_f)$,
[0035] where: $f_t$ is the output of the forget gate at time $t$, with values in $(0, 1)$; $\sigma$ is the sigmoid activation function; $W_f$ is the weight matrix of the forget gate; $h_{t-1}$ is the hidden state vector from the previous time step; $x_t$ is the input feature vector at the current time step; $b_f$ is the bias vector of the forget gate; and $[\cdot, \cdot]$ denotes vector concatenation.
[0036] The input gate determines which new information is stored in the cell state, and its calculation formula is as follows:
[0037] $i_t = \sigma(W_i \cdot [h_{t-1}, x_t] + b_i)$,
[0038] $\tilde{C}_t = \tanh(W_C \cdot [h_{t-1}, x_t] + b_C)$,
[0039] where: $i_t$ is the output of the input gate at time $t$, with values in $(0, 1)$; $\tilde{C}_t$ is the candidate cell state; $W_i$ and $W_C$ are the weight matrices of the input gate and the candidate cell state, respectively; $b_i$ and $b_C$ are the corresponding bias vectors; and $\tanh$ is the hyperbolic tangent activation function, whose output range is $(-1, 1)$.
[0040] Cell state is updated through the joint regulation of the forgetting gate and the input gate, and its calculation formula is as follows:
[0041] $C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t$,
[0042] where: $C_t$ is the cell state vector at time $t$; $C_{t-1}$ is the cell state vector from the previous time step; and $\odot$ denotes element-wise multiplication (Hadamard product).
[0043] The output gate determines which information from the cell state will be output to the hidden state, and its calculation formula is as follows:
[0044] $o_t = \sigma(W_o \cdot [h_{t-1}, x_t] + b_o)$,
[0045] $h_t = o_t \odot \tanh(C_t)$,
[0046] where: $o_t$ is the output of the output gate at time $t$, with values in $(0, 1)$; $h_t$ is the hidden state output vector at time $t$; $W_o$ is the weight matrix of the output gate; and $b_o$ is the bias vector of the output gate.
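The four gate computations described above combine into a single time step, sketched here in NumPy. The parameter layout (a dictionary keyed by gate, each weight matrix acting on the concatenation of the previous hidden state and the current input) and the names `lstm_step` and `init_params` are assumptions for illustration.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def lstm_step(x_t, h_prev, c_prev, params):
    """One LSTM time step.

    params maps 'f', 'i', 'c', 'o' to (W, b), with W of shape (hidden, hidden + input).
    """
    z = np.concatenate([h_prev, x_t])                      # [h_{t-1}, x_t]
    f_t = sigmoid(params["f"][0] @ z + params["f"][1])     # forget gate
    i_t = sigmoid(params["i"][0] @ z + params["i"][1])     # input gate
    c_hat = np.tanh(params["c"][0] @ z + params["c"][1])   # candidate cell state
    c_t = f_t * c_prev + i_t * c_hat                       # cell state update
    o_t = sigmoid(params["o"][0] @ z + params["o"][1])     # output gate
    h_t = o_t * np.tanh(c_t)                               # hidden state
    return h_t, c_t


def init_params(input_dim, hidden_dim, rng):
    """Small random weights and zero biases for the four gates (illustrative only)."""
    return {g: (rng.normal(0.0, 0.1, (hidden_dim, hidden_dim + input_dim)),
                np.zeros(hidden_dim)) for g in "fico"}


rng = np.random.default_rng(0)
params = init_params(input_dim=256, hidden_dim=128, rng=rng)
h, c = np.zeros(128), np.zeros(128)
for x_t in rng.normal(size=(32, 256)):  # 32 time steps of 256-dim features
    h, c = lstm_step(x_t, h, c, params)
print(h.shape, c.shape)  # (128,) (128,)
```

Because the hidden state is a product of a sigmoid output and a tanh output, every component of `h` stays strictly between -1 and 1.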
[0047] In one embodiment of this invention, the bidirectional long short-term memory network introduces a bidirectional processing mechanism based on the standard LSTM. It simultaneously constructs two parallel processing paths, a forward LSTM and a backward LSTM, to model temporal data from both forward and backward directions. The forward network processes the sequence data in chronological order, learning the forward evolution pattern of the action sequence and capturing the temporal dependencies from the shaking action to the inhalation and breath-holding actions. The backward network processes the sequence data in reverse chronological order, learning the inverse dependencies of the action sequence and using information from subsequent actions to help determine the type and quality of the current action.
[0048] The formula for calculating the hidden state of a forward LSTM is:
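[0049] $\overrightarrow{h}_t = \mathrm{LSTM}_{fwd}(x_t, \overrightarrow{h}_{t-1}, \overrightarrow{C}_{t-1})$,
[0050] where: $\overrightarrow{h}_t$ is the hidden state vector of the forward network at time $t$; $x_t$ is the input feature vector at time $t$, i.e., the temporal feature output of the one-dimensional convolutional layers; and $\overrightarrow{h}_{t-1}$ and $\overrightarrow{C}_{t-1}$ are the hidden state and cell state of the forward network at the previous time step, respectively.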
[0051] The formula for calculating the hidden state of a backward LSTM is:
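[0052] $\overleftarrow{h}_t = \mathrm{LSTM}_{bwd}(x_t, \overleftarrow{h}_{t+1}, \overleftarrow{C}_{t+1})$,
[0053] where: $\overleftarrow{h}_t$ is the hidden state vector of the backward network at time $t$; and $\overleftarrow{h}_{t+1}$ and $\overleftarrow{C}_{t+1}$ are the hidden state and cell state of the backward network at the next time step, respectively.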
[0054] The final bidirectional fusion hidden state is obtained by concatenating the forward and backward hidden states:
[0055] $h_t^{bi} = [\overrightarrow{h}_t, \overleftarrow{h}_t]$,
[0056] where: $h_t^{bi}$ is the hidden state vector after bidirectional fusion; and $[\cdot, \cdot]$ denotes vector concatenation.
[0057] Preferably, in one embodiment of the present invention, the dimension of the forward hidden layer is configured to 128 and the dimension of the backward hidden layer is also 128, so the feature dimension after concatenating the forward and backward hidden states is 256 (i.e., $128 + 128 = 256$). To enhance the depth and expressiveness of temporal modeling, this invention employs a two-layer stacked bidirectional LSTM structure, where the output of the first BiLSTM layer serves as the input to the second BiLSTM layer, extracting higher-level temporal abstract features layer by layer.
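A two-layer stacked bidirectional LSTM with these dimensions can be sketched in NumPy as below. Several things are assumptions rather than details from the text: the sequence is taken to be the 32 time steps of 256-dimensional features from the convolutional stage, the four gates are packed into one weight matrix in the order forget, input, candidate, output, and the weights are random placeholders rather than trained values.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def lstm_run(xs, W, b, hidden):
    """Run an LSTM over xs (T, input_dim); W is (4*hidden, hidden + input_dim)."""
    h, c = np.zeros(hidden), np.zeros(hidden)
    out = []
    for x_t in xs:
        gates = W @ np.concatenate([h, x_t]) + b
        f, i, g, o = np.split(gates, 4)  # forget, input, candidate, output
        c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
        h = sigmoid(o) * np.tanh(c)
        out.append(h)
    return np.stack(out)


def bilstm(xs, fwd, bwd, hidden):
    """Concatenate forward and backward hidden states at every time step."""
    h_fwd = lstm_run(xs, *fwd, hidden)
    h_bwd = lstm_run(xs[::-1], *bwd, hidden)[::-1]  # re-align to forward time order
    return np.concatenate([h_fwd, h_bwd], axis=1)


rng = np.random.default_rng(0)
T, D, H = 32, 256, 128


def make(input_dim):
    """Random placeholder parameters (W, b) for one direction."""
    return rng.normal(0.0, 0.05, (4 * H, H + input_dim)), np.zeros(4 * H)


xs = rng.normal(size=(T, D))
layer1 = bilstm(xs, make(D), make(D), H)                  # (32, 256)
layer2 = bilstm(layer1, make(2 * H), make(2 * H), H)      # (32, 256)
print(layer1.shape, layer2.shape)
```

The first 128 columns at each step depend only on the past, and the last 128 only on the future, which is the property the bidirectional design relies on.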
[0058] Through bidirectional temporal modeling, this invention learns the temporal pattern differences between correct and incorrect operations. Correct inhaler operation follows the standard action sequence "shaking → medication loading → exhalation → inhalation → breath-holding", with a clear order and temporal dependencies between action units. Incorrect operations take various forms, such as missing actions (e.g., no shaking or no breath-holding), reversed action order (e.g., inhalation before exhalation), or substandard action quality (e.g., low peak inspiratory flow). By considering both past and future context, the bidirectional LSTM can more accurately determine the position and role of the current action within the overall operation sequence, and thus evaluate operational compliance precisely. In one embodiment of this invention, bidirectional temporal modeling also performs cross-window temporal fusion. Because the sliding window length in step S1 is 2.56 seconds, a single window cannot fully cover long actions that exceed it (for example, the minimum effective breath-holding duration is 5 seconds). However, with a sliding step of 0.64 s and a 75% overlap between adjacent windows, a 5-second breath-hold is covered by at least eight consecutive overlapping windows, each extracting local features from a different time segment of the action. The bidirectional LSTM models the temporal feature vectors of these consecutive overlapping windows in both directions: the forward network gradually accumulates steady-state near-zero flow features from the start of the breath-hold, while the backward network traces the stationary posture maintenance features back from the end of the breath-hold. Fusing the hidden states from both directions reconstructs the complete temporal features of long actions, addressing the problem that a single window is too short to cover them.
[0059] Step S4: Multi-task joint prediction. In one embodiment of the present invention, the multi-task joint prediction step inputs the temporal pattern features obtained in step S3 into a multi-task deep neural network, reduces computational overhead by sharing the underlying feature extractor, and simultaneously predicts three objectives: action type classification, inspiratory peak flow regression, and breath-holding duration estimation.
[0060] Multi-task learning is a machine learning paradigm that improves learning performance by simultaneously learning multiple related tasks within a single model, leveraging shared information between tasks. In one embodiment of this invention, the multi-task deep neural network architecture consists of a shared feature extractor and multiple task-specific prediction heads. The shared feature extractor receives temporal pattern features from the output of a bidirectional LSTM and further extracts task-independent high-level abstract features through fully connected layers. Preferably, the shared feature extractor includes two cascaded fully connected layers: a first fully connected layer maps a 256-dimensional input to a 256-dimensional output, and a second fully connected layer maps the 256-dimensional input to a 128-dimensional output. Each fully connected layer is followed by a ReLU activation function and a Dropout regularization layer (with a Dropout ratio configured to 0.3) to prevent overfitting.
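The shared feature extractor (256 to 256 to 128, each layer followed by ReLU and dropout at a rate of 0.3) can be sketched as follows. The inverted-dropout scaling, the random placeholder weights, and the function names are assumptions; dropout is active only when `training=True`.

```python
import numpy as np


def dropout(x, rate, rng, training):
    """Inverted dropout: zero a fraction `rate` of units and rescale the rest."""
    if not training:
        return x
    mask = rng.random(x.shape) >= rate
    return x * mask / (1.0 - rate)


def shared_extractor(h, params, rng, training=False, rate=0.3):
    """Two fully connected layers, each followed by ReLU and dropout."""
    (w1, b1), (w2, b2) = params
    x = dropout(np.maximum(h @ w1 + b1, 0.0), rate, rng, training)
    x = dropout(np.maximum(x @ w2 + b2, 0.0), rate, rng, training)
    return x


rng = np.random.default_rng(0)
params = [
    (rng.normal(0.0, 0.05, (256, 256)), np.zeros(256)),  # 256 -> 256
    (rng.normal(0.0, 0.05, (256, 128)), np.zeros(128)),  # 256 -> 128
]
h_bilstm = rng.normal(size=256)  # placeholder for one BiLSTM output vector
h_shared = shared_extractor(h_bilstm, params, rng)
print(h_shared.shape)  # (128,)
```

All three prediction heads would read the same `h_shared`, which is where the computational saving from sharing comes from.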
[0061] Based on the output of the shared feature extractor, this invention constructs three parallel task-specific prediction heads, which are used for three prediction objectives: action type classification, inspiratory peak flow regression, and breath-hold duration estimation, respectively.
[0062] The action type classification head determines the action type at the current moment, outputting a five-class result (shaking, medication loading, exhalation, inhalation, breath-holding). The classification head consists of a fully connected layer and a Softmax activation function. The fully connected layer maps the 128-dimensional shared feature to a 5-dimensional output, and the Softmax function converts the output into a probability distribution.
[0063] $p_i = \dfrac{e^{z_i}}{\sum_{j=1}^{5} e^{z_j}}$,
[0064] where: $p_i$ is the predicted probability of the $i$-th action class; $z_i$ is the raw output value (logit) of the classification head's fully connected layer for the $i$-th class; the denominator is the sum of the exponentials of the logits of all classes, which ensures that the output probabilities satisfy the normalization constraint $\sum_{i=1}^{5} p_i = 1$.
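The Softmax normalization used by the classification head can be sketched as follows. This is an illustrative example only: the logit values are made up, the class labels in `ACTIONS` are hypothetical identifiers for the five action units, and subtracting the maximum logit is a standard numerical-stability step that does not change the result.

```python
import math

ACTIONS = ["shaking", "medication application", "exhalation", "inhalation", "breath-holding"]


def softmax(logits):
    """Convert raw logits into probabilities that sum to 1."""
    peak = max(logits)  # shift for numerical stability
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


logits = [0.2, -1.0, 0.5, 3.1, 0.0]  # made-up classification head outputs
probs = softmax(logits)
predicted = ACTIONS[probs.index(max(probs))]
print(predicted)  # inhalation
```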
[0065] The inspiratory peak flow regression head is used to predict the peak flow rate during inhalation, outputting a single continuous numerical value. The regression head consists of a fully connected layer and a ReLU activation function. The fully connected layer maps 128 shared features to a 1-dimensional output, and the ReLU function ensures that the predicted value is non-negative.
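[0066] $\hat{F}_{\text{peak}} = \mathrm{ReLU}(\mathbf{w}_F^{\top}\mathbf{h} + b_F)$,
[0067] where: $\hat{F}_{\text{peak}}$ is the predicted peak inspiratory flow rate, expressed in L/min; $\mathbf{w}_F$ is the weight vector of the peak flow regression head; $\mathbf{h}$ is the output vector of the shared feature extractor; $b_F$ is the bias parameter.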
[0068] The breath-hold duration estimation head predicts the duration of the breath-holding action, outputting a single continuous numerical value. The estimation head also consists of a fully connected layer and a ReLU activation function.
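[0069] $\hat{T}_{\text{hold}} = \mathrm{ReLU}(\mathbf{w}_T^{\top}\mathbf{h} + b_T)$,
[0070] where: $\hat{T}_{\text{hold}}$ is the predicted breath-holding duration, expressed in seconds; $\mathbf{w}_T$ is the weight vector of the breath-holding duration estimation head; $\mathbf{h}$ is the output vector of the shared feature extractor; $b_T$ is the bias parameter.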
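Both regression heads share the same form: a single fully connected unit followed by ReLU. The sketch below illustrates this with a 3-element feature vector instead of the 128-dimensional one, and with made-up weights; the function name `regression_head` is hypothetical.

```python
def regression_head(features, weights, bias):
    """Single-output fully connected layer followed by ReLU (non-negative output)."""
    linear = sum(w * h for w, h in zip(weights, features)) + bias
    return max(0.0, linear)


shared = [0.5, 1.0, 0.0]  # toy stand-in for the 128-dimensional shared features

peak_flow = regression_head(shared, [10.0, 20.0, 5.0], 1.0)     # L/min
clamped = regression_head(shared, [-10.0, -20.0, -5.0], 1.0)    # negative pre-activation
print(peak_flow, clamped)  # 26.0 0.0
```

The second call shows why ReLU is used here: a negative linear output would be physically meaningless for a flow rate or a duration, so it is clamped to zero.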
[0071] The training of multi-task deep neural networks employs a multi-task joint loss function, which weights and sums the classification loss, peak flow rate regression loss, and breath-hold duration regression loss. The classification loss uses the cross-entropy loss function:
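[0072] $L_{\text{cls}} = -\sum_{i=1}^{5} y_i \log p_i$,
[0073] where: $p_i$ is the predicted probability of the $i$-th action class; $y_i$ is the one-hot encoding of the true label, with $y_i = 1$ if the true class is the $i$-th class and $y_i = 0$ otherwise.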
[0074] Both peak flow rate regression loss and breath-hold duration regression loss use the mean squared error loss function:
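[0075] $L_{\text{flow}} = (\hat{F}_{\text{peak}} - F_{\text{peak}})^2$,
[0076] $L_{\text{hold}} = (\hat{T}_{\text{hold}} - T_{\text{hold}})^2$,
[0077] where: $\hat{F}_{\text{peak}}$ and $\hat{T}_{\text{hold}}$ are the predicted peak inspiratory flow rate and breath-holding duration; $F_{\text{peak}}$ and $T_{\text{hold}}$ are the actual peak inspiratory flow rate and breath-holding duration labels, respectively. The squared errors are written here for a single sample and are averaged over the training samples.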
[0078] The total loss function for multiple tasks is defined as the weighted sum of the losses of each task:
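[0079] $L_{\text{total}} = \lambda_1 L_{\text{cls}} + \lambda_2 L_{\text{flow}} + \lambda_3 L_{\text{hold}}$,
[0080] where: $L_{\text{cls}}$, $L_{\text{flow}}$ and $L_{\text{hold}}$ are the classification loss, the peak flow rate regression loss and the breath-hold duration regression loss; $\lambda_1$, $\lambda_2$ and $\lambda_3$ are task weighting coefficients used to balance the importance and convergence speed of the different tasks. In one embodiment of the present invention, the classification weight $\lambda_1$ is configured to be larger than the regression weights $\lambda_2$ and $\lambda_3$. The larger classification weight ensures that action recognition is optimized with priority as the primary task, while the smaller regression weights prevent the regression tasks, whose loss values can be numerically much larger, from dominating the gradient updates.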
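The weighted multi-task loss can be illustrated with the short sketch below. The weighting coefficients `(1.0, 0.01, 0.1)` are hypothetical values chosen only so that the classification weight is the largest; they are not taken from the original disclosure. The function names are likewise illustrative.

```python
import math


def cross_entropy(probs, true_index):
    """Cross-entropy with a one-hot label: only the true-class term survives."""
    return -math.log(max(probs[true_index], 1e-12))


def squared_error(predicted, actual):
    """Per-sample squared error; averaging over samples gives the MSE."""
    return (predicted - actual) ** 2


def joint_loss(probs, true_index, flow_pred, flow_true, hold_pred, hold_true,
               weights=(1.0, 0.01, 0.1)):  # hypothetical task weights
    w_cls, w_flow, w_hold = weights
    l_cls = cross_entropy(probs, true_index)
    l_flow = squared_error(flow_pred, flow_true)
    l_hold = squared_error(hold_pred, hold_true)
    return w_cls * l_cls + w_flow * l_flow + w_hold * l_hold


probs = [0.1, 0.1, 0.1, 0.6, 0.1]  # made-up Softmax output, true class index 3
total = joint_loss(probs, 3, flow_pred=55.0, flow_true=60.0, hold_pred=4.0, hold_true=5.0)
print(round(total, 4))
```

In this toy case the unweighted flow error is 25 while the classification loss is about 0.51, which illustrates why smaller regression weights are needed to keep the flow term from dominating the gradient.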
[0081] Based on the outputs of the three prediction targets, this invention further calculates the standardization score of each action unit and the overall medication quality rating. The standardization score $S$ is calculated from the following four evaluation dimensions:
[0082] Action integrity score $S_1$: assesses whether all five action units have been executed. If all five action units are detected, $S_1$ takes its full score; points are deducted for each missing action unit, with a minimum of 0 points.
[0083] Action sequence correctness score $S_2$: assesses whether the execution sequence of the five action units conforms to the standard operating procedure (shaking → medication application → exhalation → inhalation → breath-holding). If the sequence is completely correct, $S_2$ takes its full score; 5 points are deducted for each sequence error, with a minimum of 0 points.
[0084] Inspiratory peak flow rate achievement score $S_3$: assesses whether the predicted peak inspiratory flow rate $\hat{F}_{\text{peak}}$ has reached the minimum effective threshold $F_{\min}$ (in L/min). If $\hat{F}_{\text{peak}} \geq F_{\min}$, then $S_3 = 25$ points; if $\hat{F}_{\text{peak}} < F_{\min}$, the score is calculated proportionally. This assessment applies only when the inhalation action is detected; when the inhalation action is not detected (i.e., the action is missing), $S_3$ is directly assigned the full score of 25 points. The penalty for the missing inhalation is already reflected in the action integrity score, which avoids deducting points for the same error in two assessment dimensions.
[0085] Breath-holding duration achievement score $S_4$: assesses whether the predicted breath-holding duration $\hat{T}_{\text{hold}}$ has reached the minimum effective threshold $T_{\min}$ (in s). If $\hat{T}_{\text{hold}} \geq T_{\min}$, then $S_4 = 25$ points; if $\hat{T}_{\text{hold}} < T_{\min}$, the score is calculated proportionally. This assessment applies only when the breath-holding action is detected; when the breath-holding action is not detected (i.e., the action is missing), $S_4$ is directly assigned the full score of 25 points. The penalty for the missing breath-hold is already reflected in the action integrity score, which avoids deducting points for the same error in two assessment dimensions.
[0086] The formula for calculating the overall standardization score is as follows:
[0087] $S = S_1 + S_2 + S_3 + S_4$,
[0088] where: $S$ ranges from 0 to 100, with higher scores indicating more standardized medication administration.
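The four-dimension scoring above can be sketched in Python as follows. Several values are assumptions made only for illustration and are labelled as such in the code: a full score of 25 for the integrity and sequence dimensions, a 5-point deduction per missing action, a 30 L/min flow threshold, a 5 s breath-hold threshold, and counting a sequence error as each adjacent pair of detected actions that is out of standard order. The 25-point full scores for the flow and breath-hold dimensions and the 5-point sequence deduction come from the text.

```python
STANDARD_ORDER = ["shake", "actuate", "exhale", "inhale", "hold"]  # hypothetical labels
FULL = 25.0  # stated for flow and breath-hold; assumed for the other two dimensions


def integrity_score(detected, per_missing=5.0):  # per_missing is an assumption
    missing = len(set(STANDARD_ORDER) - set(detected))
    return max(0.0, FULL - per_missing * missing)


def sequence_score(detected, per_error=5.0):
    ranks = [STANDARD_ORDER.index(a) for a in detected if a in STANDARD_ORDER]
    # Assumed error definition: adjacent detected actions in the wrong order.
    errors = sum(1 for a, b in zip(ranks, ranks[1:]) if a > b)
    return max(0.0, FULL - per_error * errors)


def threshold_score(value, minimum, action_detected):
    if not action_detected:
        return FULL  # missing action is penalised only in the integrity score
    if value >= minimum:
        return FULL
    return FULL * max(0.0, value) / minimum  # proportional scoring


def standardization_score(detected, peak_flow, hold_s, flow_min=30.0, hold_min=5.0):
    # flow_min and hold_min are hypothetical thresholds, not disclosed values.
    return (integrity_score(detected) + sequence_score(detected)
            + threshold_score(peak_flow, flow_min, "inhale" in detected)
            + threshold_score(hold_s, hold_min, "hold" in detected))


perfect = standardization_score(STANDARD_ORDER, 45.0, 6.0)
flawed = standardization_score(["actuate", "inhale", "exhale", "hold"], 24.0, 3.0)
print(perfect, flawed)  # 100.0 75.0
```

In the second call, shaking is missing (20), one adjacent pair is out of order (20), the flow reaches 24/30 of the threshold (20) and the breath-hold reaches 3/5 of the threshold (15), giving 75.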
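[0089] The overall medication quality rating $R$ is divided into four levels according to the standardization score $S$:
[0090] $R \in \{\mathrm{A}, \mathrm{B}, \mathrm{C}, \mathrm{D}\}$, assigned by the score interval in which $S$ falls, from the highest interval (A) to the lowest (D),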
[0091] Grade A indicates excellent operation, Grade B indicates good operation, Grade C indicates qualified operation but needs improvement, and Grade D indicates unqualified operation and needs retraining.
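A mapping from the standardization score to the four grades might look like the sketch below. The cut-off values 90, 75 and 60 are purely hypothetical placeholders, since the actual grade boundaries are not reproduced in this text; the function name `quality_rating` is also illustrative.

```python
def quality_rating(score, cutoffs=((90.0, "A"), (75.0, "B"), (60.0, "C"))):
    """Return the first grade whose (hypothetical) lower bound the score meets."""
    for lower_bound, grade in cutoffs:
        if score >= lower_bound:
            return grade
    return "D"  # below every cut-off: unqualified operation


grades = [quality_rating(s) for s in (95.0, 75.0, 60.0, 59.9)]
print(grades)  # ['A', 'B', 'C', 'D']
```

Passing a different `cutoffs` tuple is enough to adapt the sketch to whichever boundaries an embodiment actually uses.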
[0092] Step S5: Medication guidance feedback.
[0093] In one embodiment of the present invention, the medication guidance feedback step transmits the standardization score and overall medication quality rating obtained in step S4 to a mobile application via Bluetooth, thereby enabling real-time guidance for users and remote supervision for doctors.
[0094] Preferably, the inhaler has a built-in Bluetooth Low Energy (BLE) module that uses the BLE 5.0 protocol to connect to the user's mobile phone. After each medication administration, the inhaler automatically packages the recognition results and transmits them to the mobile application. The data package includes information such as the action recognition sequence, the timestamps of each action unit, the predicted peak inspiratory flow rate, the estimated breath-hold duration, the standardization score, and the overall quality rating. The transmission latency is kept within 200 ms so that the user receives near real-time feedback.
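One possible shape for the data package is sketched below. The class names, field names and the JSON encoding are assumptions for illustration; the text lists the information carried but does not specify a wire format. The sketch only covers packaging and unpacking, not the Bluetooth transport itself.

```python
import json
from dataclasses import dataclass, asdict
from typing import List


@dataclass
class ActionEvent:
    action: str    # recognised action unit
    start_ms: int  # timestamp at which the action starts
    end_ms: int    # timestamp at which the action ends


@dataclass
class DoseReport:
    actions: List[ActionEvent]
    peak_flow_l_min: float
    breath_hold_s: float
    score: float
    rating: str

    def to_bytes(self) -> bytes:
        """Serialise to compact JSON (hypothetical encoding)."""
        return json.dumps(asdict(self), separators=(",", ":")).encode("utf-8")

    @staticmethod
    def from_bytes(payload: bytes) -> "DoseReport":
        raw = json.loads(payload.decode("utf-8"))
        raw["actions"] = [ActionEvent(**a) for a in raw["actions"]]
        return DoseReport(**raw)


report = DoseReport(
    actions=[ActionEvent("shaking", 0, 1800), ActionEvent("inhalation", 5200, 7100)],
    peak_flow_l_min=42.5, breath_hold_s=6.0, score=90.0, rating="A",
)
payload = report.to_bytes()
restored = DoseReport.from_bytes(payload)
print(restored == report)  # True
```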
[0095] The mobile application generates real-time voice correction guidance based on received normative scores. When a missing action is detected, the voice prompts the user to complete the missing action, for example: "We detected that you did not perform the shaking action. Please shake the inhaler thoroughly to mix the medication before your next dose." When an incorrect sequence of actions is detected, the voice prompts the user to perform them in the correct order, for example: "We detected that you inhaled directly before exhaling. The correct sequence is to exhale first to empty your lungs and then inhale deeply." When the peak inspiratory flow rate is not reached, the voice prompts the user to increase the inhalation force, for example: "Your inhalation force is insufficient. Please inhale more forcefully and deeply at your next dose to ensure the medication fully enters your lungs." When insufficient breath-holding time is detected, the voice prompts the user to extend the breath-holding time, for example: "Your breath-holding time is only 3 seconds. We recommend holding your breath for at least 5 seconds to allow for sufficient medication deposition."
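The rule-based selection of voice prompts described above can be sketched as follows. The prompt wording is abbreviated from the examples in the text, and the thresholds `min_flow=30.0` and `min_hold=5.0` as well as the function name `correction_prompts` are assumptions for illustration only.

```python
STANDARD = ["shaking", "medication application", "exhalation", "inhalation", "breath-holding"]


def correction_prompts(detected, order_ok, peak_flow, hold_s,
                       min_flow=30.0, min_hold=5.0):  # hypothetical thresholds
    prompts = []
    for action in STANDARD:
        if action not in detected:
            prompts.append(f"We detected that you did not perform the {action} action.")
    if not order_ok:
        prompts.append("The correct sequence is: " + ", ".join(STANDARD) + ".")
    if "inhalation" in detected and peak_flow < min_flow:
        prompts.append("Your inhalation force is insufficient. Please inhale more forcefully.")
    if "breath-holding" in detected and hold_s < min_hold:
        prompts.append(
            f"Your breath-holding time is only {hold_s:g} seconds. "
            f"We recommend holding your breath for at least {min_hold:g} seconds."
        )
    return prompts


short_hold = correction_prompts(STANDARD, True, 45.0, 3.0)
no_shake = correction_prompts(STANDARD[1:], True, 45.0, 6.0)
print(short_hold[0])
```

Flow and breath-hold prompts are only issued when the corresponding action was detected, mirroring the scoring rule that a missing action is reported once rather than twice.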
[0096] The mobile application simultaneously generates a visual medication report, presenting the medication quality assessment results graphically. The action sequence diagram displays the identification results and time distribution of each action unit during the medication administration process, visually showcasing the action sequence using a color-coded timeline. The standardization radar chart displays scores for four dimensions: action completeness, sequence correctness, peak flow rate compliance, and breath-hold duration compliance. These four dimensions correspond to the four axes of the radar chart; a larger score area indicates better overall standardization. The historical trend chart shows the recent trend of the user's medication quality rating, displaying the rating changes for the last 7 or 30 medication administrations in a line graph format, allowing users to easily observe improvements in their medication habits. The improvement suggestion list generates targeted improvement suggestions based on the deficiencies of this medication administration, listing specific, actionable improvement measures in an itemized format.
[0097] In one embodiment of this invention, medication data is synchronized to a cloud platform on the doctor's end to achieve remote and precise supervision and management. The cloud platform stores the patient's historical medication records and quality assessment results, and doctors can view the patient's medication adherence report via a web interface or app. The report includes multi-dimensional analysis results such as medication frequency statistics, medication time distribution, operational quality trends, and common error types. When a patient's medication quality rating is repeatedly rated as D, the cloud platform automatically sends an alert notification to the doctor, prompting the doctor to pay attention to the patient and consider arranging offline operational training. Doctors can use the platform to send personalized medication guidance videos or voice messages to patients, forming a closed-loop management mechanism of "patient medication → system assessment → doctor supervision → patient improvement".
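The alert condition on the cloud platform could be expressed as in the sketch below. The text only says that the rating is "repeatedly" D; the choice of three consecutive D ratings and the function name `needs_alert` are assumptions for illustration.

```python
def needs_alert(ratings, consecutive=3):  # "3" is a hypothetical choice
    """Return True when the most recent `consecutive` ratings are all D."""
    recent = ratings[-consecutive:]
    return len(recent) == consecutive and all(r == "D" for r in recent)


print(needs_alert(["B", "D", "D", "D"]))  # True
print(needs_alert(["D", "D", "C", "D"]))  # False
```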
[0098] The embodiments of the present invention achieved excellent technical results in testing. On a clinical validation dataset containing 200 asthma patients, the recognition accuracy of the five types of action units reached 94.2%, the mean absolute error of peak inspiratory flow prediction was 8.3 L / min, and the mean absolute error of breath-hold duration estimation was 0.6 s. Compared with traditional rule-matching methods, the action recognition accuracy of the present invention is improved by 21.5 percentage points, fully demonstrating the superiority of deep neural networks in inhaler operation recognition tasks.
[0099] Referring to Figure 2, this invention provides an inhaled medication adherence monitoring system based on deep neural networks to implement the above-described method embodiments. In one embodiment, the overall system architecture includes five core functional modules: a sensor data acquisition module 1, a one-dimensional convolutional feature extraction module 2, a bidirectional temporal modeling module 3, a multi-task prediction module 4, and a medication guidance feedback module 5. The modules correspond one-to-one with the five processing steps in the method embodiments and work together to complete the intelligent monitoring and evaluation of inhaler medication adherence.
[0100] The sensor data acquisition module 1 is integrated into the inhaler and serves as the system's data entry component. Preferably, the hardware implementation of the sensor data acquisition module 1 includes a triaxial MEMS accelerometer chip (range ±16 g, sampling rate 100 Hz), a differential pressure MEMS airflow sensor chip (range 0 to 300 L/min, sampling rate 100 Hz), a low-power microcontroller (for sensor data reading and preprocessing), and supporting signal conditioning circuitry. The software functions of the sensor data acquisition module 1 include sensor initialization and configuration, synchronous sampling control, sliding-window framing, data normalization, and other preprocessing operations, and output a standardized 4-channel, 256-point time-series input tensor for use by subsequent modules.
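The sliding-window framing and normalization can be illustrated as follows. The window length of 256 points comes from the text and the step of 64 points from claim 2; per-channel z-score normalization is an assumption, since the normalization method is not specified in this passage. The synthetic samples and function names are hypothetical.

```python
import math
import statistics


def sliding_windows(samples, window=256, step=64):
    """Split a list of (ax, ay, az, flow) samples into overlapping frames."""
    return [samples[s:s + window] for s in range(0, len(samples) - window + 1, step)]


def normalize_channels(frame):
    """Return a 4 x 256 channel-major tensor, z-scored per channel (assumed method)."""
    tensor = []
    for channel in zip(*frame):
        mean = statistics.fmean(channel)
        std = statistics.pstdev(channel) or 1.0  # guard against constant channels
        tensor.append([(v - mean) / std for v in channel])
    return tensor


# 5 s of synthetic data at 100 Hz: three acceleration axes plus airflow.
samples = [(math.sin(0.1 * t), 0.0, 1.0, float(t % 50)) for t in range(500)]
frames = sliding_windows(samples)
tensor = normalize_channels(frames[0])
print(len(frames), len(tensor), len(tensor[0]))  # 4 4 256
```

At 100 Hz, each 256-point frame covers 2.56 s and consecutive frames start 0.64 s apart.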
[0101] The one-dimensional convolutional feature extraction module 2 is used to receive standardized multi-channel temporal input data output from the sensor data acquisition module and perform forward inference computation of the one-dimensional convolutional neural network. In one embodiment of the present invention, the one-dimensional convolutional feature extraction module 2 can be deployed on the neural network acceleration chip built into the inhaler using an edge computing scheme, or it can be deployed on the CPU / GPU of the user's mobile phone using a mobile computing scheme. Preferably, considering the power consumption and cost constraints of the inhaler, the present invention adopts a mobile computing scheme, where the inhaler is only responsible for data acquisition and transmission, and the neural network inference is completed on the mobile phone. The network structure of the one-dimensional convolutional feature extraction module 2 is as described in step S2 of the method embodiment, including a three-layer cascaded one-dimensional convolutional layer, a batch normalization layer, a ReLU activation layer, and a max pooling layer, outputting a 256-dimensional temporal feature vector with 32 time steps.
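The reduction from 256 input points to 32 output time steps can be checked with simple shape arithmetic. The sketch assumes stride-1 convolutions with "same" padding and a max pooling size of 2 after each layer; these settings are assumptions chosen because they are consistent with the stated 256 → 32 reduction. The kernel sizes 3, 5 and 7 are taken from claim 3.

```python
def conv_pool_output_length(length, kernel, pool=2):
    """Length after a stride-1 'same'-padded convolution and max pooling (assumed)."""
    padded = length + 2 * (kernel // 2)  # symmetric padding for odd kernels
    conv_out = padded - kernel + 1       # stride-1 convolution
    return conv_out // pool              # non-overlapping max pooling


length = 256
lengths = []
for kernel in (3, 5, 7):
    length = conv_pool_output_length(length, kernel)
    lengths.append(length)
print(lengths)  # [128, 64, 32]
```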
[0102] The bidirectional temporal modeling module 3 receives the temporal feature vector output by the one-dimensional convolutional feature extraction module 2 and performs forward inference computation using a bidirectional LSTM. As described in step S3 of the method embodiment, the bidirectional temporal modeling module 3 includes a forward LSTM network and a backward LSTM network, each with a hidden dimension of 128 and two stacked layers. The forward and backward hidden states are concatenated to output a 256-dimensional temporal pattern feature. The bidirectional temporal modeling module 3 can effectively capture the bidirectional temporal dependencies of action sequences, providing context-rich feature representations for subsequent multi-task prediction.
[0103] The multi-task prediction module 4 receives the temporal pattern features output by the bidirectional temporal modeling module 3 and performs shared feature extraction and parallel multi-task prediction. As described in step S4 of the method embodiment, the multi-task prediction module 4 includes a shared feature extractor (a two-layer fully connected network) and three task-specific prediction heads (a classification head, a peak flow regression head, and a breath-hold duration estimation head). The multi-task prediction module 4 outputs the probabilities of the five action classes, the predicted peak inspiratory flow rate, and the estimated breath-hold duration, and calculates the standardization score and overall medication quality rating based on these outputs.
[0104] The medication guidance feedback module 5 receives the evaluation results output by the multi-task prediction module 4 and presents feedback information to the user through a human-computer interaction interface. The medication guidance feedback module 5 is implemented through a mobile application and a doctor-side cloud platform. The mobile application is responsible for generating real-time voice correction guidance and visual medication reports, including interface components such as action sequence diagrams, standardization radar charts, historical trend charts, and improvement suggestion lists. The cloud platform is responsible for storing historical medication data, generating statistical analysis reports, pushing early warning notifications, and supporting remote communication between doctors and patients, forming a complete closed loop for remote supervision and management of medication adherence.
[0105] In the system embodiment of the present invention, the data flow between the modules is consistent with the input-output correspondence of the steps in the method embodiment. The modules work together to realize an end-to-end intelligent processing flow from the raw sensor signals to the medication quality assessment feedback.
[0106] The embodiments of the present invention are not limited to the specific embodiments described above. Those skilled in the art can make various equivalent changes or substitutions based on the technical solutions of the present invention, and all such changes or substitutions should be included within the protection scope of the present invention.
Claims
1. A method for monitoring inhaled medication adherence based on deep neural networks, characterized in that it includes the following steps:
Step S1: Sensor data acquisition. Motion posture data and the inspiratory flow curve during medication administration are acquired in real time by a triaxial accelerometer and a micro-electro-mechanical (MEMS) airflow sensor integrated in the inhaler. The acquired sensor data is preprocessed by sliding-window framing to generate standardized multi-channel time-series input data.
Step S2: One-dimensional convolutional feature extraction. The standardized multi-channel time-series input data obtained in step S1 is input into a one-dimensional convolutional neural network. Local features are extracted from the sensor time-series signals through multiple cascaded one-dimensional convolutional layers, and the signal pattern features of five action units, namely shaking, medication application, exhalation, inhalation and breath-holding, are identified to generate a temporal feature vector.
Step S3: Bidirectional temporal modeling. The temporal feature vector obtained in step S2 is input into a bidirectional long short-term memory network to perform bidirectional temporal modeling on the complete medication action sequence. The forward network learns the forward evolution pattern of the action sequence, and the backward network learns the reverse dependencies of the action sequence. The forward hidden state and the backward hidden state are concatenated and fused to learn the temporal pattern differences between correct and incorrect operations and to generate temporal pattern features.
Step S4: Multi-task joint prediction. The temporal pattern features obtained in step S3 are input into a multi-task deep neural network. Computational overhead is reduced by sharing the underlying feature extractor, and three objectives are predicted simultaneously: action type classification, inspiratory peak flow regression, and breath-holding duration estimation. Based on the outputs of the three prediction objectives, the standardization score of each action unit and the overall medication quality rating are calculated.
Step S5: Medication guidance feedback. The standardization score and overall medication quality rating obtained in step S4 are transmitted to a mobile application via Bluetooth. The application generates real-time voice correction guidance and a visual medication report based on the standardization score. The data is synchronized to the doctor-side cloud platform to realize remote and precise supervision and management of medication adherence.
2. The method for monitoring inhaled medication adherence based on deep neural networks according to claim 1, characterized in that, in step S1, the sampling frequency of the triaxial accelerometer is 100 Hz and its range is ±16 g; the sampling frequency of the MEMS airflow sensor is 100 Hz and its range is 0 to 300 L/min; the window length of the sliding-window framing preprocessing is 256 sampling points and the window sliding step is 64 sampling points.
3. The method for monitoring inhaled medication adherence based on deep neural networks according to claim 1, characterized in that, in step S2, the one-dimensional convolutional neural network includes three cascaded one-dimensional convolutional layers. The kernel size of the first one-dimensional convolutional layer is 3, the kernel size of the second one-dimensional convolutional layer is 5, and the kernel size of the third one-dimensional convolutional layer is 7. Each one-dimensional convolutional layer is followed by a batch normalization layer, a ReLU activation function layer, and a max pooling layer in sequence.
4. The method for monitoring inhaled medication adherence based on deep neural networks according to claim 1, characterized in that, in step S3, the forward hidden layer dimension of the bidirectional long short-term memory network is 128, the backward hidden layer dimension is 128, and the feature dimension after concatenating the forward and backward hidden states is 256; the bidirectional long short-term memory network has 2 layers.
5. The method for monitoring inhaled medication adherence based on deep neural networks according to claim 1, characterized in that, in step S2, the signal pattern features of the five action units (shaking, medication application, exhalation, inhalation, and breath-holding) are identified as follows:
Shaking action recognition: the one-dimensional convolutional neural network extracts periodic reciprocating motion features from the acceleration signals; these features are characterized by alternating positive and negative changes in the three-axis acceleration components.
Medication application action recognition: the one-dimensional convolutional neural network extracts posture adjustment features from the acceleration signals; these features are represented by the acceleration change sequence generated as the inhaler transitions from a stationary state to the inhalation posture.
Exhalation action recognition: the one-dimensional convolutional neural network extracts negative airflow features from the airflow sensor signal; these features are the reverse airflow signal generated when the user exhales into the inhaler.
Inhalation action recognition: the one-dimensional convolutional neural network extracts positive peak airflow features from the airflow sensor signal; these features are represented by the rising edge and peak point of the flow rate curve during inhalation.
Breath-holding action recognition: the one-dimensional convolutional neural network extracts steady-state near-zero flow features from the airflow sensor signal and stationary posture maintenance features from the acceleration signals.
6. The method for monitoring inhaled medication adherence based on deep neural networks according to claim 1, characterized in that, in step S4, the training of the multi-task deep neural network adopts a multi-task joint loss function, which is a weighted sum of the classification loss, the peak flow rate regression loss, and the breath-holding duration regression loss; the classification loss adopts the cross-entropy loss function, and the peak flow rate regression loss and the breath-holding duration regression loss both adopt the mean squared error loss function.
7. The method for monitoring inhaled medication adherence based on deep neural networks according to claim 1, characterized in that, in step S4, the standardization score is calculated based on the following evaluation dimensions:
Action integrity score: assesses whether all five action units have been executed.
Action sequence correctness score: assesses whether the execution order of the five action units conforms to the standard operating procedure.
Peak inspiratory flow rate achievement score: when an inhalation action is detected, assesses whether the peak inspiratory flow rate reaches the minimum effective threshold; when an inhalation action is not detected, the peak inspiratory flow rate achievement score is directly assigned the full score, and the penalty for the missing inhalation action is borne solely by the action integrity score.
Breath-holding duration achievement score: when a breath-holding action is detected, assesses whether the breath-holding duration reaches the minimum effective threshold; when a breath-holding action is not detected, the breath-holding duration achievement score is directly assigned the full score, and the penalty for the missing breath-holding action is borne solely by the action integrity score.
The overall medication quality rating is divided into four levels, A, B, C, and D, based on the standardization score.
8. The method for monitoring inhaled medication adherence based on deep neural networks according to claim 1, characterized in that, in step S5, the real-time voice correction guidance includes:
When a missing action is detected, the voice prompts the user to complete the missing action unit.
When an incorrect sequence of actions is detected, the voice prompts the user to perform the actions in the correct order.
When the peak inspiratory flow rate is detected to be below the threshold, the voice prompts the user to increase the inhalation force.
When insufficient breath-holding time is detected, the voice prompts the user to extend the breath-holding time.
9. The method for monitoring inhaled medication adherence based on deep neural networks according to claim 1, characterized in that, in step S5, the visual medication report includes:
An action sequence diagram, which shows the recognition results and time distribution of each action unit during the current medication administration.
A standardization radar chart, which displays scores for four dimensions: action integrity, sequence correctness, peak flow rate achievement, and breath-holding duration achievement.
A historical trend chart, which shows the recent trend of the user's medication quality ratings.
An improvement suggestion list, which provides targeted recommendations based on the shortcomings of the current medication administration.
10. A deep neural network-based inhaled medication adherence monitoring system, characterized in that it is used to implement the method according to any one of claims 1 to 9 and comprises:
A sensor data acquisition module, integrated into the inhaler and including a triaxial accelerometer and a MEMS airflow sensor, which is used to acquire motion posture data and the inspiratory flow curve in real time during medication administration, and to perform sliding-window framing preprocessing on the acquired sensor data to generate standardized multi-channel time-series input data.
A one-dimensional convolutional feature extraction module, which is used to receive the standardized multi-channel time-series input data output by the sensor data acquisition module, extract local features from the sensor time-series signals through multiple cascaded one-dimensional convolutional layers, identify the signal pattern features of five action units (shaking, medication application, exhalation, inhalation and breath-holding), and generate a temporal feature vector.
A bidirectional temporal modeling module, which is used to receive the temporal feature vector output by the one-dimensional convolutional feature extraction module, perform bidirectional temporal modeling on the complete medication action sequence through a bidirectional long short-term memory network, learn the temporal pattern differences between correct and incorrect operations, and generate temporal pattern features.
A multi-task prediction module, which is used to receive the temporal pattern features output by the bidirectional temporal modeling module, simultaneously predict three objectives (action type classification, inspiratory peak flow regression, and breath-hold duration estimation) by sharing the underlying feature extractor, and calculate the standardization score of each action unit and the overall medication quality rating.
A medication guidance feedback module, which is used to receive, via Bluetooth, the standardization score and overall medication quality rating output by the multi-task prediction module, generate real-time voice correction guidance and visual medication reports, and synchronize the data to the doctor-side cloud platform to realize remote and precise supervision and management of medication adherence.