A multi-modal physiological signal peak detection method based on U-Net
By combining feature extraction and adaptive thresholding with a U-Net-based method for peak detection of multimodal physiological signals, this method solves the problem that existing algorithms cannot be universally applied to multimodal physiological signals, and achieves efficient and accurate peak detection.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- SHANGHAI JIAOTONG UNIV
- Filing Date
- 2024-06-06
- Publication Date
- 2026-06-23
AI Technical Summary
Existing physiological signal detection algorithms are usually designed for specific modes and cannot be universally applied to multimodal physiological signals, resulting in wasted hardware resources and increased power consumption.
A U-Net-based multimodal physiological signal peak detection method is adopted, which combines feature extraction, signal reconstruction and adaptive thresholding. Peak detection of ECG, neural electrical signals and PPG signals is achieved through a one-dimensional U-Net network model and wavelet decomposition reconstruction.
It achieves efficient and accurate peak detection of multimodal physiological signals, reduces the likelihood of mislocated peaks in the reconstructed signal as well as false positives and false negatives, and improves the efficiency and sensitivity of the algorithm.
Figure CN118747291B_ABST (abstract drawing)
Description
[Technical Field]
[0001] This invention relates to the field of physiological signal detection technology, and more specifically to a method for detecting peak values of multimodal physiological signals based on U-Net.
[Background Technology]
[0002] In pathological studies, drug development, and toxicology research, researchers need to assess the efficacy and toxicity of drugs. To more accurately evaluate the effects of drugs on key tissues such as the cardiovascular and nervous systems, researchers need to closely monitor a variety of physiological signals. Among these, electrocardiograms (ECGs) provide information about cardiac function, neural electrical signals provide information about the nervous system, and photoplethysmography (PPG) signals provide information about blood flow and oxygen saturation. These signals and their applications are crucial in medical research.
[0003] Processing these multimodal physiological signals requires an intelligent algorithm that can analyze them simultaneously. Physiological signal segmentation is crucial and is often the first step in subsequent processing. By segmenting the signal into different time segments, physiological indicators within each segment can be analyzed more accurately, facilitating further assessment of the effects of drugs on key tissues. Peaks, as one of the most significant features of the signal, are typically used as the segmentation criterion. Therefore, developing an intelligent algorithm that can perform peak detection on multimodal physiological signals simultaneously and in real time is of great importance.
[0004] Considerable prior work addresses the needs of physiological signal processing. For example, Chinese patent document CN112635047A proposes a probability estimation module with an attention mechanism, which is trained using an ECG signal dataset with manually labeled R-peak positions to detect R-peaks in ECG signals. Another example is Chinese patent document CN115040089A, which uses a pulse oximeter to collect pulse wave signals from the fingertips and proposes a deep learning-based method and device for peak detection and classification of pulse waves.
[0005] However, these algorithms are typically designed for specific signals, with their structure and parameters tailored to particular applications, making it impossible to apply the developed algorithms to other signals. For multi-mode monitoring systems, using different types of algorithms to process different signals can significantly increase power consumption and waste hardware resources.
[0006] U-Net is a popular image segmentation network, originally designed for medical image segmentation. Its main characteristic is its U-shaped structure, which features a contracting path (encoder) and a symmetrical expanding path (decoder). The Encoder module, also known as the contracting path, can be understood as a feature extraction network. The Decoder, also known as the expanding module, can be understood as a feature fusion network, used to restore high-level abstract features to high resolution; simply put, it represents pixel-level semantic information in the original image.
[0007] This invention addresses the technical problem that existing algorithms are tailored to specific modal physiological signals and cannot be applied to other modal physiological signals, and makes technical improvements to the peak detection method for multimodal physiological signals.
[Summary of the Invention]
[0008] The purpose of this invention is to provide a method that accurately locates the R peak of ECG, the peak of neural spikes, and the systolic peak of PPG using only one set of parameters, by combining feature extraction, signal reconstruction, and adaptive thresholding.
[0009] To achieve the above objectives, the technical solution adopted by this invention is a multimodal physiological signal peak detection method based on U-Net, comprising the following steps:
[0010] S1. A dataset is constructed using multimodal physiological signals with manually labeled peaks, and divided into training and testing sets. A one-dimensional U-Net network model is then built and trained.
[0011] S2. Input the multimodal physiological signal data segment to be detected into the trained one-dimensional U-Net network model to obtain the multimodal physiological signal with enhanced detailed information;
[0012] S3. Perform wavelet decomposition and reconstruction on the multimodal physiological signal data segments to be detected, extract spectral domain features, and further enhance the multimodal physiological signals;
[0013] S4. Add the multimodal physiological signal enhanced by the one-dimensional U-Net network model in step S2 and the multimodal physiological signal enhanced by wavelet decomposition and reconstruction in step S3 to obtain signal X. Perform adaptive threshold detection on signal X to determine the peaks and obtain the final multimodal physiological signal peak detection result.
[0014] Preferably, the multimodal physiological signal is an ECG signal, a neural electrical signal, or a PPG signal.
[0015] Preferably, the one-dimensional U-Net network model takes time series data as input and uses one-dimensional convolution. It comprises three parts: an encoder, a bottleneck, and a decoder. The encoder and decoder each have two layers, and the bottleneck has one layer. Each layer consists of several convolutional layers and pooling layers or upsampling layers. Each convolution operation is followed by a layer normalization operation, and the ReLU function is used as the activation function.
[0016] Preferably, the one-dimensional U-Net network model is:
[0017] In the encoder part, the length of the input data is L. The first layer consists of three 1×3 convolution operations, and the second layer includes one 1×2 max pooling operation and two 1×3 convolution operations. After the pooling operation, the data length changes from L to L / 2.
[0018] The bottleneck part is the third layer, which connects the encoder and decoder. It includes one 1×2 max pooling operation and two 1×3 convolution operations, and the data length changes from L / 2 to L / 4.
[0019] The decoder restores the data length to its original length L by upsampling. The second layer includes one 1×2 upsampling operation and two 1×3 convolution operations. The first layer includes one 1×2 upsampling operation, two 1×3 convolution operations and one 1×1 convolution operation. After the upsampling operation, the data length is doubled, and finally a data of length L is output.
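The length bookkeeping in [0017] to [0019] can be traced with a short sketch. It assumes that every convolution preserves the data length (for example through padding, which the text does not state) and that L is divisible by 4. The function name `trace_lengths` and the stage labels are illustrative only.

```python
def trace_lengths(L):
    """Return (stage, output length) pairs for the five stages described above."""
    if L % 4 != 0:
        raise ValueError("this sketch assumes L is divisible by 4")
    # Assumption: convolutions keep the length; only pooling/upsampling change it.
    stages = [
        ("encoder layer 1: 3 x conv 1x3", "keep"),
        ("encoder layer 2: max pool 1x2, 2 x conv 1x3", "halve"),
        ("bottleneck (layer 3): max pool 1x2, 2 x conv 1x3", "halve"),
        ("decoder layer 2: upsample 1x2, 2 x conv 1x3", "double"),
        ("decoder layer 1: upsample 1x2, 2 x conv 1x3, 1 x conv 1x1", "double"),
    ]
    length, trace = L, []
    for name, effect in stages:
        if effect == "halve":
            length //= 2
        elif effect == "double":
            length *= 2
        trace.append((name, length))
    return trace


for name, length in trace_lengths(1024):
    print(f"{length:5d}  {name}")
```

For an input of length 1024 the trace runs L, L / 2, L / 4, L / 2, L, matching the lengths given in the text.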
[0020] Preferably, in step S3, the discrete wavelet transform is used to decompose the multimodal physiological signal into several scales, yielding low-frequency coefficients cAi and high-frequency coefficients cDi. The low-frequency coefficients cAi reflect the slowly varying component caused by low-frequency interference, and the high-frequency coefficients cDi reflect the detailed information of the multimodal physiological signal. During reconstruction, only two high-frequency coefficients are retained and the low-frequency coefficients are set to 0, which eliminates the influence of baseline drift and high-frequency noise. Each modality of physiological signal is thus reconstructed from three scales: two high-frequency coefficients and one low-frequency coefficient, the latter set to 0.
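The selection rule in [0020] (keep two detail scales, zero everything else, reconstruct) can be illustrated with a minimal pure-Python sketch. It uses the Haar wavelet as a stand-in because it is easy to write by hand; this is an assumption for illustration and not the wavelet basis of the method. The helper names are hypothetical, and the input length must be divisible by 2 raised to the number of levels.

```python
import math

SQRT2 = math.sqrt(2.0)


def haar_dwt(x):
    """One level of orthonormal Haar analysis: returns (cA, cD)."""
    pairs = [(x[i], x[i + 1]) for i in range(0, len(x), 2)]
    cA = [(a + b) / SQRT2 for a, b in pairs]
    cD = [(a - b) / SQRT2 for a, b in pairs]
    return cA, cD


def haar_idwt(cA, cD):
    """Inverse of haar_dwt."""
    out = []
    for a, d in zip(cA, cD):
        out += [(a + d) / SQRT2, (a - d) / SQRT2]
    return out


def reconstruct_from_details(x, levels, keep):
    """Decompose x into `levels` scales, zero cA and every cD not in `keep`, rebuild."""
    if len(x) % (2 ** levels) != 0:
        raise ValueError("length must be divisible by 2**levels in this sketch")
    approx, details = list(x), {}
    for lvl in range(1, levels + 1):
        approx, details[lvl] = haar_dwt(approx)
    approx = [0.0] * len(approx)  # cA set to 0: removes the slowly varying baseline
    for lvl in range(levels, 0, -1):
        d = details[lvl] if lvl in keep else [0.0] * len(details[lvl])
        approx = haar_idwt(approx, d)
    return approx


# A flat baseline of 5.0 carries no detail, so it is removed entirely.
print(reconstruct_from_details([5.0] * 8, levels=3, keep={2, 3}))
```

Keeping only some detail scales acts like a band-pass filter: scales finer than the kept ones (high-frequency noise) and the approximation (baseline drift) both drop out of the reconstruction.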
[0021] Preferably, step S4 includes the following sub-steps:
[0022] S41. Square X to obtain X^2, which eliminates the possibility of missed detections caused by opposite polarities;
[0023] S42. Divide X^2 into N segments, namely H_1, H_2, ..., H_N. For the i-th segment, find all peaks greater than the threshold th1, where th1 is defined as 1.2 times the average value of the i-th segment, i.e., th1 = 1.2 * mean(H_i); traverse all N data segments;
[0024] S43. Compare all the peaks found in step S42 in pairs. If the interval is less than 40 / f, where f is the sampling frequency of signal X, eliminate the peak with the smaller amplitude to obtain the candidate peaks;
[0025] S44. Using the candidate peaks obtained in step S43 as dividing points, divide X^2 into M segments, namely L_1, L_2, ..., L_M. For the j-th segment, find all peaks greater than the threshold th2, where th2 is defined as 0.6 times the average value of the j-th segment, i.e., th2 = 0.6 * mean(L_j); traverse all M data segments;
[0026] S45. Compare all the peaks found in steps S42 and S44 in pairs. If the interval is less than 40 / f, eliminate the peaks with smaller amplitudes and finally obtain the predicted peaks.
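To make the spacing rule concrete, the following sketch converts the 40 / f interval into seconds and into the highest peak rate the rule still lets through. It reads 40 / f as "40 samples at sampling frequency f", which is an interpretation of the text. The sampling rates in the loop are hypothetical example values that the text does not give.

```python
def min_peak_interval_s(fs_hz, min_gap_samples=40):
    """Minimum spacing between two retained peaks, in seconds."""
    return min_gap_samples / fs_hz


def max_peak_rate_per_min(fs_hz, min_gap_samples=40):
    """Approximate highest peak rate that survives the spacing rule."""
    return 60.0 * fs_hz / min_gap_samples


for fs in (125.0, 360.0, 24000.0):  # hypothetical sampling rates, not from the text
    print(fs, round(min_peak_interval_s(fs), 5), round(max_peak_rate_per_min(fs)))
```

Because the interval scales with 1 / f, the same constant 40 yields a long spacing for slowly sampled signals and a very short one for fast-sampled signals.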
[0027] The beneficial effects of the U-Net-based multimodal physiological signal peak detection method of this invention are as follows: 1. Peak detection is performed on ECG signals, neural electrical signals, and PPG signals, accommodating the diversity of multimodal physiological signals. Considering both the temporal and morphological characteristics of the signals, wavelet filters are used to extract the temporal (that is, spectrum-related) characteristics of the three types of signals, so that spectral domain features are captured during signal reconstruction; 2. Morphological features are learned by a one-dimensional U-Net, which enhances the detailed information in the signal and minimizes the possibility of inaccurate peak positions in the reconstruction; 3. The efficiency of rule-based algorithms is combined with the sensitivity of artificial intelligence methods, keeping the algorithm efficient while minimizing false and missed peak detections.
[Attached Image Description]
[0028] Figure 1 is a schematic diagram of the U-Net-based multimodal physiological signal peak detection method.
[0029] Figure 2 is a schematic diagram of the U-Net network used in this invention.
[0030] Figure 3 is a flowchart illustrating the principle of the adaptive threshold method of this invention.
[Detailed Implementation Methods]
[0031] The features and exemplary embodiments of various aspects of the present invention will now be described in detail. In the following detailed description, numerous specific details are set forth in order to provide a thorough understanding of the invention. However, it will be apparent to those skilled in the art that the invention may be practiced without requiring some of these specific details. The following description of embodiments is merely intended to provide a better understanding of the invention by illustrating examples of the invention. The invention is by no means limited to any specific configurations and algorithms presented below, but covers any modifications, substitutions, and improvements to elements, components, and algorithms without departing from the inventive concept. In the accompanying drawings and the following description, well-known structures and techniques are not shown in order to avoid unnecessarily obscuring the invention.
[0032] Example
[0033] This embodiment implements a method for detecting peak values of multimodal physiological signals based on U-Net.
[0034] As shown in Figure 1, the U-Net-based method for detecting peak values of multimodal physiological signals includes the following steps:
[0035] S1. A dataset is constructed using ECG signals, neural electrical signals, and PPG signals with manually labeled peak values, and the data is divided into a training set and a test set.
[0036] S2. Build and train a 3-layer one-dimensional U-Net network model. Input the physiological signal data segment to be detected into the trained network model to obtain the corresponding output signal, that is, the signal with enhanced detailed information.
[0037] Because the input is time series data, one-dimensional convolution is used. As shown in Figure 2, a three-layer one-dimensional U-Net network model comprising an encoder, a bottleneck, and a decoder was built and trained. The encoder and decoder each have two layers, while the bottleneck has one layer. Each layer consists of several convolutional layers and pooling or upsampling layers. After the bottleneck, the decoder concatenates the upsampled result from the previous layer with the corresponding result from the encoder to recover the features. Finally, a convolution operation with a kernel size of 1 is performed to obtain the output signal. The data length is halved after each max pooling operation and doubled after each upsampling operation. The model was trained for 600 epochs with an initial learning rate of 0.001, using AdamW as the optimizer and the cross-entropy loss function as the training criterion. Finally, the trained model's network parameters were saved.
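The concatenation step in [0037] works because pooling followed by upsampling returns a feature map to the length of its encoder counterpart. The toy sketch below shows this on plain Python lists, where each inner list is one channel. Nearest-neighbour upsampling is an assumption, since the text does not name the upsampling mode, and all function names are illustrative.

```python
def max_pool2(channels):
    """1x2 max pooling with stride 2 on every channel (halves the length)."""
    return [[max(ch[i], ch[i + 1]) for i in range(0, len(ch) - 1, 2)] for ch in channels]


def upsample2(channels):
    """1x2 nearest-neighbour upsampling (doubles the length); the mode is assumed."""
    return [[v for v in ch for _ in range(2)] for ch in channels]


def concat_channels(a, b):
    """Channel-wise concatenation; both inputs must share the same length."""
    assert len(a[0]) == len(b[0]), "skip connection needs matching lengths"
    return a + b


encoder_out = [[1, 3, 2, 5, 4, 0, 7, 6]]         # 1 channel, length 8
pooled = max_pool2(encoder_out)                  # 1 channel, length 4
upsampled = upsample2(pooled)                    # 1 channel, length 8 again
merged = concat_channels(upsampled, encoder_out)
print(len(merged), len(merged[0]))
```

After concatenation the decoder sees both the coarse, upsampled features and the full-resolution encoder features, which is how fine positional detail is recovered.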
[0038] In the encoder section, the input data length is L. The first layer consists of three 1×3 convolution operations, and the second layer includes one 1×2 max pooling operation and two 1×3 convolution operations. After the pooling operations, the data length changes from L to L / 2.
[0039] The bottleneck is the third layer, which connects the encoder and decoder. It includes one 1×2 max pooling operation and two 1×3 convolution operations, at which point the data length becomes L / 4.
[0040] The decoder restores the data to its original length through upsampling. The second layer consists of one 1×2 upsampling operation and two 1×3 convolution operations, while the first layer consists of one 1×2 upsampling operation, two 1×3 convolution operations, and one 1×1 convolution operation. After the upsampling operation, the data length is doubled, resulting in a final output of length L.
[0041] In this network, each 1×3 convolution operation is followed by a layer normalization operation, and the ReLU function is used as the activation function.
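As a minimal single-channel illustration of the convolution, layer normalization, ReLU ordering in [0041], the sketch below uses plain Python lists. The zero padding, the normalization over the length axis without learnable scale and shift, and the example kernel are all assumptions made for illustration. In a trained network the kernel weights are learned.

```python
import math


def conv1d_same(x, kernel):
    """1x3 convolution with zero padding so the output keeps the input length."""
    padded = [0.0] + list(x) + [0.0]
    return [sum(k * padded[i + j] for j, k in enumerate(kernel)) for i in range(len(x))]


def layer_norm(x, eps=1e-5):
    """Normalize to zero mean and unit variance (no learnable parameters here)."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]


def relu(x):
    return [max(0.0, v) for v in x]


def conv_block(x, kernel):
    """Convolution, then layer normalization, then ReLU."""
    return relu(layer_norm(conv1d_same(x, kernel)))


x = [0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0]
print(conv_block(x, kernel=[0.25, 0.5, 0.25]))  # hypothetical smoothing kernel
```

With this kernel the isolated spike stays the largest output value, while the flat regions fall below the mean and are clipped to zero by the ReLU.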
[0042] Finally, the physiological signal data segment to be detected is input into the trained network model, and the corresponding output signal is obtained.
[0043] S3. Perform wavelet decomposition and reconstruction on the physiological signal data segment to be detected to further enhance the signal;
[0044] Wavelet Feature Extraction: Using the Discrete Wavelet Transform (DWT), the ECG, neural electrical, and PPG signals were decomposed into several scales, yielding low-frequency and high-frequency coefficients. The low-frequency coefficients (cAi) reflect slowly varying components caused by low-frequency interference, such as baseline drift, while the high-frequency coefficients (cDi) reflect more detailed information. During reconstruction, only two high-frequency coefficients were retained and the low-frequency coefficients were set to 0, which eliminates the effects of baseline drift and high-frequency noise. Each signal was thus reconstructed from three scales: two high-frequency coefficients and one low-frequency coefficient, the latter set to 0.
[0045] Using the Symlet wavelet as the wavelet basis, the ECG signal, neural electrical signal, and PPG signal are decomposed into 9, 9, and 6 scales, respectively. During reconstruction, the low-frequency coefficients (cAi) are set to 0 to eliminate low-frequency noise such as baseline drift. Furthermore, high-frequency coefficients of different scales are selected according to the characteristics of each signal, so that the data are ultimately reconstructed as X_ECG = (cD4, cD5, cA9), X_neural = (cD3, cD4, cA9), and X_PPG = (cD2, cD3, cA6).
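To see roughly which frequencies each selected scale covers, the sketch below computes the nominal dyadic band of a DWT detail level, fs / 2^(k+1) to fs / 2^k for cDk. Real wavelet filters overlap, so these band edges are approximate. The sampling rate of 360 Hz is a hypothetical example value that the text does not state.

```python
def detail_band_hz(fs_hz, level):
    """Nominal (low, high) band of detail coefficients cD<level>."""
    return fs_hz / 2 ** (level + 1), fs_hz / 2 ** level


def approx_band_hz(fs_hz, level):
    """Nominal (low, high) band of approximation coefficients cA<level>."""
    return 0.0, fs_hz / 2 ** (level + 1)


fs = 360.0  # hypothetical sampling rate, not taken from the text
for lvl in (4, 5):
    print(f"cD{lvl}:", detail_band_hz(fs, lvl))
print("cA9:", approx_band_hz(fs, 9))
```

Under this assumption cD4 and cD5 together span roughly 5.6 Hz to 22.5 Hz, while cA9 holds only content below about 0.35 Hz, which is why zeroing it removes baseline drift.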
[0046] S4. Add the corresponding signals obtained from S2 (the signal enhanced by the U-Net network) and S3 (the result obtained from wavelet decomposition and reconstruction), perform adaptive threshold detection on them, determine the peak value, and thus obtain the final result.
[0047] The corresponding signals obtained from S2 and S3 are added together to obtain signal X, with a sampling frequency of f Hz. Adaptive threshold detection is performed on X to determine peak values. Specifically, the reconstructed signal is first squared and divided into N equal-length segments. Then, peak values with amplitudes greater than a threshold th1 are identified within these N segments, where th1 is defined as 1.2 times the average value of the segment. All selected peak values are considered preliminarily qualified. Based on this, points that are too close together are eliminated to obtain candidate peak values. Finally, the squared reconstructed signal is further divided according to the positions of the candidate peak values, and missed peaks are identified in the newly divided segments, but with a smaller amplitude threshold th2, where th2 is defined as 0.6 times the average value of the segment.
[0048] As shown in Figure 3, which is a flowchart illustrating the principle of the adaptive threshold method of this invention, step S4 includes the following sub-steps:
[0049] S4-1: Square X to obtain X^2, which eliminates the possibility of missed detections due to opposite polarities;
[0050] S4-2: Divide X^2 into N segments, namely H_1, H_2, ..., H_N;
[0051] S4-3: For the i-th segment, find all peaks greater than the threshold th1, where th1 is defined as 1.2 times the average value of the segment, i.e., th1 = 1.2 * mean(H_i); traverse all N data segments;
[0052] S4-4: Compare all the peaks found in S4-3 pairwise. If the interval is less than 40 / f, eliminate the peaks with smaller amplitudes and then obtain the candidate peaks.
[0053] S4-5: Using the candidate peaks obtained in S4-4 as dividing points, divide X^2 into M segments, namely L_1, L_2, ..., L_M. Since this segmentation is no longer uniform, the segments are not necessarily of equal length.
[0054] S4-6: For the j-th segment, find all peaks greater than the threshold th2, where th2 is defined as 0.6 times the average value of the segment, i.e., th2 = 0.6 * mean(L_j); traverse all M data segments;
[0055] S4-7: Compare all the peaks found in S4-5 and S4-6 pairwise. If the interval is less than 40 / f, eliminate the peaks with smaller amplitudes and finally obtain the predicted peaks.
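Sub-steps S4-1 to S4-7 can be sketched in plain Python as follows. Several details are assumptions where the text is silent: a peak is taken to be a simple local maximum, close peaks are merged greedily from left to right, each re-divided segment starts at a candidate peak, and the final step merges the first-pass and second-pass peaks. The function names are illustrative, and the signal must contain at least `n_segments` samples.

```python
def local_maxima(x, lo, hi, threshold):
    """Indices i in [lo, hi) where x[i] is a local maximum above threshold."""
    found = []
    for i in range(max(lo, 1), min(hi, len(x) - 1)):
        if x[i] > threshold and x[i] >= x[i - 1] and x[i] > x[i + 1]:
            found.append(i)
    return found


def suppress_close(peaks, x, min_gap):
    """Of two neighbouring peaks closer than min_gap samples, keep the larger."""
    kept = []
    for p in sorted(set(peaks)):
        if kept and p - kept[-1] < min_gap:
            if x[p] > x[kept[-1]]:
                kept[-1] = p
        else:
            kept.append(p)
    return kept


def adaptive_threshold_peaks(signal, n_segments, min_gap=40):
    x2 = [v * v for v in signal]                         # S4-1: square
    seg_len = len(x2) // n_segments
    first = []
    for k in range(n_segments):                          # S4-2, S4-3: equal segments
        lo = k * seg_len
        hi = len(x2) if k == n_segments - 1 else lo + seg_len
        th1 = 1.2 * sum(x2[lo:hi]) / (hi - lo)
        first += local_maxima(x2, lo, hi, th1)
    candidates = suppress_close(first, x2, min_gap)      # S4-4: candidate peaks
    bounds = [0] + candidates + [len(x2)]                # S4-5: re-divide at candidates
    second = []
    for lo, hi in zip(bounds[:-1], bounds[1:]):          # S4-6: lower threshold
        if hi > lo:
            th2 = 0.6 * sum(x2[lo:hi]) / (hi - lo)
            second += local_maxima(x2, lo, hi, th2)
    return suppress_close(first + second, x2, min_gap)   # S4-7: final merge


# Synthetic test signal: isolated spikes on a flat baseline.
signal = [0.0] * 400
for index, value in {20: 1.0, 80: 0.1, 150: -1.0, 250: 0.9, 350: 1.0}.items():
    signal[index] = value
print(adaptive_threshold_peaks(signal, n_segments=4))
```

On this synthetic input the negative spike at index 150 is found because of the squaring step. The weak spike at index 80 falls below th1 in the first pass and is only recovered by the lower threshold th2 in the second pass.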
[0056] To demonstrate the effectiveness and superiority of this embodiment, ECG signal data from the MIT-BIH database, neural electrical signal data from the Quiroga simulation database, and PPG signal data from the MIMIC database are used for evaluation. The results are shown in Table 1.
[0057] Table 1 Evaluation Table of Peak Detection Data for Multimodal Physiological Signals
[0058]
| Physiological signal modality | Precision | Recall | F1 score |
| --- | --- | --- | --- |
| ECG | 0.9903 | 0.9934 | 0.9918 |
| Neural electrical signals | 0.9881 | 0.9814 | 0.9847 |
| PPG | 0.9938 | 0.9963 | 0.9950 |
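As a quick consistency check on Table 1, the F1 score can be recomputed as the harmonic mean of the first two numeric columns. This assumes the first numeric column is precision. The recomputed values agree with the reported F1 scores to four decimal places, which supports that reading.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# (first column read as precision, recall, reported F1), copied from Table 1
rows = {
    "ECG": (0.9903, 0.9934, 0.9918),
    "Neural electrical signals": (0.9881, 0.9814, 0.9847),
    "PPG": (0.9938, 0.9963, 0.9950),
}
for name, (p, r, reported) in rows.items():
    print(f"{name}: recomputed {f1_score(p, r):.4f}, reported {reported:.4f}")
```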
[0059] Those skilled in the art will understand that all or part of the steps of the above embodiments can be implemented by hardware or by a program instructing related hardware. The program can be stored in a computer-readable storage medium, such as a magnetic disk, optical disk, read-only memory (ROM), or random access memory (RAM).
[0060] The above description is only a preferred embodiment of the present invention. It should be noted that those skilled in the art can make several improvements and additions without departing from the principle of the present invention, and these improvements and additions should also be considered within the scope of protection of the present invention.
Claims
1. A method for detecting peak values of multimodal physiological signals based on U-Net, characterized in that it includes the following steps:
S1. A dataset is constructed using multimodal physiological signals with manually labeled peaks, and divided into training and testing sets. A one-dimensional U-Net network model is then built and trained;
S2. Input the multimodal physiological signal data segment to be detected into the trained one-dimensional U-Net network model to obtain the multimodal physiological signal with enhanced detailed information;
S3. Perform wavelet decomposition and reconstruction on the multimodal physiological signal data segments to be detected, extract spectral domain features, and further enhance the multimodal physiological signals;
S4. Add the multimodal physiological signal enhanced by the one-dimensional U-Net network model in step S2 and the multimodal physiological signal enhanced by wavelet decomposition and reconstruction in step S3 to obtain signal X. Perform adaptive threshold detection on signal X to determine the peaks and obtain the final multimodal physiological signal peak detection result.
2. The method for detecting peak values of multimodal physiological signals based on U-Net according to claim 1, characterized in that: The multimodal physiological signals are ECG signals, neural electrical signals, or PPG signals.
3. The method for detecting peak values of multimodal physiological signals based on U-Net according to claim 2, characterized in that: The one-dimensional U-Net network model takes time series data as input and uses one-dimensional convolution. It includes three parts: encoder, bottleneck, and decoder. The encoder and decoder parts each have two layers, and the bottleneck part has one layer. Each layer consists of several convolutional layers and pooling or upsampling layers. After the convolution operation of each convolutional layer, there is a layer normalization operation, and the ReLU function is used as the activation function.
4. The method for detecting peak values of multimodal physiological signals based on U-Net according to claim 3, characterized in that, in the one-dimensional U-Net network model: In the encoder part, the length of the input data is L. The first layer consists of three 1×3 convolution operations, and the second layer includes one 1×2 max pooling operation and two 1×3 convolution operations. After the pooling operation, the data length changes from L to L / 2. The bottleneck part is the third layer, which connects the encoder and decoder. It includes one 1×2 max pooling operation and two 1×3 convolution operations, and the data length changes from L / 2 to L / 4. The decoder restores the data length to its original length L by upsampling. The second layer includes one 1×2 upsampling operation and two 1×3 convolution operations. The first layer includes one 1×2 upsampling operation, two 1×3 convolution operations and one 1×1 convolution operation. After each upsampling operation, the data length is doubled, and finally data of length L is output.
5. The method for detecting peak values of multimodal physiological signals based on U-Net according to claim 2, characterized in that, in step S3: The multimodal physiological signal is decomposed into several scales using discrete wavelet transform, including low-frequency coefficients cAi and high-frequency coefficients cDi. The low-frequency coefficients cAi are used to reflect the slowly changing signal caused by low-frequency interference, and the high-frequency coefficients cDi are used to reflect the detailed information of the multimodal physiological signal. During the reconstruction of the multimodal physiological signal, two high-frequency coefficients are selected and the low-frequency coefficients are set to 0 to eliminate the influence of baseline drift and high-frequency noise. Finally, two high-frequency coefficients and one low-frequency coefficient are selected as the scales used to reconstruct each modality of physiological signal.
6. The method for detecting peak values of multimodal physiological signals based on U-Net according to claim 5, characterized in that step S4 includes the following sub-steps:
S41. Square X to obtain X^2, which eliminates the possibility of missed detections caused by opposite polarities;
S42. Divide X^2 into N segments, namely H_1, H_2, ..., H_N. For the i-th segment, find all peaks greater than the threshold th1, where th1 is defined as 1.2 times the average value of the i-th segment, i.e., th1 = 1.2 * mean(H_i); traverse all N data segments;
S43. Compare all the peaks found in step S42 in pairs. If the interval is less than 40 / f, where f is the sampling frequency of signal X, eliminate the peak with the smaller amplitude to obtain the candidate peaks;
S44. Using the candidate peaks obtained in step S43 as dividing points, divide X^2 into M segments, namely L_1, L_2, ..., L_M. For the j-th segment, find all peaks greater than the threshold th2, where th2 is defined as 0.6 times the average value of the j-th segment, i.e., th2 = 0.6 * mean(L_j); traverse all M data segments;
S45. Compare all the peaks found in steps S42 and S44 in pairs. If the interval is less than 40 / f, eliminate the peaks with smaller amplitudes and finally obtain the predicted peaks.