A Deep Learning-Based Non-Contact Physiological Indicator Detection Method and System
By extracting human pulse wave signals from video images using deep learning methods and deep convolutional neural networks, the problem of discomfort associated with traditional detection methods is solved. This enables non-contact comprehensive detection of heart rate, blood oxygen saturation, and blood pressure, providing convenient and accurate physiological indicator detection.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- YIBIN MICRO INTELLIGENT TECH CO LTD
- Filing Date
- 2023-07-28
- Publication Date
- 2026-06-30
AI Technical Summary
Traditional methods for detecting physiological indicators require skin contact, which can be uncomfortable. Furthermore, current technologies cannot achieve non-contact comprehensive detection of multiple physiological indicators such as heart rate, blood oxygen saturation, and blood pressure.
A deep learning-based approach is adopted to extract human pulse wave signals from video images using deep convolutional neural networks. A camera acquisition system and physiological indicator detection platform are constructed, including visible light and infrared cameras. Heart rate, blood oxygen saturation, and blood pressure feature vectors are segmented and extracted by deep neural networks through feature box detection. Pulse wave signal analysis is then performed in conjunction with spatiotemporal deep convolutional neural networks.
It enables non-contact detection under various lighting conditions, accurately identifies heart rate, blood oxygen saturation, and blood pressure, provides a convenient and comfortable testing experience, has high robustness, and can comprehensively detect multiple physiological indicators.
Figure CN117204828B_ABST
Abstract
Description
Technical Field
[0001] This invention relates to the field of physiological indicator feature extraction and processing based on video images, and in particular to a non-contact physiological indicator detection method and system based on deep learning.
Background Technology
[0002] People are increasingly focused on their personal health, especially changes in physiological indicators such as heart rate, blood oxygen saturation, and blood pressure. However, traditional detection methods require electrodes to be attached to the skin or sensors to be worn, causing significant discomfort to the test subject. This limits sustained monitoring, even though long-term monitoring and accurate assessment of these physiological indicators are crucial for health management. Non-contact physiological indicator detection based on machine vision has become a research focus in the field of computer vision. This method is characterized by high efficiency, non-invasiveness, comfort, convenience, and remote detection, and is widely used in many scenarios such as medical monitoring, emotion recognition, and fatigue detection. In particular, non-contact physiological indicator detection methods show significant advantages for people with sensitive skin, infants, and burn patients. Currently, heart rate monitoring bracelets and similar devices provide only a single function (such as heart rate detection) and cannot achieve comprehensive detection of physiological indicators such as heart rate, blood oxygen saturation, and blood pressure. There is therefore an urgent need for non-contact technology that detects multiple physiological indicators comprehensively.
Summary of the Invention
[0003] The purpose of this invention is to solve the technical problems pointed out in the background art and to provide a non-contact physiological indicator detection method and system based on deep learning. It uses a deep convolutional neural network to extract human pulse wave signals, adapts to video image input under various lighting conditions, and can accurately identify the feature vectors of heart rate, blood oxygen saturation, and blood pressure, and has strong robustness.
[0004] The objective of this invention is achieved through the following technical solution:
[0005] A non-contact physiological indicator detection method based on deep learning, the method comprising:
[0006] S1. Construct a camera acquisition system, including a visible light camera and an infrared camera component, to acquire image data including video images and infrared video images. Construct a physiological indicator detection platform system that communicates with the camera acquisition system. This platform system includes a feature box detection deep neural network, a heart rate detection neural network processing unit, a blood oxygen saturation detection neural network processing unit, and a blood pressure detection neural network processing unit. The feature box detection deep neural network is trained on sample data for feature box segmentation and feature vector extraction. It performs heart rate feature box segmentation and heart rate feature vector extraction on the image data and inputs the result to the heart rate detection neural network processing unit; performs blood oxygen saturation feature box segmentation and blood oxygen saturation feature vector extraction on the image data and inputs the result to the blood oxygen saturation detection neural network processing unit; and performs blood pressure feature box segmentation and blood pressure feature vector extraction on the image data and inputs the result to the blood pressure detection neural network processing unit.
[0007] S2. The heart rate detection neural network processing unit extracts and identifies the heart rate feature vector to simulate the pulse wave signal and obtains the time series spectrum of the human pulse wave. It performs spectrum analysis on the time series spectrum and selects the frequency corresponding to the peak of the spectrum as the heart rate detection value A. It calculates the average value of the heart rate detection value A in the T1 time period as the heart rate detection output value.
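The heart rate step above (pick the spectral peak as detection value A, then average A over the period T1) can be illustrated with a minimal sketch. The function names, the 8-second analysis window, the 30 Hz frame rate, and the synthetic sine input are all assumptions made for illustration; the 0.6 Hz to 3.0 Hz search band is borrowed from the filter range given later in the text.

```python
import numpy as np

def spectral_peak_bpm(pulse_wave, fs, band=(0.6, 3.0)):
    """Heart rate in beats per minute at the strongest in-band spectral line."""
    x = np.asarray(pulse_wave, dtype=float)
    x = x - x.mean()                              # remove the constant offset
    magnitude = np.abs(np.fft.rfft(x))
    freqs = np.fft.rfftfreq(x.size, d=1.0 / fs)
    in_band = (freqs >= band[0]) & (freqs <= band[1])
    peak_hz = freqs[in_band][np.argmax(magnitude[in_band])]
    return 60.0 * peak_hz

def heart_rate_output(pulse_wave, fs, window_s, t1_s):
    """Average the per-window detection values A over the last T1 seconds."""
    x = np.asarray(pulse_wave, dtype=float)
    window = int(window_s * fs)
    recent = x[-int(t1_s * fs):]
    values = [spectral_peak_bpm(recent[i:i + window], fs)
              for i in range(0, recent.size - window + 1, window)]
    return float(np.mean(values))

# Synthetic 1.25 Hz pulse wave (75 beats per minute), 32 s at an assumed 30 fps.
fs = 30.0
t = np.arange(960) / fs
pulse = np.sin(2 * np.pi * 1.25 * t)
hr = heart_rate_output(pulse, fs, window_s=8, t1_s=32)
```

With non-overlapping 8-second windows the frequency resolution is 0.125 Hz (7.5 beats per minute); a longer window or overlapping windows would trade latency for finer resolution.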
[0008] The blood oxygen saturation detection neural network processing unit extracts and identifies the blood oxygen saturation feature vector and simulates the pulse wave signal. The pulse wave signal is then processed by separating the DC and AC components as follows:
[0009] Where I represents the amplitude at the peak of the pulse wave signal, and ΔI represents the difference between the amplitude at the peak and the amplitude at the trough of the pulse wave.
[0010] Blood oxygen saturation SpO2 was obtained using the following method:
[0011] Where A and B are calibration parameters, $AC_R$ represents the AC component of the visible light pulse wave signal, $DC_R$ represents the DC component of the visible light pulse wave signal, $AC_{IR}$ represents the AC component of the pulse wave signal under infrared light, and $DC_{IR}$ represents the DC component of the pulse wave signal under infrared light.
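The two equations referred to above are not reproduced in this text, so the following is only a sketch under stated assumptions. It assumes the AC component is the peak-to-trough swing ΔI, the DC component is the midline level I − ΔI/2, and the saturation follows the commonly used ratio-of-ratios form A − B·R with R = (AC_R/DC_R)/(AC_IR/DC_IR). The default values of `a` and `b` are placeholder calibration numbers, not values from the source.

```python
def ac_dc(peak, trough):
    """Split one pulse cycle into AC and DC parts.

    Assumption: AC is the peak-to-trough swing (delta I) and DC is the
    midline level I - delta I / 2; the source equation is not shown.
    """
    delta = peak - trough
    return delta, peak - delta / 2.0

def spo2(ac_r, dc_r, ac_ir, dc_ir, a=110.0, b=25.0):
    """Ratio-of-ratios estimate; a and b are placeholder calibration values."""
    ratio = (ac_r / dc_r) / (ac_ir / dc_ir)
    return a - b * ratio

# Illustrative amplitudes only (arbitrary units), not measured data.
ac_r, dc_r = ac_dc(peak=1.00, trough=0.98)      # visible light channel
ac_ir, dc_ir = ac_dc(peak=1.00, trough=0.96)    # infrared channel
estimate = spo2(ac_r, dc_r, ac_ir, dc_ir)
```

In practice A and B would be fitted against a reference oximeter for the specific camera and lighting setup.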
[0012] The blood pressure detection neural network processing unit is trained using sample data that pair four pulse wave features (the average amplitude and average frequency of the pulse wave under visible light, and the average amplitude and average frequency of the pulse wave under infrared light) with the corresponding systolic and diastolic blood pressures. It extracts and identifies blood pressure feature vectors to simulate pulse wave signals and obtains the time series spectrum of human pulse waves. It performs spectral analysis on the time series spectrum to obtain the four feature parameters (average amplitude of the pulse wave under visible light, average frequency of the pulse wave under visible light, average amplitude of the pulse wave under infrared light, and average frequency of the pulse wave under infrared light) and outputs blood pressure data including systolic and diastolic blood pressure.
[0013] The physiological indicator detection platform system outputs heart rate, blood oxygen saturation, and blood pressure data.
[0014] To better implement this invention, in step S1, the deep neural network for feature box detection acquires image data to form an image pyramid, and the model for feature box segmentation and feature vector extraction is trained according to the following method:
[0015] S11. Construct a heart rate feature sample dataset. The heart rate feature sample data in the dataset includes sample image data, heart rate feature boxes, and several heart rate feature vectors. The heart rate feature box is the region that expands outward from the center of the heart rate feature vector set and covers more than 90% of the heart rate feature vectors. The heart rate feature vectors are extracted using the following method:
[0016] S111. Extract a 12×12×3 image from the top layer of the image pyramid as input. The feature box detection deep neural network uses the ReLU function as the activation function of each convolutional layer. Feature extraction is performed through a 3×3 convolutional layer and a 2×2 max pooling layer to generate a 5×5×10 feature map. After passing through two 3×3 convolutional layers, a 1×1×32 feature output is obtained, realizing the initial heart rate feature box and heart rate feature vector.
[0017] S112. Extract a 24×24×3 image from the second layer of the image pyramid as input. The feature box detection deep neural network uses the ReLU function as the activation function of each convolutional layer. Feature extraction is performed through a 3×3 convolutional layer and a 3×3 max pooling layer to generate an 11×11×28 feature map. Then, feature extraction is performed through a 3×3 convolutional layer and a 3×3 max pooling layer to generate a 4×4×48 feature map. Then, feature extraction is performed through a 2×2 convolutional layer to generate a 3×3×64 feature map. Finally, the feature map is flattened through a fully connected layer to obtain a heart rate feature vector of length 128, and the corrected heart rate feature box and heart rate feature vector are obtained.
[0018] S113. Extract a 48×48×3 image from the third layer of the image pyramid as input. The deep neural network for feature box detection uses the ReLU function as the activation function for each convolutional layer. Feature extraction is performed through a 3×3 convolutional layer and a 3×3 max pooling layer to generate a 23×23×32 feature map. Feature extraction is performed through a 3×3 convolutional layer and a 3×3 max pooling layer to generate a 10×10×64 feature map. Feature extraction is performed through a 3×3 convolutional layer and a 2×2 max pooling layer to generate a 4×4×64 feature map. A 2×2 convolutional layer generates a 3×3×128 feature map. The feature map is then flattened through a fully connected layer to obtain a feature vector of length 128, resulting in the final heart rate feature box and heart rate feature vector.
[0019] S12. Construct a blood oxygen saturation feature sample dataset. The blood oxygen saturation feature sample data in the blood oxygen saturation feature sample dataset includes sample image data, blood oxygen saturation feature boxes, and several blood oxygen saturation feature vectors. The blood oxygen saturation feature box is the area that expands outward from the center of the concentrated area of blood oxygen saturation feature vectors and covers more than 90% of the blood oxygen saturation feature vectors. The blood oxygen saturation feature boxes and blood oxygen saturation feature vectors are obtained according to the methods in steps S111 to S113.
[0020] S13. Construct a blood pressure feature sample dataset. The blood pressure feature sample data in the blood pressure feature sample dataset includes sample image data, blood pressure feature boxes, and several blood pressure feature vectors. The blood pressure feature box is the area that expands outward from the center of the concentrated area of blood pressure feature vectors and covers more than 90% of the blood pressure feature vectors. The blood pressure feature boxes and blood pressure feature vectors are obtained according to the methods in steps S111 to S113.
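The feature map sizes quoted in steps S111 to S113 can be checked with simple arithmetic. The sketch below assumes stride-1 convolutions without padding and max pooling with stride 2 and ceiling rounding; neither the strides nor the rounding mode is stated in the text, but these choices reproduce every quoted width. The helper names are hypothetical.

```python
import math

def conv_out(size, kernel):
    """Output width of a stride-1 convolution without padding (assumed)."""
    return size - kernel + 1

def pool_out(size, kernel, stride=2):
    """Output width of max pooling with stride 2 and ceil rounding (assumed)."""
    return math.ceil((size - kernel) / stride) + 1

def trace(size, layers):
    """Apply ('conv' | 'pool', kernel) layers in order and collect each width."""
    widths = [size]
    for kind, kernel in layers:
        size = conv_out(size, kernel) if kind == "conv" else pool_out(size, kernel)
        widths.append(size)
    return widths

# S111: 12x12 input, top pyramid layer
top = trace(12, [("conv", 3), ("pool", 2), ("conv", 3), ("conv", 3)])
# S112: 24x24 input, second pyramid layer
second = trace(24, [("conv", 3), ("pool", 3), ("conv", 3), ("pool", 3), ("conv", 2)])
# S113: 48x48 input, third pyramid layer
third = trace(48, [("conv", 3), ("pool", 3), ("conv", 3), ("pool", 3),
                   ("conv", 3), ("pool", 2), ("conv", 2)])
```

The traces pass through 5 and 1 for the top layer, 11, 4, and 3 for the second layer, and 23, 10, 4, and 3 for the third layer, matching the widths given in the steps.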
[0021] A further preferred technical solution is as follows: In step S2, the physiological indicator detection platform system internally constructs a spatiotemporal deep convolutional neural network for pulse wave signal extraction. The spatiotemporal deep convolutional neural network is trained on the heart rate feature vector, blood oxygen saturation feature vector, or blood pressure feature vector as follows:
[0022] S21. Construct a sample feature dataset containing consecutive frame images. The sample feature data in the dataset includes sample feature images and pulse wave signals. Train the following correlation feature model on the sample feature data:
[0023] The spatiotemporal deep convolutional neural network uses the ReLU function as the activation function for each convolutional layer. A spatiotemporal modeling module and a 3×3 convolutional layer generate an N×72×72×32 feature map, and another spatiotemporal modeling module and a 3×3 convolutional layer generate an N×70×70×32 feature map. A 2×2 average pooling layer and a Dropout layer with a dropout rate of 0.25 then generate an N×35×35×32 feature map. Next, a spatiotemporal modeling module and a 3×3 convolutional layer generate an N×35×35×64 feature map, and another spatiotemporal modeling module and a 3×3 convolutional layer generate an N×33×33×64 feature map. A 2×2 average pooling layer and a Dropout layer with a dropout rate of 0.5 then generate an N×16×16×64 feature map. Finally, a fully connected layer flattens the feature map to obtain a feature vector of length 128, and the pulse wave signal is output.
[0024] The spatiotemporal modeling module of the spatiotemporal deep convolutional neural network constructs a spatiotemporal sequence of consecutive frame images and finally outputs the pulse wave signal in the time series.
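The per-frame feature map sizes listed for the spatiotemporal network can be traced in the same arithmetic way. This sketch assumes a 72×72 input frame and an alternating "same"/"valid" padding pattern for the 3×3 convolutions; both are inferred from the quoted sizes rather than stated in the text, and the spatiotemporal modeling module is assumed to leave the spatial size unchanged.

```python
def conv_width(size, kernel, padding):
    """Width after a stride-1 convolution; 'same' padding keeps the width."""
    return size if padding == "same" else size - kernel + 1

def avg_pool_width(size, kernel=2):
    """Width after non-overlapping average pooling (floor rounding)."""
    return size // kernel

# (layer kind, padding or None, output channels); padding pattern is assumed.
LAYERS = [
    ("conv", "same", 32), ("conv", "valid", 32), ("pool", None, 32),
    ("conv", "same", 64), ("conv", "valid", 64), ("pool", None, 64),
]

def trace_shapes(width=72):
    """Return the (height, width, channels) of each stage for one frame."""
    shapes = []
    for kind, padding, channels in LAYERS:
        if kind == "conv":
            width = conv_width(width, 3, padding)
        else:
            width = avg_pool_width(width)
        shapes.append((width, width, channels))
    return shapes

shapes = trace_shapes()
# Number of values per frame entering the fully connected layer.
flattened = shapes[-1][0] * shapes[-1][1] * shapes[-1][2]
```

Under these assumptions the fully connected layer maps 16×16×64 = 16384 values per frame to the length-128 feature vector.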
[0025] A more preferred technical solution is that the pulse wave signal in the sample feature data of the sample feature dataset is the true value of the photoplethysmography pulse wave signal corresponding to the sample feature image, measured by a Contec CMS50E pulse oximeter.
[0026] Preferably, in step S2, the time series spectrum is obtained as follows:
[0027] The pulse wave signal was filtered using a Butterworth bandpass filter with cutoff frequencies of 0.6 Hz and 3.0 Hz, preserving waveforms within the frequency range of [0.6 Hz, 3.0 Hz]. The filtered pulse wave signal was then subjected to a fast Fourier transform to obtain the time-series spectrum of the human pulse wave.
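A minimal sketch of this filtering step, using SciPy's Butterworth design and NumPy's FFT, is shown below. The filter order (4), the zero-phase `sosfiltfilt` application, the 30 Hz sampling rate, and the synthetic test signal are assumptions; the text specifies only the filter type and the 0.6 Hz and 3.0 Hz cutoffs.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

def bandpass(pulse_wave, fs, low=0.6, high=3.0, order=4):
    """Zero-phase Butterworth band-pass; the order is an assumed value."""
    sos = butter(order, [low, high], btype="bandpass", fs=fs, output="sos")
    return sosfiltfilt(sos, np.asarray(pulse_wave, dtype=float))

def spectrum(signal, fs):
    """One-sided amplitude spectrum and its frequency axis in Hz."""
    x = np.asarray(signal, dtype=float)
    freqs = np.fft.rfftfreq(x.size, d=1.0 / fs)
    return freqs, 2.0 * np.abs(np.fft.rfft(x)) / x.size

# Synthetic input: a 1.2 Hz pulse plus slow drift (0.1 Hz) and 6 Hz interference.
fs = 30.0
t = np.arange(900) / fs
raw = (np.sin(2 * np.pi * 1.2 * t)
       + 0.8 * np.sin(2 * np.pi * 0.1 * t)
       + 0.5 * np.sin(2 * np.pi * 6.0 * t))
freqs, amplitude = spectrum(bandpass(raw, fs), fs)
peak_hz = freqs[np.argmax(amplitude)]
```

The pass band corresponds to 36 to 180 beats per minute, so drift and high-frequency interference outside that range are suppressed before the spectral peak is read.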
[0028] Preferably, the sample data of the blood pressure detection neural network processing unit includes sample feature data such as the average amplitude of the pulse wave under visible light, the average frequency of the pulse wave under visible light, the average amplitude of the pulse wave under infrared light, and the average frequency of the pulse wave under infrared light, as well as the corresponding systolic and diastolic blood pressures. The systolic and diastolic blood pressures are the true values of the systolic and diastolic blood pressures corresponding to the sample data, measured using an Omron U724J blood pressure monitor.
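One way to assemble the four-element input described above is sketched here. The text does not define how the "average" amplitude and frequency are computed; this sketch assumes they are the amplitude and frequency of the strongest spectral line, averaged over successive analysis windows. The function names, window length, and synthetic signals are illustrative only.

```python
import numpy as np

def window_peak(segment, fs):
    """Amplitude and frequency (Hz) of the strongest spectral line."""
    x = np.asarray(segment, dtype=float)
    x = x - x.mean()
    amplitude = 2.0 * np.abs(np.fft.rfft(x)) / x.size
    freqs = np.fft.rfftfreq(x.size, d=1.0 / fs)
    k = int(np.argmax(amplitude))
    return amplitude[k], freqs[k]

def mean_amplitude_frequency(signal, fs, window_len):
    """Average the per-window peak amplitude and frequency (assumed reading)."""
    x = np.asarray(signal, dtype=float)
    peaks = [window_peak(x[i:i + window_len], fs)
             for i in range(0, x.size - window_len + 1, window_len)]
    return np.mean(peaks, axis=0)

def blood_pressure_features(visible, infrared, fs, window_len=240):
    """Four-element vector: [amp_visible, freq_visible, amp_ir, freq_ir]."""
    return np.concatenate([mean_amplitude_frequency(visible, fs, window_len),
                           mean_amplitude_frequency(infrared, fs, window_len)])

# Synthetic pulse waves, not measured data.
fs = 30.0
t = np.arange(480) / fs
features = blood_pressure_features(0.5 * np.sin(2 * np.pi * 1.25 * t),
                                   0.8 * np.sin(2 * np.pi * 1.25 * t), fs)
```

Each such vector would be paired with a reference systolic and diastolic reading to form one training sample.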
[0029] A deep learning-based non-contact physiological indicator detection system includes a camera acquisition system and a physiological indicator detection platform system that is communicatively connected to the camera acquisition system. The camera acquisition system includes a visible light camera and an infrared camera component. The physiological indicator detection platform system includes a feature box detection deep neural network, a heart rate detection neural network processing unit, a blood oxygen saturation detection neural network processing unit, and a blood pressure detection neural network processing unit.
[0030] The feature box detection deep neural network is trained on sample data for feature box segmentation and feature vector extraction. It then performs heart rate feature box segmentation and heart rate feature vector extraction on the image data and inputs the results into the heart rate detection neural network processing unit. Similarly, it performs blood oxygen saturation feature box segmentation and blood oxygen saturation feature vector extraction on the image data and inputs the results into the blood oxygen saturation detection neural network processing unit. Finally, it performs blood pressure feature box segmentation and blood pressure feature vector extraction on the image data and inputs the results into the blood pressure detection neural network processing unit.
[0031] The heart rate detection neural network processing unit is used to extract and identify the heart rate feature vector, simulate the pulse wave signal, and obtain the time series spectrum of the human pulse wave. It performs spectral analysis on the time series spectrum and selects the frequency corresponding to the peak of the spectrum as the heart rate detection value A. It calculates the average value of the heart rate detection value A in the time period T1 as the heart rate detection output value.
[0032] The blood oxygen saturation detection neural network processing unit extracts and identifies the blood oxygen saturation feature vector and simulates the pulse wave signal. The pulse wave signal is then processed by separating the DC and AC components, and the blood oxygen saturation is calculated.
[0033] The blood pressure detection neural network processing unit is trained using sample data that pair four pulse wave features (the average amplitude and average frequency of the pulse wave under visible light, and the average amplitude and average frequency of the pulse wave under infrared light) with the corresponding systolic and diastolic blood pressures. It extracts and identifies blood pressure feature vectors to simulate pulse wave signals and obtains the time series spectrum of human pulse waves. It performs spectral analysis on the time series spectrum to obtain the four feature parameters (average amplitude of the pulse wave under visible light, average frequency of the pulse wave under visible light, average amplitude of the pulse wave under infrared light, and average frequency of the pulse wave under infrared light) and outputs blood pressure data including systolic and diastolic blood pressure.
[0034] The physiological indicator detection platform system is also used to output heart rate, blood oxygen saturation, and blood pressure data.
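The data flow of the system described above (one feature box network feeding three indicator-specific processing units) can be sketched structurally. Every class, field, and key name below is hypothetical, and the stub callables stand in for trained networks; the returned numbers are arbitrary placeholders, not results.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Sequence, Tuple

Vector = Sequence[float]

@dataclass
class DetectionPlatform:
    """Hypothetical wiring of the four components named in the text."""
    feature_box_network: Callable[[object], Dict[str, Vector]]
    heart_rate_unit: Callable[[Vector], float]
    spo2_unit: Callable[[Vector], float]
    blood_pressure_unit: Callable[[Vector], Tuple[float, float]]

    def detect(self, image_data):
        # One detector pass yields a feature vector per indicator.
        features = self.feature_box_network(image_data)
        systolic, diastolic = self.blood_pressure_unit(features["blood_pressure"])
        return {
            "heart_rate": self.heart_rate_unit(features["heart_rate"]),
            "spo2": self.spo2_unit(features["spo2"]),
            "systolic": systolic,
            "diastolic": diastolic,
        }

# Stub callables in place of trained networks; values are arbitrary.
platform = DetectionPlatform(
    feature_box_network=lambda frames: {
        "heart_rate": [0.1], "spo2": [0.2], "blood_pressure": [0.3]},
    heart_rate_unit=lambda vector: 72.0,
    spo2_unit=lambda vector: 98.0,
    blood_pressure_unit=lambda vector: (120.0, 80.0),
)
result = platform.detect(image_data=None)
```

Keeping the three units behind plain callables makes it possible to swap or retrain one indicator's network without touching the others.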
[0035] Compared with the prior art, the present invention has the following advantages and beneficial effects:
[0036] This invention employs a camera-based non-contact detection method, eliminating the need for direct contact with the test subject's body and providing a more convenient and comfortable testing experience. It utilizes a deep convolutional neural network to extract human pulse wave signals, adapting to video image input under various lighting conditions and accurately identifying feature vectors of heart rate, blood oxygen saturation, and blood pressure, demonstrating strong robustness. Compared to traditional single-physiological-indicator detection methods, this invention comprehensively detects multiple physiological indicators, including heart rate, blood oxygen saturation, and blood pressure, providing more comprehensive physiological information and increasing its application value and practicality.
Attached Figure Description
[0037] Figure 1 is a flowchart of the deep learning-based non-contact physiological indicator detection method in the embodiment.
[0038] Figure 2 is a flowchart of the heart rate feature vector extraction method in the embodiment.
[0039] Figure 3 is a block diagram of the principle structure of the deep learning-based non-contact physiological indicator detection system in the embodiment.
[0040] Figure 4 is a schematic diagram of the feature extraction principle applied by the feature box detection deep neural network to the top layer of the image pyramid in the embodiment.
[0041] Figure 5 is a schematic diagram of the feature extraction principle applied by the feature box detection deep neural network to the second layer of the image pyramid in the embodiment.
[0042] Figure 6 is a schematic diagram of the feature extraction principle applied by the feature box detection deep neural network to the third layer of the image pyramid in the embodiment.
[0043] Figure 7 is a schematic diagram of the structural principle of the spatiotemporal modeling module in the spatiotemporal deep convolutional neural network in the embodiment.
[0044] Figure 8 is a schematic diagram of the principle of the spatiotemporal deep convolutional neural network in the embodiment.
Detailed Implementation
[0045] The present invention will be further described in detail below with reference to embodiments:
[0046] Example
[0047] As shown in Figures 1 to 8, this embodiment provides a non-contact physiological indicator detection method based on deep learning. As shown in Figure 1, the method includes:
[0048] S1. Construct a camera acquisition system, including a visible light camera and an infrared camera assembly (the infrared camera assembly includes several infrared cameras). The camera acquisition system acquires image data, including video images and infrared video images. Construct a physiological indicator detection platform system that communicates with the camera acquisition system. The physiological indicator detection platform system includes a feature box detection deep neural network, a heart rate detection neural network processing unit, a blood oxygen saturation detection neural network processing unit, and a blood pressure detection neural network processing unit. The feature box detection deep neural network is trained on sample data for feature box segmentation and feature vector extraction. It performs heart rate feature box segmentation and heart rate feature vector extraction on the image data and inputs the result into the heart rate detection neural network processing unit; performs blood oxygen saturation feature box segmentation and blood oxygen saturation feature vector extraction on the image data and inputs the result into the blood oxygen saturation detection neural network processing unit; and performs blood pressure feature box segmentation and blood pressure feature vector extraction on the image data and inputs the result into the blood pressure detection neural network processing unit.
[0049] In some embodiments, the deep neural network for feature box detection forms an image pyramid from the acquired image data, and the model for feature box segmentation and feature vector extraction is trained according to the following method:
[0050] S11. Construct a heart rate feature sample dataset. The heart rate feature sample data in the dataset includes sample image data, heart rate feature boxes, and several heart rate feature vectors. The heart rate feature box is the area that expands outward from the center of the concentrated heart rate feature vector region and covers more than 90% of the heart rate feature vectors. As shown in Figure 2, the heart rate feature vector is extracted using the following method:
[0051] S111. Extract a 12×12×3 image from the top layer of the image pyramid as input (see Figure 4). The deep neural network for feature box detection uses the ReLU function as the activation function for each convolutional layer. It extracts features through a 3×3 convolutional layer and a 2×2 max pooling layer to generate a 5×5×10 feature map. After passing through two 3×3 convolutional layers, a 1×1×32 feature output is obtained, realizing the initial heart rate feature box and heart rate feature vector.
[0052] S112. Extract a 24×24×3 image from the second layer of the image pyramid as input (see Figure 5). The deep neural network for feature box detection uses the ReLU function as the activation function for each convolutional layer. Feature extraction is performed through a 3×3 convolutional layer and a 3×3 max pooling layer to generate an 11×11×28 feature map. Feature extraction is then performed again through a 3×3 convolutional layer and a 3×3 max pooling layer to generate a 4×4×48 feature map. Feature extraction is then performed again through a 2×2 convolutional layer to generate a 3×3×64 feature map. Finally, a fully connected layer flattens the feature map to obtain a heart rate feature vector of length 128, and the corrected heart rate feature box and heart rate feature vector are obtained.
[0053] S113. Extract a 48×48×3 image from the third layer of the image pyramid as input (see Figure 6). The deep neural network for feature box detection uses the ReLU function as the activation function for each convolutional layer. Feature extraction is performed through a 3×3 convolutional layer and a 3×3 max pooling layer to generate a 23×23×32 feature map. Feature extraction is performed through a 3×3 convolutional layer and a 3×3 max pooling layer to generate a 10×10×64 feature map. Feature extraction is performed through a 3×3 convolutional layer and a 2×2 max pooling layer to generate a 4×4×64 feature map. A 2×2 convolutional layer generates a 3×3×128 feature map. Finally, a fully connected layer flattens the feature map to obtain a feature vector of length 128, resulting in the final heart rate feature box and heart rate feature vector.
[0054] S12. Construct a blood oxygen saturation feature sample dataset. The blood oxygen saturation feature sample data in this dataset includes sample image data, blood oxygen saturation feature boxes, and several blood oxygen saturation feature vectors. The blood oxygen saturation feature box is the region centered on the concentrated area of blood oxygen saturation feature vectors, extending outwards and covering more than 90% of the blood oxygen saturation feature vectors. The blood oxygen saturation feature boxes and blood oxygen saturation feature vectors are obtained according to steps S111 to S113. Specifically, the blood oxygen saturation feature vectors are extracted using the following method:
[0055] S121. Extract a 12×12×3 image from the top layer of the image pyramid as input. The feature box detection deep neural network uses the ReLU function as the activation function of each convolutional layer. Feature extraction is performed through a 3×3 convolutional layer and a 2×2 max pooling layer to generate a 5×5×10 feature map. After passing through two 3×3 convolutional layers, a 1×1×32 feature output is obtained, realizing the initial blood oxygen saturation feature box and blood oxygen saturation feature vector.
[0056] S122. Extract a 24×24×3 image from the second layer of the image pyramid as input. The deep neural network for feature box detection uses the ReLU function as the activation function for each convolutional layer. Feature extraction is performed through a 3×3 convolutional layer and a 3×3 max pooling layer to generate an 11×11×28 feature map. Then, feature extraction is performed through a 3×3 convolutional layer and a 3×3 max pooling layer to generate a 4×4×48 feature map. Then, feature extraction is performed through a 2×2 convolutional layer to generate a 3×3×64 feature map. Finally, a fully connected layer is used to flatten the feature map to obtain a blood oxygen saturation feature vector of length 128, and the corrected blood oxygen saturation feature box and blood oxygen saturation feature vector are obtained.
[0057] S123. Extract a 48×48×3 image from the third layer of the image pyramid as input. The deep neural network for feature box detection uses the ReLU function as the activation function for each convolutional layer. Feature extraction is performed through a 3×3 convolutional layer and a 3×3 max pooling layer to generate a 23×23×32 feature map. Feature extraction is performed through a 3×3 convolutional layer and a 3×3 max pooling layer to generate a 10×10×64 feature map. Feature extraction is performed through a 3×3 convolutional layer and a 2×2 max pooling layer to generate a 4×4×64 feature map. A 2×2 convolutional layer generates a 3×3×128 feature map. The feature map is then flattened through a fully connected layer to obtain a feature vector of length 128, resulting in the final blood oxygen saturation feature box and blood oxygen saturation feature vector.
[0058] S13. Construct a blood pressure feature sample dataset. The blood pressure feature sample data in the dataset includes sample image data, blood pressure feature boxes, and several blood pressure feature vectors. The blood pressure feature box is the region that expands outward from the center of the concentrated area of blood pressure feature vectors and covers more than 90% of the blood pressure feature vectors. The blood pressure feature boxes and blood pressure feature vectors are obtained according to the methods in steps S111 to S113. Specifically, the blood pressure feature vectors are extracted using the following method:
[0059] S131. Extract a 12×12×3 image from the top layer of the image pyramid as input. The feature box detection deep neural network uses the ReLU function as the activation function of each convolutional layer. Feature extraction is performed through a 3×3 convolutional layer and a 2×2 max pooling layer to generate a 5×5×10 feature map. After passing through two 3×3 convolutional layers, a 1×1×32 feature output is obtained, realizing the initial blood pressure feature box and blood pressure feature vector.
[0060] S132. Extract a 24×24×3 image from the second layer of the image pyramid as input. The deep neural network for feature box detection uses the ReLU function as the activation function for each convolutional layer. Feature extraction is performed through a 3×3 convolutional layer and a 3×3 max pooling layer to generate an 11×11×28 feature map. Then, feature extraction is performed through a 3×3 convolutional layer and a 3×3 max pooling layer to generate a 4×4×48 feature map. Then, feature extraction is performed through a 2×2 convolutional layer to generate a 3×3×64 feature map. Finally, a fully connected layer is used to flatten the feature map to obtain a blood pressure feature vector of length 128, and the corrected blood pressure feature box and blood pressure feature vector are obtained.
[0061] S133. Extract a 48×48×3 image from the third layer of the image pyramid as input. The deep neural network for feature box detection uses the ReLU function as the activation function for each convolutional layer. Feature extraction is performed through a 3×3 convolutional layer and a 3×3 max pooling layer to generate a 23×23×32 feature map. Feature extraction is performed through a 3×3 convolutional layer and a 3×3 max pooling layer to generate a 10×10×64 feature map. Feature extraction is performed through a 3×3 convolutional layer and a 2×2 max pooling layer to generate a 4×4×64 feature map. A 2×2 convolutional layer generates a 3×3×128 feature map. The feature map is then flattened through a fully connected layer to obtain a feature vector of length 128, resulting in the final blood pressure feature box and blood pressure feature vector.
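The feature box definition used in steps S11 to S13 (a region that expands outward from the center of the concentrated area until it covers more than 90% of the feature vectors) can be illustrated with a small sketch. It assumes each feature vector has a 2-D image location, approximates the center of the concentrated area by the coordinate-wise median, and grows a square box in fixed steps; none of these details is specified in the text, and the sample points are synthetic.

```python
import statistics

def feature_box(points, coverage=0.9, step=1.0):
    """Grow a square box from the median point until coverage is exceeded.

    Assumptions: points are (x, y) feature locations, the centre of the
    concentrated area is the coordinate-wise median, and the box is square.
    Returns ((x_min, y_min, x_max, y_max), number_of_points_inside).
    """
    cx = statistics.median(x for x, _ in points)
    cy = statistics.median(y for _, y in points)
    half = step
    while True:
        inside = sum(1 for x, y in points
                     if abs(x - cx) <= half and abs(y - cy) <= half)
        if inside > coverage * len(points):       # "more than 90%"
            return (cx - half, cy - half, cx + half, cy + half), inside
        half += step

# Nineteen clustered locations plus one far outlier (synthetic data).
points = [(x, 0) for x in range(-9, 10)] + [(100, 100)]
box, inside = feature_box(points)
```

Here the box stops growing once it holds 19 of the 20 points, so the outlier is left outside rather than stretching the region.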
[0062] S2. The heart rate detection neural network processing unit extracts and identifies the heart rate feature vector to simulate the pulse wave signal and obtains the time series spectrum of the human pulse wave. It performs spectrum analysis on the time series spectrum and selects the frequency corresponding to the peak of the spectrum as the heart rate detection value A. It calculates the average value of the heart rate detection value A in the time period T1 as the heart rate detection output value.
[0063] The blood oxygen saturation detection neural network processing unit extracts and identifies the blood oxygen saturation feature vector and simulates the pulse wave signal. The pulse wave signal is then processed by separating the DC and AC components as follows:
[0064] where I denotes the amplitude at the peak of the pulse wave signal, and ΔI denotes the difference between the amplitude at the peak and the amplitude at the trough of the pulse wave signal.
[0065] The blood oxygen saturation SpO2 is obtained as follows:
[0066] where A and B are calibration parameters, AC_R denotes the AC component of the pulse wave signal under visible light, DC_R denotes the DC component of the pulse wave signal under visible light, AC_IR denotes the AC component of the pulse wave signal under infrared light, and DC_IR denotes the DC component of the pulse wave signal under infrared light.
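The formulas for the AC/DC separation and for SpO2 are not legible in the text above, so the following sketch rests entirely on stated assumptions. It takes AC = ΔI and DC = I for each light source, and it uses the widely used linear ratio-of-ratios form SpO2 = A − B·R with R = (AC_R/DC_R)/(AC_IR/DC_IR). Both the form and the default values of `a` and `b` are placeholders for illustration, not values from this document; in practice A and B come from calibration against a reference oximeter.

```python
def ac_dc_ratio(peak, trough):
    """AC/DC ratio of one pulse wave, assuming AC = peak - trough and DC = peak."""
    delta_i = peak - trough  # the quantity called ΔI in the text
    return delta_i / peak    # the peak amplitude is called I in the text

def spo2(visible, infrared, a=110.0, b=25.0):
    """Ratio-of-ratios estimate; the linear form and a, b are assumptions."""
    r = ac_dc_ratio(*visible) / ac_dc_ratio(*infrared)
    return a - b * r

# Hypothetical (peak, trough) amplitudes for the visible and infrared signals.
print(spo2(visible=(100.0, 98.0), infrared=(100.0, 96.0)))
```

Because R is a ratio of two normalized quantities, a change in overall brightness that scales peak and trough of one channel by the same factor leaves the estimate unchanged.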
[0067] The blood pressure detection neural network processing unit is trained on sample data that pair the average amplitude and average frequency of the pulse wave under visible light and the average amplitude and average frequency of the pulse wave under infrared light with the corresponding systolic and diastolic blood pressure. (In some embodiments, the sample feature data comprise these four pulse wave parameters together with the corresponding systolic and diastolic blood pressures, whose true values are measured with an Omron U724J blood pressure monitor.) Using the true systolic and diastolic values as labels, the sample data are divided into a training set and a validation set in an 8:2 ratio, and the network is trained until the accuracy on the validation set exceeds 95%, at which point training is complete. The trained unit then extracts and identifies the blood pressure feature vector to simulate the pulse wave signal and obtains the time series spectrum of the human pulse wave. Spectral analysis of the time series spectrum yields four characteristic parameters: the average amplitude of the pulse wave under visible light, the average frequency of the pulse wave under visible light, the average amplitude of the pulse wave under infrared light, and the average frequency of the pulse wave under infrared light. Finally, the unit outputs blood pressure data including systolic and diastolic blood pressure.
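The 8:2 split and the stopping criterion described above can be sketched as follows. The record layout, the sample values, and the function names are hypothetical, and the text does not define how "accuracy" is computed for blood pressure values, so the metric itself is left out and only the threshold check is shown.

```python
import random

def split_8_2(samples, seed=0):
    """Shuffle a copy of the samples and split it into training and validation sets (8:2)."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    cut = len(shuffled) * 8 // 10  # integer arithmetic avoids float rounding
    return shuffled[:cut], shuffled[cut:]

def training_complete(validation_accuracy, threshold=0.95):
    """Stop once validation accuracy exceeds the 95% threshold named in the text."""
    return validation_accuracy > threshold

# Hypothetical records:
# (visible amplitude, visible frequency, infrared amplitude, infrared frequency,
#  systolic pressure, diastolic pressure)
samples = [(0.50 + 0.01 * i, 1.2, 0.40 + 0.01 * i, 1.2, 110 + i, 70 + i)
           for i in range(10)]
train, val = split_8_2(samples)
print(len(train), len(val))
```

Fixing the shuffle seed keeps the split reproducible between training runs, which makes validation accuracies comparable.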
[0068] The physiological indicator detection platform system outputs heart rate, blood oxygen saturation, and blood pressure data.
[0069] In some embodiments, as shown in Figure 8, the physiological indicator detection platform system of this invention internally constructs a spatiotemporal deep convolutional neural network for pulse wave signal extraction (a separately constructed network within the platform system that performs spatiotemporal recording and feature extraction). The spatiotemporal deep convolutional neural network is trained on the heart rate feature vector, blood oxygen saturation feature vector, or blood pressure feature vector as follows:
[0070] S21. Construct a sample feature dataset containing continuous frame images. The sample feature data in the sample feature dataset includes sample feature images and pulse wave signals (preferably, the pulse wave signals in the sample feature data of the sample feature dataset are the true values of photoplethysmography pulse wave signals corresponding to the sample feature images measured by a Contec CMS50E pulse oximeter; the sample feature data in the sample feature dataset is divided into training and validation sets in an 8:2 ratio to train the network until the accuracy of the validation set reaches more than 95%, at which point the network training is complete). The following association feature model training is performed on the sample feature data:
[0071] The spatiotemporal deep convolutional neural network uses the ReLU function as the activation function of each convolutional layer. In each spatiotemporal modeling module, the first channel of every frame in the input video image data is shifted to the previous frame and the second channel is shifted to the next frame, so that the first channel moves forward by one frame and the second channel moves backward by one frame. Channels shifted beyond the ends of the sequence are truncated and the vacated positions are padded with zeros, so that each current frame contains channel information from both the preceding and the following frame, which facilitates the extraction of spatiotemporal information from the video (see Figure 7). A spatiotemporal modeling module and a 3×3 convolutional layer generate an N×72×72×32 feature map, where N is the number of frames in the video data. Another spatiotemporal modeling module and a 3×3 convolutional layer generate an N×70×70×32 feature map. A 2×2 average pooling layer and a Dropout layer with a dropout ratio of 0.25 generate an N×35×35×32 feature map. A further spatiotemporal modeling module and a 3×3 convolutional layer generate an N×35×35×64 feature map. Another spatiotemporal modeling module and a 3×3 convolutional layer generate an N×33×33×64 feature map. A 2×2 average pooling layer and a Dropout layer with a dropout ratio of 0.5 generate an N×16×16×64 feature map. Finally, a fully connected layer flattens the feature map into a feature vector of length 128, and the pulse wave signal is output.
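The channel shift performed by the spatiotemporal modeling module can be illustrated on a toy sequence. In this sketch each channel is reduced to a single number so the movement is easy to follow; in the network each channel would be a two-dimensional feature map and the zero padding would be a map of zeros. The function name and data layout are hypothetical.

```python
def temporal_shift(frames):
    """frames[t][c] is channel c of frame t; returns a shifted copy of the sequence."""
    n = len(frames)
    shifted = [list(frame) for frame in frames]
    for t in range(n):
        # Channel 0 moves one frame earlier: frame t receives it from frame t + 1.
        shifted[t][0] = frames[t + 1][0] if t + 1 < n else 0
        # Channel 1 moves one frame later: frame t receives it from frame t - 1.
        shifted[t][1] = frames[t - 1][1] if t - 1 >= 0 else 0
        # All remaining channels stay in their original frame.
    return shifted

video = [[1, 10, 100], [2, 20, 200], [3, 30, 300]]
print(temporal_shift(video))
```

After the shift, the middle frame holds channel 0 of the following frame, channel 1 of the preceding frame, and its own third channel, so a purely spatial 3×3 convolution applied afterwards mixes information from three consecutive frames without any extra parameters.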
[0072] The spatiotemporal modeling module of the spatiotemporal deep convolutional neural network constructs a spatiotemporal sequence of consecutive frame images and finally outputs the pulse wave signal in the time series.
[0073] In some embodiments, the time series spectrum is obtained as follows:
[0074] The pulse wave signal was filtered using a Butterworth bandpass filter with cutoff frequencies of 0.6 Hz and 3.0 Hz, preserving waveforms within the frequency range of [0.6 Hz, 3.0 Hz]. A Fast Fourier Transform was then performed on the filtered pulse wave signal to obtain the time-series spectrum of the human pulse wave.
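The filtering and transform step can be sketched with the standard library alone. Several choices below are assumptions not found in the text: the filter order (a second-order band-pass derived from a first-order Butterworth prototype via the bilinear transform), the 30 frames per second sampling rate, the synthetic test signal, and all function names. A plain DFT stands in for the Fast Fourier Transform; it yields the same magnitudes, only more slowly.

```python
import cmath
import math

def butter_bandpass(low_hz, high_hz, fs):
    """Second-order Butterworth band-pass (first-order prototype, bilinear transform)."""
    k = 2.0 * fs
    w1 = k * math.tan(math.pi * low_hz / fs)   # pre-warped lower cutoff
    w2 = k * math.tan(math.pi * high_hz / fs)  # pre-warped upper cutoff
    bw, w0_sq = w2 - w1, w1 * w2
    a0 = k * k + bw * k + w0_sq
    b = [bw * k / a0, 0.0, -bw * k / a0]
    a = [1.0, 2.0 * (w0_sq - k * k) / a0, (k * k - bw * k + w0_sq) / a0]
    return b, a

def lfilter(b, a, x):
    """Direct-form difference equation with zero initial state."""
    y = []
    for n, xn in enumerate(x):
        acc = b[0] * xn
        if n >= 1:
            acc += b[1] * x[n - 1] - a[1] * y[n - 1]
        if n >= 2:
            acc += b[2] * x[n - 2] - a[2] * y[n - 2]
        y.append(acc)
    return y

def gain(b, a, f_hz, fs):
    """Magnitude response of the filter at frequency f_hz."""
    z1 = cmath.exp(-2j * math.pi * f_hz / fs)  # z^-1 on the unit circle
    num = b[0] + b[1] * z1 + b[2] * z1 * z1
    den = a[0] + a[1] * z1 + a[2] * z1 * z1
    return abs(num / den)

def spectrum(x, fs):
    """Magnitude spectrum via a plain DFT over the non-negative frequencies."""
    n = len(x)
    freqs = [k * fs / n for k in range(n // 2 + 1)]
    mags = [abs(sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
            for k in range(n // 2 + 1)]
    return freqs, mags

fs = 30.0  # assumed camera frame rate
t = [i / fs for i in range(300)]
# Synthetic signal: a 1.2 Hz pulse component plus a stronger 0.1 Hz drift.
raw = [math.sin(2 * math.pi * 1.2 * ti) + 2.0 * math.sin(2 * math.pi * 0.1 * ti)
       for ti in t]
b, a = butter_bandpass(0.6, 3.0, fs)
freqs, mags = spectrum(lfilter(b, a, raw), fs)
peak_hz = freqs[max(range(len(mags)), key=lambda i: mags[i])]
print(peak_hz)
```

The pass band of 0.6 Hz to 3.0 Hz corresponds to 36 to 180 beats per minute, so slow illumination drift and high-frequency noise are attenuated before the spectral peak is located. A higher filter order would give steeper roll-off at the band edges.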
[0075] As shown in Figure 3, a deep learning-based non-contact physiological indicator detection system includes a camera acquisition system and a physiological indicator detection platform system that is communicatively connected to the camera acquisition system. The camera acquisition system includes a visible light camera and an infrared camera component. The physiological indicator detection platform system includes a feature box detection deep neural network, a heart rate detection neural network processing unit, a blood oxygen saturation detection neural network processing unit, and a blood pressure detection neural network processing unit.
[0076] The feature box detection deep neural network is trained on sample data for feature box segmentation and feature vector extraction. It performs heart rate feature box segmentation and heart rate feature vector extraction on the image data and inputs the results to the heart rate detection neural network processing unit. It performs blood oxygen saturation feature box segmentation and blood oxygen saturation feature vector extraction on the image data and inputs the results to the blood oxygen saturation detection neural network processing unit. It also performs blood pressure feature box segmentation and blood pressure feature vector extraction on the image data and inputs the results to the blood pressure detection neural network processing unit.
[0077] The heart rate detection neural network processing unit is used to extract and identify the heart rate feature vector, simulate the pulse wave signal, and obtain the time series spectrum of the human pulse wave. It performs spectral analysis on the time series spectrum, selects the frequency corresponding to the peak of the spectrum as the heart rate detection value A, and calculates the average of the heart rate detection value A over the time period T1 as the heart rate detection output value.
[0078] The blood oxygen saturation detection neural network processing unit extracts and identifies the blood oxygen saturation feature vector and simulates the pulse wave signal. The pulse wave signal is then processed by separating the DC and AC components, and the blood oxygen saturation is calculated.
[0079] The blood pressure detection neural network processing unit is trained on sample data that pair the average amplitude and average frequency of the pulse wave under visible light and the average amplitude and average frequency of the pulse wave under infrared light with the corresponding systolic and diastolic blood pressure. It extracts and identifies the blood pressure feature vector to simulate the pulse wave signal and obtains the time series spectrum of the human pulse wave. It performs spectral analysis on the time series spectrum to obtain four characteristic parameters (the average amplitude and average frequency of the pulse wave under visible light, and the average amplitude and average frequency of the pulse wave under infrared light) and outputs blood pressure data including systolic and diastolic blood pressure.
[0080] The physiological indicator detection platform system is also used to output heart rate, blood oxygen saturation, and blood pressure data.
[0081] The above description is only a preferred embodiment of the present invention and is not intended to limit the present invention. Any modifications, equivalent substitutions, and improvements made within the spirit and principles of the present invention should be included within the protection scope of the present invention.
Claims
1. A non-contact physiological indicator detection method based on deep learning, characterized in that the method includes: S1. Construct a camera acquisition system, including visible light and infrared camera components, to acquire image data including video images and infrared video images. Construct a physiological indicator detection platform system that communicates with the camera acquisition system. This platform system includes a feature box detection deep neural network, a heart rate detection neural network processing unit, a blood oxygen saturation detection neural network processing unit, and a blood pressure detection neural network processing unit. The feature box detection deep neural network is trained on sample data for feature box segmentation and feature vector extraction. The feature box detection deep neural network performs heart rate feature box segmentation and heart rate feature vector extraction on the image data and inputs the results to the heart rate detection neural network processing unit. The feature box detection deep neural network performs blood oxygen saturation feature box segmentation and blood oxygen saturation feature vector extraction on the image data and inputs the results to the blood oxygen saturation detection neural network processing unit. The feature box detection deep neural network performs blood pressure feature box segmentation and blood pressure feature vector extraction on the image data and inputs the results to the blood pressure detection neural network processing unit. S2. The heart rate detection neural network processing unit extracts and identifies the heart rate feature vector to simulate the pulse wave signal and obtains the time series spectrum of the human pulse wave. The physiological indicator detection platform system internally constructs a spatiotemporal deep convolutional neural network for pulse wave signal extraction.
The spatiotemporal deep convolutional neural network trains the heart rate feature vector, blood oxygen saturation feature vector, or blood pressure feature vector as follows: S21. Construct a sample feature dataset containing consecutive frame images. The sample feature data in the dataset includes sample feature images and pulse wave signals. Train the following correlation feature model on the sample feature data: The spatiotemporal deep convolutional neural network uses the ReLU function as the activation function for each convolutional layer. Through a spatiotemporal modeling module and a 3×3 convolutional layer, it generates an N×72×72×32 feature map. The spatiotemporal modeling module shifts the first channel of all frames in the input video image data to the previous frame and the second channel to the next frame. After this shift, the first channel is shifted forward by one frame, and the second channel is shifted backward by one frame. The shifted frames are truncated, and missing frames are padded with zeros, ensuring that the current frame contains channel information from both the previous and next frames, facilitating the extraction of spatiotemporal information from the video. Then, through another spatiotemporal modeling module and a 3×3 convolutional layer, it generates an N×70×70×32 feature map. The feature map is processed by passing it through a 2×2 average pooling layer and a Dropout layer with a dropout ratio of 0.25 to generate an N×35×35×32 feature map. Then, it is passed through a spatiotemporal modeling module and a convolutional layer with a kernel of 3×3 to generate an N×35×35×64 feature map. Then, it is passed through a spatiotemporal modeling module and a convolutional layer with a kernel of 3×3 to generate an N×33×33×64 feature map. Then, it is passed through a 2×2 average pooling layer and a Dropout layer with a dropout ratio of 0.5 to generate an N×16×16×64 feature map. 
Finally, the feature map is flattened through a fully connected layer to obtain a feature vector of length 128, and the pulse wave signal is output. The spatiotemporal modeling module of the spatiotemporal deep convolutional neural network constructs a spatiotemporal sequence of continuous frame images and finally outputs the pulse wave signal in the time series; spectral analysis is performed on the time series spectrum, the frequency corresponding to the peak of the spectrum is selected as the heart rate detection value A, and the average of the heart rate detection value A over the time period T1 is calculated as the heart rate detection output value. The blood oxygen saturation detection neural network processing unit extracts and identifies the blood oxygen saturation feature vector and simulates the pulse wave signal. The pulse wave signal is then processed by separating the DC and AC components as follows: wherein I denotes the amplitude at the peak of the pulse wave signal, and ΔI denotes the difference between the amplitude at the peak and the amplitude at the trough of the pulse wave signal. The blood oxygen saturation SpO2 is obtained as follows: wherein A and B are calibration parameters, AC_R denotes the AC component of the pulse wave signal under visible light, DC_R denotes the DC component of the pulse wave signal under visible light, AC_IR denotes the AC component of the pulse wave signal under infrared light, and DC_IR denotes the DC component of the pulse wave signal under infrared light. The blood pressure detection neural network processing unit is trained on sample data that pair the average amplitude and average frequency of the pulse wave under visible light and the average amplitude and average frequency of the pulse wave under infrared light with the corresponding systolic and diastolic blood pressure.
It extracts and identifies blood pressure feature vectors to simulate pulse wave signals and obtains the time series spectrum of human pulse waves. It performs spectral analysis on the time series spectrum to obtain four feature parameters: average amplitude and frequency of the pulse wave under visible light, average amplitude and frequency of the pulse wave under infrared light, and outputs blood pressure data including systolic and diastolic blood pressure. The physiological indicator detection platform system outputs heart rate, blood oxygen saturation, and blood pressure data.
2. The non-contact physiological indicator detection method based on deep learning according to claim 1, characterized in that: In step S1, the deep neural network for feature box detection acquires image data to form an image pyramid, and the model for feature box segmentation and feature vector extraction is trained according to the following method: S11. Construct a heart rate feature sample dataset. The heart rate feature sample data in the dataset includes sample image data, heart rate feature boxes, and several heart rate feature vectors. The heart rate feature box is the region that expands outward from the center of the heart rate feature vector set and covers more than 90% of the heart rate feature vectors. The heart rate feature vectors are extracted using the following method: S111. Extract a 12×12×3 image from the top layer of the image pyramid as input. The feature box detection deep neural network uses the ReLU function as the activation function of each convolutional layer. Feature extraction is performed through a 3×3 convolutional layer and a 2×2 max pooling layer to generate a 5×5×10 feature map. After passing through two 3×3 convolutional layers, a 1×1×32 feature output is obtained, realizing the initial heart rate feature box and heart rate feature vector. S112. Extract a 24×24×3 image from the second layer of the image pyramid as input. The feature box detection deep neural network uses the ReLU function as the activation function of each convolutional layer. Feature extraction is performed through a 3×3 convolutional layer and a 3×3 max pooling layer to generate an 11×11×28 feature map. Then, feature extraction is performed through a 3×3 convolutional layer and a 3×3 max pooling layer to generate a 4×4×48 feature map. Then, feature extraction is performed through a 2×2 convolutional layer to generate a 3×3×64 feature map. 
Finally, the feature map is flattened through a fully connected layer to obtain a heart rate feature vector of length 128, and the corrected heart rate feature box and heart rate feature vector are obtained. S113. Extract a 48×48×3 image from the third layer of the image pyramid as input. The deep neural network for feature box detection uses the ReLU function as the activation function for each convolutional layer. Feature extraction is performed through a 3×3 convolutional layer and a 3×3 max pooling layer to generate a 23×23×32 feature map. Feature extraction is performed through a 3×3 convolutional layer and a 3×3 max pooling layer to generate a 10×10×64 feature map. Feature extraction is performed through a 3×3 convolutional layer and a 2×2 max pooling layer to generate a 4×4×64 feature map. A 2×2 convolutional layer generates a 3×3×128 feature map. The feature map is then flattened through a fully connected layer to obtain a feature vector of length 128, resulting in the final heart rate feature box and heart rate feature vector. S12. Construct a blood oxygen saturation feature sample dataset. The blood oxygen saturation feature sample data in the blood oxygen saturation feature sample dataset includes sample image data, blood oxygen saturation feature boxes, and several blood oxygen saturation feature vectors. The blood oxygen saturation feature box is the area that expands outward from the center of the concentrated area of blood oxygen saturation feature vectors and covers more than 90% of the blood oxygen saturation feature vectors. The blood oxygen saturation feature boxes and blood oxygen saturation feature vectors are obtained according to the methods in steps S111 to S113. Construct a blood pressure feature sample dataset. The blood pressure feature sample data in the blood pressure feature sample dataset includes sample image data, blood pressure feature boxes, and several blood pressure feature vectors. 
The blood pressure feature box is the area that expands outward from the center of the concentrated area of blood pressure feature vectors and covers more than 90% of the blood pressure feature vectors. The blood pressure feature boxes and blood pressure feature vectors are obtained according to the methods in steps S111 to S113.
3. The non-contact physiological indicator detection method based on deep learning according to claim 1, characterized in that: The pulse wave signal in the sample feature dataset is the true value of the photoplethysmography pulse wave signal corresponding to the sample feature image, measured by a Contec CMS50E pulse oximeter.
4. The non-contact physiological indicator detection method based on deep learning according to claim 1, characterized in that: In step S2, the time series spectrum is obtained as follows: The pulse wave signal was filtered using a Butterworth bandpass filter with cutoff frequencies of 0.6 Hz and 3.0 Hz, preserving waveforms within the frequency range of [0.6 Hz, 3.0 Hz]. The filtered pulse wave signal was then subjected to a Fast Fourier Transform to obtain the time-series spectrum of the human pulse wave.
5. The non-contact physiological indicator detection method based on deep learning according to claim 1, characterized in that: The sample data of the blood pressure detection neural network processing unit includes sample feature data such as the average amplitude of the pulse wave under visible light, the average frequency of the pulse wave under visible light, the average amplitude of the pulse wave under infrared light, and the average frequency of the pulse wave under infrared light, as well as the corresponding systolic and diastolic blood pressures. The systolic and diastolic blood pressures are the true values of systolic and diastolic blood pressures corresponding to the sample data, measured using an Omron U724J blood pressure monitor.
6. A deep learning-based non-contact physiological indicator detection system for implementing the method of claim 1, characterized in that: The system includes a camera acquisition system and a physiological indicator detection platform system that communicates with the camera acquisition system. The camera acquisition system includes a visible light camera and an infrared camera component. The physiological indicator detection platform system includes a feature box detection deep neural network, a heart rate detection neural network processing unit, a blood oxygen saturation detection neural network processing unit, and a blood pressure detection neural network processing unit. The feature box detection deep neural network is trained on sample data for feature box segmentation and feature vector extraction. The feature box detection deep neural network performs heart rate feature box segmentation and heart rate feature vector extraction on the image data and inputs the results to the heart rate detection neural network processing unit. It performs blood oxygen saturation feature box segmentation and blood oxygen saturation feature vector extraction on the image data and inputs the results to the blood oxygen saturation detection neural network processing unit. It performs blood pressure feature box segmentation and blood pressure feature vector extraction on the image data and inputs the results to the blood pressure detection neural network processing unit. The heart rate detection neural network processing unit is used to extract and identify the heart rate feature vector, simulate the pulse wave signal, and obtain the time series spectrum of the human pulse wave. It performs spectral analysis on the time series spectrum and selects the frequency corresponding to the peak of the spectrum as the heart rate detection value A.
It calculates the average of the heart rate detection value A over the time period T1 as the heart rate detection output value. The blood oxygen saturation detection neural network processing unit extracts and identifies the blood oxygen saturation feature vector and simulates the pulse wave signal. The pulse wave signal is then processed by separating the DC and AC components, and the blood oxygen saturation is calculated. The blood pressure detection neural network processing unit is trained on sample data that pair the average amplitude and average frequency of the pulse wave under visible light and the average amplitude and average frequency of the pulse wave under infrared light with the corresponding systolic and diastolic blood pressure. It extracts and identifies the blood pressure feature vector to simulate the pulse wave signal and obtains the time series spectrum of the human pulse wave. It performs spectral analysis on the time series spectrum to obtain four characteristic parameters (the average amplitude and average frequency of the pulse wave under visible light, and the average amplitude and average frequency of the pulse wave under infrared light) and outputs blood pressure data including systolic and diastolic blood pressure. The physiological indicator detection platform system is also used to output heart rate, blood oxygen saturation, and blood pressure data.