An electrocardiogram (ECG) signal denoising method based on multi-scale deep learning
By combining a multi-scale deep learning network with a morphology-aware composite loss function, the method addresses the lack of collaborative integration in multi-head attention mechanisms for ECG signal denoising, achieving more efficient noise suppression and higher signal fidelity.
Patent Information
- Authority / Receiving Office: CN · China
- Patent Type: Applications (China)
- Current Assignee / Owner: SHENZHEN ECGMAC MEDICAL ELECTRONICS
- Filing Date: 2026-02-06
- Publication Date: 2026-06-26
AI Technical Summary
Existing multi-head attention mechanisms lack dynamic adaptability and collaborative integration in ECG signal denoising. This leads to insufficient attention to key information or wasted computational resources, which limits model performance in tasks such as fine context understanding and long-range dependency modeling.
A multi-scale deep learning approach is adopted to construct a multi-branch one-dimensional convolutional reconstruction network. Combined with a morphology-aware composite loss function and a signal-to-noise ratio adaptive gating mechanism, the ECG signal denoising process is optimized through multi-scale feature extraction and information fusion.
The method improves the noise suppression of ECG signals, reduces the risk of time-domain artifacts, and improves the morphological fidelity and robustness of the denoised signal as well as the model's generalization ability in complex noise environments.
Figure CN122272040A_ABST
Description
Technical Field
[0001] This invention relates to the field of electrocardiogram (ECG) signal denoising technology, specifically to an ECG signal denoising method based on multi-scale deep learning.
Background Technology
[0002] Multi-head attention mechanisms, as a core component of modern deep learning models, significantly improve the model's representational ability in sequence modeling by computing multiple attention subspaces in parallel. Existing technologies typically employ a fixed number of attention heads, each responsible for learning different aspects of feature interaction patterns from the same input, and are widely used in many benchmark tasks.
[0003] In existing technologies, fixed-configuration multi-head attention mechanisms lack dynamic adaptability and cannot optimize the allocation of attention resources according to specific input samples during inference. This may lead to insufficient attention to key information or waste of computational resources. In addition, when each attention head extracts features independently, there is no effective mechanism for collaboration and information integration, so the feature patterns learned by different heads can easily become redundant or conflicting rather than complementary. This affects the efficiency and effectiveness of the overall feature representation. Together, these problems limit the performance ceiling of the model in tasks that require fine context understanding and long-range dependency modeling. Based on this, this invention designs an ECG signal denoising method based on multi-scale deep learning to solve the above problems.
Summary of the Invention
[0004] The purpose of this invention is to provide a method for denoising electrocardiogram signals based on multi-scale deep learning, which addresses the lack of collaboration and information integration mechanisms described in the background technology.
[0005] To solve the above-mentioned technical problems, the present invention provides the following technical solution: a method for denoising electrocardiogram (ECG) signals based on multi-scale deep learning, comprising a training phase and an inference phase. The training phase includes the following steps:
Step S101: Obtain near-noise-free ECG data as a reference signal and construct a dataset from it; the dataset includes a training set, a validation set, and a test set. During construction, noise is added to the near-noise-free ECG data to generate noisy ECG data as the model input, while the original near-noise-free ECG data serves as the label required for model training.
Step S102: Preprocess the dataset: unify the sampling rate to the target sampling rate Fs, slice the data with a window length of W and a stride of H, and normalize using the mean MU and standard deviation STD calculated from the training set label data. The normalization formula is:
$$XN_i = \frac{X_i - MU}{STD}, \quad i = 1, \dots, W$$
where $X_i$ is a single raw data point in a single segment, $XN_i$ is the corresponding normalized value, MU and STD are the mean and standard deviation calculated from the label segments of the training set, and W is the window length.
Step S103: Construct a multi-scale one-dimensional convolutional reconstruction network.
Step S104: Define the morphology-aware composite loss function.
Step S105: Train the network using the training set and calculate the comprehensive score (Score) on the validation set:
$$Score = \sum_i w_i s_i$$
where $s_i$ is the direction-unified and normalized score of the i-th indicator and $w_i$ is the corresponding weight.
Step S106: Apply an early stopping strategy based on the comprehensive score: stop training when there is no improvement for P consecutive rounds, and save the optimal model weights. P is a preset threshold.
Step S107: Evaluate model performance using the test set.
[0006] Preferably, in step S103, the reconstruction network includes: an input layer that uses one-dimensional convolution for preliminary feature extraction, with little or no normalization; a multi-branch backbone comprising local convolution branches and dilated convolution branches with different dilation rates to cover different time scales; an encoder-decoder structure that fuses shallow fine-grained features with deep semantic features through skip connections; and an output layer that uses one-dimensional convolution to regress the time-domain waveform, without normalization or using only linear activation.
[0007] Preferably, in step S104, the expression for the composite loss function is:
$$L = w_t L_{time} + w_s L_{stft} + w_d L_{diff} + w_{dd} L_{dd} + w_b L_{band} + w_m L_{mid} + w_l L_{low} + w_u L_{ultra} + w_{tr} L_{trend} + w_p L_{peak} + w_c L_{cons} + w_{mean} L_{mean}$$
where $L_{time}$ is the time-domain reconstruction error; $L_{stft}$ is the multi-scale short-time Fourier transform consistency loss; $L_{diff}$ and $L_{dd}$ are the first-order and second-order difference consistency losses, respectively; $L_{band}$ and $L_{mid}$ are the frequency band residual energy constraint and the intermediate frequency band protection loss, respectively; $L_{low}$ and $L_{ultra}$ are the corresponding constraints for the low-frequency and ultra-low-frequency bands, respectively; $L_{trend}$ is the trend consistency loss; $L_{peak}$ is the adaptive weighted loss constructed from the heartbeat peak-valley neighborhood; $L_{cons}$ is the gated consistency loss based on the input signal-to-noise ratio; $L_{mean}$ is the mean shift constraint; and $w_t$, $w_s$, $w_d$, $w_b$, $w_m$, $w_{tr}$, $w_l$, $w_u$, $w_p$, $w_c$, $w_{mean}$, and $w_{dd}$ are the weighting parameters of the respective terms.
[0008] Preferably, the inference phase includes the following steps:
Step S201: Apply the same preprocessing to the ECG signal to be processed as in the training phase.
Step S202: Construct the input segments: pad the signal with symmetric reflection and slice it with window length W and step size W / 2.
Step S203: Load the optimal model and run inference to obtain the denoised segments.
Step S204: Window, overlap, and add the output segments, then perform inverse normalization to obtain the final denoised signal. The inverse normalization formula is:
$$y = y_c \cdot STD + MU$$
where $y$ is the final denoised signal, $y_c$ is the reconstruction result, and MU and STD are the mean and standard deviation of the training set label segments saved during model training.
[0009] Preferably, in the multi-branch backbone, the dilation rate of the dilated (atrous) convolutions is configurable so as to cover wide QRS waves and long-duration structures such as baseline drift.
[0010] Preferably, in the composite loss function, $L_{stft}$ computes the short-time Fourier transform difference using multiple sets of window length and step size parameters; $L_{band}$ computes the residual power deviation, with configurable band boundaries; and the gating functions used in $L_{cons}$ include sigmoid and piecewise linear functions.
[0011] Preferably, the calculation of the comprehensive score includes the following steps: unify the direction of all indicators; perform scale normalization on the validation set using the formula
$$s_i = \frac{m_i - \mu_i}{\sigma_i}$$
where $m_i$ is the current value of the i-th indicator, and $\mu_i$ and $\sigma_i$ are the mean and standard deviation of that indicator's historical distribution on the validation set; and take the weighted sum to obtain the comprehensive score.
[0012] Preferably, the windowed overlap-add process includes the following steps: multiply each output segment by a Hanning window or another window function; perform overlap-add and divide by the sum of the window weights; remove the beginning and end padding and align the result with the original time axis.
[0013] Preferably, the construction of the dataset includes adding at least one of the following types of noise to a clean electrocardiogram signal: transient noise, continuous noise, a combination of transient and continuous noise, or no noise.
[0014] Preferably, the method supports multi-lead ECG signal processing, and in the inference phase, cross-lead weighted fusion can be performed based on lead quality scores and the consistency of the cross-correlation between leads.
[0015] Compared with the prior art, the beneficial effects achieved by the present invention are: 1. In this invention, to address the problem that traditional frequency domain filtering methods struggle to balance signal fidelity and noise suppression due to the overlap between noise and ECG signal frequency bands, a multi-scale feature extraction network is constructed and combined with multi-scale time-frequency consistency constraints. This effectively suppresses various complex interferences such as baseline drift, power line interference, and electromyographic noise. While suppressing noise, the risk of introducing time-domain artifacts is significantly reduced, achieving a better balance between noise suppression and fidelity. Furthermore, by constructing a "noisy-clean" paired supervised learning dataset, the network can learn effective mappings from a large number of noise patterns, further enhancing the model's generalization ability in complex noise environments.
[0016] 2. In this invention, in order to address the shortcomings of existing deep learning methods that may mistakenly weaken key ECG morphologies due to the single design of loss functions, a morphology-aware composite loss function is designed. By introducing explicit morphology protection mechanisms such as derivative / curvature constraints, frequency band energy protection, and peak sensitivity weighting, the misjudgment and weakening of key clinical waveform features are effectively reduced, and the morphological fidelity of the denoised signal is significantly improved.
[0017] 3. In this invention, to address the problem that the model may over-clean all segments, leading to distortion of high signal-to-noise ratio segments, an adaptive signal-to-noise ratio gating mechanism is introduced. This mechanism enables the model to dynamically adjust the denoising intensity based on the instantaneous signal-to-noise ratio of the input segments. In high signal-to-noise ratio segments, the principle of minimal modification is followed to preserve the original reliable information, while in low signal-to-noise ratio segments, the focus is on repair, thereby improving the overall robustness of the processing and the reliability of the results.
Attached Figure Description
[0018] Figure 1 is an overall flowchart of the electrocardiogram signal denoising method of the present invention; Figure 2 is a diagram of the multi-scale one-dimensional convolutional reconstruction network architecture of the present invention; Figure 3 is a flowchart of the training phase of the present invention.
Detailed Implementation
[0019] The technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of the present invention, and not all embodiments. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of the present invention.
[0020] Example 1: Referring to Figures 1-3, a method for denoising electrocardiogram (ECG) signals based on multi-scale deep learning includes a training phase and an inference phase. The training phase includes the following steps:
Step S101: Obtain near-noise-free ECG data as a reference signal and construct a dataset from it; the dataset includes a training set, a validation set, and a test set. During construction, noise is added to the near-noise-free ECG data to generate noisy ECG data as the model input, while the original near-noise-free ECG data serves as the label required for model training.
Step S102: Preprocess the dataset: unify the sampling rate to the target sampling rate Fs, slice the data with a window length of W and a stride of H, and normalize using the mean MU and standard deviation STD calculated from the training set label data. The normalization formula is:
$$XN_i = \frac{X_i - MU}{STD}, \quad i = 1, \dots, W$$
where $X_i$ is a single raw data point in a single segment, $XN_i$ is the corresponding normalized value, MU and STD are the mean and standard deviation calculated from the label segments of the training set, and W is the window length.
Step S103: Construct a multi-scale one-dimensional convolutional reconstruction network.
Step S104: Define the morphology-aware composite loss function.
Step S105: Train the network using the training set and calculate the comprehensive score (Score) on the validation set:
$$Score = \sum_i w_i s_i$$
where $s_i$ is the direction-unified and normalized score of the i-th indicator and $w_i$ is the corresponding weight.
Step S106: Apply an early stopping strategy based on the comprehensive score: stop training when there is no improvement for P consecutive rounds, and save the optimal model weights. P is a preset threshold.
Step S107: Evaluate model performance using the test set.
[0021] In step S103, the reconstruction network includes: an input layer that uses one-dimensional convolution for preliminary feature extraction, with little or no normalization; a multi-branch backbone comprising local convolution branches and dilated convolution branches with different dilation rates to cover different time scales; an encoder-decoder structure that fuses shallow fine-grained features with deep semantic features through skip connections; and an output layer that uses one-dimensional convolution to regress the time-domain waveform, without normalization or using only linear activation.
[0022] In step S104, the expression for the composite loss function is:
$$L = w_t L_{time} + w_s L_{stft} + w_d L_{diff} + w_{dd} L_{dd} + w_b L_{band} + w_m L_{mid} + w_l L_{low} + w_u L_{ultra} + w_{tr} L_{trend} + w_p L_{peak} + w_c L_{cons} + w_{mean} L_{mean}$$
where $L_{time}$ is the time-domain reconstruction error; $L_{stft}$ is the multi-scale short-time Fourier transform consistency loss; $L_{diff}$ and $L_{dd}$ are the first-order and second-order difference consistency losses, respectively; $L_{band}$ and $L_{mid}$ are the frequency band residual energy constraint and the intermediate frequency band protection loss, respectively; $L_{low}$ and $L_{ultra}$ are the corresponding constraints for the low-frequency and ultra-low-frequency bands, respectively; $L_{trend}$ is the trend consistency loss; $L_{peak}$ is the adaptive weighted loss constructed from the heartbeat peak-valley neighborhood; $L_{cons}$ is the gated consistency loss based on the input signal-to-noise ratio; $L_{mean}$ is the mean shift constraint; and $w_t$, $w_s$, $w_d$, $w_b$, $w_m$, $w_{tr}$, $w_l$, $w_u$, $w_p$, $w_c$, $w_{mean}$, and $w_{dd}$ are the weighting parameters of the respective terms.
[0023] The working principle of this invention is as follows: This invention constructs an end-to-end training framework integrating multi-scale feature extraction, morphological perception composite loss, and comprehensive score monitoring to learn the accurate mapping from noisy ECG signals to high-quality reference signals in a data-driven manner. The process begins with the data preparation stage, acquiring "clean" ECG data covering normal heart rhythms and various pathological types as reference signals (i.e., the "labels" in supervised learning). Noisy ECG signals are generated by programmatically adding simulated noise (such as EMG artifacts, baseline drift, power line interference, and combinations thereof) as the "input" to the model, thus constructing an "input-label" paired dataset for supervised learning. The preprocessing stage standardizes the data, including unifying the sampling rate to the target frequency and slicing it according to a fixed window length and step size. Finally, it normalizes the data using the global mean (MU) and standard deviation (STD) calculated from the training set label data, providing stable input to the network.
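The slicing and normalization described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the patented implementation: the function names, the toy sine "label", and the values W = 500, H = 250, and a 250 Hz rate are all hypothetical, resampling to Fs is omitted, and plain Python lists stand in for an array library.

```python
import math

def slice_windows(signal, window, stride):
    """Cut a 1-D sequence into fixed-length segments (window W, stride H)."""
    return [signal[s:s + window]
            for s in range(0, len(signal) - window + 1, stride)]

def fit_mu_std(label_segments):
    """Global mean and standard deviation over all training-label samples."""
    values = [v for seg in label_segments for v in seg]
    mu = sum(values) / len(values)
    var = sum((v - mu) ** 2 for v in values) / len(values)
    return mu, math.sqrt(var)

def normalize(segment, mu, std, eps=1e-8):
    """XN_i = (X_i - MU) / STD; eps (an added safeguard) avoids division by zero."""
    return [(x - mu) / (std + eps) for x in segment]

# Toy "clean" label: a 1.2 Hz sine sampled at an assumed 250 Hz.
clean = [math.sin(2 * math.pi * 1.2 * n / 250) for n in range(1000)]
segments = slice_windows(clean, window=500, stride=250)
MU, STD = fit_mu_std(segments)          # computed once, on training labels only
normalized = [normalize(seg, MU, STD) for seg in segments]
```

MU and STD would be stored alongside the model weights so that the same values can be reused for noisy inputs and at inference time.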
[0024] The core of the training is to construct and optimize a specially designed multi-scale one-dimensional convolutional reconstruction network. This network architecture is designed to collaboratively process the diverse temporal features in ECG signals: its input layer uses conventional convolutions for initial feature extraction and intentionally reduces or eliminates normalization to preserve the absolute amplitude and low-frequency trends of the original signal; the backbone adopts a multi-branch structure with local convolutions and dilated convolutions with different dilation rates in parallel, enabling the network to simultaneously capture sharp transient features such as narrow QRS waves as well as long-term structures such as wide QRS waves or slow baseline drift; the encoder-decoder structure uses a skip connection mechanism to fuse shallow fine-grained features with deep semantic features, effectively reducing information loss during reconstruction and suppressing over-smoothing.
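To make the role of the dilation rate concrete, the following sketch implements a single-channel dilated convolution and a set of parallel branches in plain Python. It is only an illustration: the kernel size 3, the dilation rates (1, 4, 16), and the function names are assumptions, and a real network would use learned multi-channel kernels in a deep learning framework together with the encoder-decoder and skip connections described above.

```python
def dilated_conv1d(x, kernel, dilation=1):
    """Same-length 1-D cross-correlation with zero padding (odd kernel length)."""
    half = (len(kernel) - 1) // 2 * dilation
    out = []
    for n in range(len(x)):
        acc = 0.0
        for j, w in enumerate(kernel):
            idx = n - half + j * dilation   # taps are spaced `dilation` samples apart
            if 0 <= idx < len(x):
                acc += w * x[idx]
        out.append(acc)
    return out

def receptive_field(kernel_size, dilation):
    """Number of input samples spanned by one dilated kernel."""
    return (kernel_size - 1) * dilation + 1

def multi_branch(x, kernel, dilations):
    """Run parallel branches; the list stands in for channel concatenation."""
    return [dilated_conv1d(x, kernel, d) for d in dilations]

# Hypothetical configuration: one local branch and two dilated branches.
DILATIONS = (1, 4, 16)
spans = [receptive_field(3, d) for d in DILATIONS]   # [3, 9, 33] samples
```

With the same three-tap kernel, the branch with dilation 16 spans 33 samples while the local branch spans 3, which is how parallel branches can see both sharp transients and slower structures at equal parameter cost.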
[0025] To guide the network in explicitly protecting key clinical morphologies while suppressing noise, this invention defines a morphology-aware composite loss function as the optimization objective. This loss function is not a single index but a weighted sum whose sub-terms include: 1) a time-domain reconstruction loss (e.g., Charbonnier loss) to ensure overall waveform fidelity; 2) a multi-scale short-time Fourier transform (STFT) consistency loss, which constrains the output to be consistent with the reference signal at multiple time-frequency resolutions and thus suppresses frequency domain artifacts; 3) first-order and second-order difference consistency losses, which specifically protect the sharp zero-crossing characteristics of the QRS wave and the geometry of the fine P wave; 4) a frequency-band residual energy constraint, which specifically suppresses ultra-low frequency baseline drift and interference near the power line frequency while explicitly protecting the main energy band of the QRS wave (e.g., 2-60 Hz) and of atrial fibrillation; 5) an adaptive peak-weighted loss based on the peak-valley neighborhood of the heartbeat, which improves the fidelity of transient features such as R-peaks and narrow QRS waves and prevents them from being misjudged as noise; and 6) a signal-to-noise ratio (SNR) adaptive gated consistency loss, which dynamically adjusts the penalty intensity according to the estimated SNR of the input segment, so that the model follows the adaptive principle of "less modification to high SNR segments and key repair of low SNR segments", improving overall stability and avoiding over-cleaning. In addition, the loss includes further terms such as trend consistency and mean drift constraints. Together, these terms constitute an optimization objective that comprehensively constrains signal fidelity, frequency domain characteristics, and morphological structure.
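A few of the sub-terms listed above are simple enough to sketch directly. The example below combines a Charbonnier time-domain term, first-order and second-order difference terms, and a mean shift term into a weighted sum. It is a partial sketch only: the STFT, frequency-band, peak-weighted, trend, and gated terms are omitted, and the function names, the eps value, and the example weights are assumptions rather than values from the source.

```python
import math

def charbonnier(y, ref, eps=1e-3):
    """Smooth L1-like time-domain error: mean of sqrt((y - ref)^2 + eps^2)."""
    return sum(math.sqrt((a - b) ** 2 + eps ** 2) for a, b in zip(y, ref)) / len(y)

def diff(x):
    """First-order difference x[n+1] - x[n]."""
    return [b - a for a, b in zip(x, x[1:])]

def l1(y, ref):
    return sum(abs(a - b) for a, b in zip(y, ref)) / len(y)

def composite_loss(y, ref, weights):
    """Weighted sum of a subset of the terms; returns (total, per-term values)."""
    terms = {
        "time": charbonnier(y, ref),                     # L_time
        "diff": l1(diff(y), diff(ref)),                  # L_diff
        "dd": l1(diff(diff(y)), diff(diff(ref))),        # L_dd
        "mean": abs(sum(y) / len(y) - sum(ref) / len(ref)),  # L_mean
    }
    total = sum(weights[name] * value for name, value in terms.items())
    return total, terms

# Hypothetical weights and a toy reference containing a sharp peak.
weights = {"time": 1.0, "diff": 0.5, "dd": 0.5, "mean": 0.1}
ref = [0.0, 1.0, 0.0, -1.0, 0.0, 2.0, 0.0]
shifted = [v + 0.5 for v in ref]        # constant offset: same shape, wrong mean
total, terms = composite_loss(shifted, ref, weights)
```

For the constant-offset case the difference terms are zero while the time-domain and mean terms are not, which shows how the separate terms isolate different kinds of error.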
[0026] The monitoring and model selection during training do not directly rely on the loss function value, but instead introduce a comprehensive score calculated on an independent validation set. This score is obtained by weighted summation of a set of evaluation metrics that have undergone direction unification and scale normalization. These metrics include mean absolute error (MAE), signal-to-noise ratio improvement (ΔSNR), multi-scale STFT difference (STFTms), derivative L1 distance (DiffL1), and residual power deviation in key frequency bands. By comprehensively quantifying these metrics that reflect multi-dimensional performance such as time-domain fidelity, frequency-domain consistency, and morphological preservation, the score provides a more balanced benchmark for model performance than a single loss. This score is continuously monitored during training, and early stopping is triggered when it fails to improve within a preset number of consecutive epochs. The model weights corresponding to the historical best score are always retained as the final derived model. Finally, the performance of the optimal model is evaluated on an independent test set, completing the validation and confirmation of the entire training phase.
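The score-based model selection can be sketched as follows, assuming z-score normalization against stored validation statistics and a sign flip for lower-is-better metrics. The metric names, weights, statistics, and the `EarlyStopper` class are hypothetical; the source does not specify them.

```python
def comprehensive_score(metrics, spec, eps=1e-8):
    """Weighted sum of direction-unified z-scores.

    spec[name] = (weight, higher_is_better, mu, sigma), where mu and sigma
    describe the metric's historical distribution on the validation set.
    """
    total = 0.0
    for name, (weight, higher_is_better, mu, sigma) in spec.items():
        z = (metrics[name] - mu) / (sigma + eps)
        total += weight * (z if higher_is_better else -z)
    return total

class EarlyStopper:
    """Stop when the score has not improved for `patience` consecutive epochs."""

    def __init__(self, patience):
        self.patience = patience
        self.best_score = float("-inf")
        self.best_epoch = None
        self.bad_epochs = 0

    def update(self, epoch, score):
        if score > self.best_score:
            self.best_score, self.best_epoch = score, epoch
            self.bad_epochs = 0          # a real loop would save the weights here
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience

# Hypothetical spec: MAE is lower-is-better, SNR improvement is higher-is-better.
spec = {"mae": (0.5, False, 0.10, 0.02), "delta_snr": (0.5, True, 6.0, 2.0)}
score = comprehensive_score({"mae": 0.08, "delta_snr": 8.0}, spec)

stopper = EarlyStopper(patience=3)
stop_epoch = None
for epoch, s in enumerate([0.1, 0.4, 0.3, 0.35, 0.2, 0.5]):
    if stopper.update(epoch, s):
        stop_epoch = epoch
        break
```

In this toy run the best score occurs at epoch 1 and training stops at epoch 4, before the later value 0.5 is ever seen; this is the usual trade-off when choosing the patience P.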
[0027] In addition, the loss function can also include multiple losses such as trend consistency and mean drift constraint, which together constitute an optimization objective that comprehensively constrains signal fidelity, frequency domain characteristics and morphological structure. By jointly optimizing the above multi-objective losses and using the early stop strategy to monitor the comprehensive score on the validation set, the model can effectively balance noise suppression and morphological preservation, and achieve high-quality reconstruction of complex electrocardiogram signals.
[0028] Example 2: Referring to Figures 1-3, in this embodiment of the invention, the inference phase includes the following steps:
Step S201: Apply the same preprocessing to the ECG signal to be processed as in the training phase.
Step S202: Construct the input segments: pad the signal with symmetric reflection and slice it with window length W and step size W / 2.
Step S203: Load the optimal model and run inference to obtain the denoised segments.
Step S204: Window, overlap, and add the output segments, then perform inverse normalization to obtain the final denoised signal. The inverse normalization formula is:
$$y = y_c \cdot STD + MU$$
where $y$ is the final denoised signal, $y_c$ is the reconstruction result, and MU and STD are the mean and standard deviation of the training set label segments saved during model training.
[0029] In the multi-branch backbone, the dilation rate of the dilated (atrous) convolutions is configurable so as to cover wide QRS waves and long-duration structures such as baseline drift.
[0030] In the composite loss function, $L_{stft}$ computes the short-time Fourier transform difference using multiple sets of window length and step size parameters; $L_{band}$ computes the residual power deviation, with configurable band boundaries; and the gating functions used in $L_{cons}$ include sigmoid and piecewise linear functions.
[0031] The calculation of the comprehensive score involves the following steps: unify the direction of all indicators; perform scale normalization on the validation set using the formula
$$s_i = \frac{m_i - \mu_i}{\sigma_i}$$
where $m_i$ is the current value of the i-th indicator, and $\mu_i$ and $\sigma_i$ are the mean and standard deviation of that indicator's historical distribution on the validation set; and take the weighted sum to obtain the comprehensive score.
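The two gating shapes named for the gated consistency loss can be sketched as below. This is one plausible reading, offered as an assumption: the gate grows with the input SNR and scales a penalty on how far the output departs from the noisy input, so that high-SNR segments are modified less. The thresholds (5, 15, 25 dB), the slope, the function names, and the use of an externally supplied SNR estimate are all hypothetical.

```python
import math

def sigmoid_gate(snr_db, center_db=15.0, slope=0.5):
    """Smooth gate in (0, 1) that rises with the input SNR."""
    return 1.0 / (1.0 + math.exp(-slope * (snr_db - center_db)))

def piecewise_linear_gate(snr_db, low_db=5.0, high_db=25.0):
    """0 below low_db, 1 above high_db, linear in between."""
    t = (snr_db - low_db) / (high_db - low_db)
    return min(1.0, max(0.0, t))

def gated_consistency(output, noisy_input, snr_db, gate=sigmoid_gate):
    """Penalize deviation from the input, more strongly when the input SNR is high."""
    deviation = sum(abs(a - b) for a, b in zip(output, noisy_input)) / len(output)
    return gate(snr_db) * deviation

# The same modification costs more on a clean segment than on a noisy one.
x = [0.0, 1.0, 0.0]
y = [0.1, 0.9, 0.1]
penalty_clean = gated_consistency(y, x, snr_db=30.0)
penalty_noisy = gated_consistency(y, x, snr_db=0.0)
```

The piecewise linear form gives exact zero and one outside its thresholds, whereas the sigmoid never fully switches the term off; which behaviour is preferable is a tuning decision.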
[0032] The working principle of this invention is as follows: The inference process begins with preprocessing the ECG signal to be processed in a manner strictly consistent with that of the training phase. First, the signal sampling rate is converted to the target sampling rate Fs set during model training to ensure time-frequency characteristics match. Subsequently, the global mean (MU) and standard deviation (STD) calculated from the training set "label" data and saved during the training phase are applied to perform the same normalization process on the signal, so that the distribution of the input data is aligned with the distribution seen during model training.
[0033] To handle long sequences and reduce boundary effects, the normalized signal is first padded with symmetrical reflections at both ends, typically with a padded length of half the window length W. Then, an overlapping sliding window approach is used to segment the signal into a series of segments, usually with a 50% overlap (i.e., a step size of W / 2), to ensure that each point in the signal is covered by multiple windows, creating conditions for subsequent weighted fusion. Each segment of length W serves as one input to the model.
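The padding and 50% overlap slicing can be sketched as follows. The sketch assumes that "symmetrical reflection" means mirroring about the end samples without repeating them (the convention that NumPy calls `reflect`); the other common convention repeats the end sample. Function names and the toy sizes are hypothetical.

```python
def reflect_pad(x, pad):
    """Mirror `pad` samples about each end (end samples themselves not repeated)."""
    left = x[pad:0:-1]
    right = x[-2:-pad - 2:-1]
    return left + x + right

def slice_half_overlap(x, window):
    """Segments of length W taken every W / 2 samples, with their start indices."""
    hop = window // 2
    starts = list(range(0, len(x) - window + 1, hop))
    return starts, [x[s:s + window] for s in starts]

# Toy example: 16 samples, W = 8, padding of W / 2 = 4 on each side.
signal = list(range(16))
padded = reflect_pad(signal, 4)
starts, segments = slice_half_overlap(padded, 8)

# Count how many windows cover each original (unpadded) sample.
coverage = [sum(1 for s in starts if s <= i < s + 8) for i in range(4, 20)]
```

Here every original sample is covered by exactly two windows. For lengths that are not a multiple of W / 2, this simple slicer drops the tail, so a practical version would pad further before slicing.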
[0034] Each input segment is independently fed into the model for forward inference. The network extracts features and reconstructs the data layer by layer using its multi-scale convolutional structure and learned parameters, outputting a denoised segment of the corresponding length. In this process, the model implicitly utilizes all the prior knowledge it learned during training through the composite loss function, including noise suppression, frequency domain consistency maintenance, and key morphological protection.
[0035] After obtaining all output segments, a crucial overlay-windowing-stitching post-processing step is performed to reconstruct the complete signal. First, a window function (such as a Hanning window) corresponding to the input slice is applied to each output segment to smooth the segment boundaries. Then, all windowed segments are overlapped and added together according to their time positions (Overlap-Add, OLA). In the overlap region, multiple predictions from adjacent windows are weighted and fused. The basic fusion weight is the window function itself (or its square) to ensure amplitude consistency and eliminate modulation effects introduced by windowing. This invention can also introduce more advanced adaptive weighting mechanisms, such as calculating confidence weights based on the instantaneous SNR estimation or local peak density of the segments, assigning greater weight to high-confidence predictions in the overlap region, thereby adaptively strengthening reliable results and suppressing unreliable predictions, further improving reconstruction quality. After completing the overlap-add, the beginning and end padding is removed, resulting in a complete normalized denoised signal sequence perfectly aligned with the original input time axis.
[0036] Finally, the reconstructed result is denormalized to restore the original physical amplitude unit of the signal. If the original input sampling rate differs from the model training sampling rate, corresponding resampling is required to restore the original sampling rate. The final output is the denoised ECG signal, which, while suppressing various interferences such as baseline drift, power line interference, and electromyographic noise, retains the waveform morphology characteristics relied upon for clinical diagnosis to the greatest extent possible.
[0037] Example 3: Referring to Figures 1-3, in this embodiment of the invention, the windowed overlap-add process includes the following steps: multiply each output segment by a Hanning window or another window function; perform overlap-add and divide by the sum of the window weights; remove the beginning and end padding and align the result with the original time axis.
[0038] The construction of the dataset involves adding at least one of the following types of noise to clean electrocardiogram signals: transient noise, continuous noise, a combination of transient and continuous noise, or no noise.
[0039] The method supports multi-lead ECG signal processing, and in the inference phase, cross-lead weighted fusion can be performed based on lead quality scores and the consistency of the cross-correlation between leads.
[0040] The working principle of this invention is as follows: Instead of directly using noisy clinical data, this invention employs a programmed approach to inject controllable synthetic noise into known "clean" reference ECG signals. This includes: 1) transient noise, used to simulate short-term sudden disturbances such as electromyographic artifacts; 2) persistent noise, used to simulate long-term steady-state or slowly varying disturbances such as baseline drift and power line interference; 3) combinations of the above two types of noise to simulate more complex mixed noise scenarios; and 4) in addition, the dataset also contains some original "clean" samples without added noise. This supervised synthesis method can generate a large number and diverse range of "noise-clean" signal pairs, ensuring the scale and quality of the training data.
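The four sample types described above can be generated with a small helper like the one below. It is a sketch under assumptions: sinusoids stand in for baseline drift and power line interference, a short Gaussian burst stands in for an EMG-like transient, and all amplitudes, frequencies, burst lengths, and names are hypothetical rather than taken from the source.

```python
import math
import random

def baseline_wander(n, fs, amp=0.2, freq=0.3):
    """Slow sinusoid as a stand-in for baseline drift."""
    return [amp * math.sin(2 * math.pi * freq * k / fs) for k in range(n)]

def powerline(n, fs, amp=0.05, freq=50.0):
    """Narrow-band sinusoid as a stand-in for power line interference."""
    return [amp * math.sin(2 * math.pi * freq * k / fs) for k in range(n)]

def transient_burst(n, rng, start, length, amp=0.3):
    """Short Gaussian burst as a stand-in for an EMG-like artifact."""
    return [rng.gauss(0.0, amp) if start <= k < start + length else 0.0
            for k in range(n)]

def make_pair(clean, fs, kind, rng):
    """Return (noisy_input, clean_label); kind is none/transient/continuous/mixed."""
    n = len(clean)
    noise = [0.0] * n
    if kind in ("continuous", "mixed"):
        noise = [a + b + c for a, b, c in
                 zip(noise, baseline_wander(n, fs), powerline(n, fs))]
    if kind in ("transient", "mixed"):
        length = n // 10
        start = rng.randrange(0, n - length)
        noise = [a + b for a, b in zip(noise, transient_burst(n, rng, start, length))]
    noisy = [c + e for c, e in zip(clean, noise)]
    return noisy, clean

rng = random.Random(0)                   # fixed seed for reproducibility
clean = [math.sin(2 * math.pi * k / 250) for k in range(500)]
pairs = {kind: make_pair(clean, 250, kind, rng)
         for kind in ("none", "transient", "continuous", "mixed")}
```

Keeping the "none" case in the training mix gives the network examples where the correct output is the unchanged input.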
[0041] Secondly, regarding the detailed working principle of the windowed overlap-addition (OLA) post-processing in the inference stage, this mechanism is a key technology to ensure the temporal continuity and amplitude consistency of long-sequence signals after denoising. The specific steps are as follows: 1) Windowing: Multiply each denoised segment of the model output by a smooth window function (such as a Hanning window). The window function gradually approaches zero at both ends of the segment to smooth the start and end boundaries of each segment, avoiding discontinuities or artifacts that may be introduced due to higher prediction uncertainty at segment boundaries. 2) Overlap-addition: Arrange and superimpose all windowed segments according to their original temporal order. Due to the use of 50% overlap slices, every time point in the signal, except for the two extreme ends, is exactly covered by two consecutive windows. Therefore, in the superimposed sequence, each point within the overlap region is a weighted sum of prediction values from two different windows. 3) Weight normalization: To compensate for the influence of the window function on the signal amplitude, amplitude correction is needed for the superimposed signal. Divide each point in the superimposed sequence by the sum of all window function values covering it at that time position (i.e., cumulative window weights). This operation precisely eliminates the amplitude modulation introduced by windowing, thereby reconstructing a continuous time-domain signal with accurate amplitude and smooth boundaries. After removing the previously added symmetrical reflection fill, the time axis of the final reconstructed signal is precisely aligned with that of the original unfilled signal.
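The three steps above (windowing, overlap-add, weight normalization) followed by cropping can be sketched as below. The sketch assumes a periodic Hann window and uses an identity "model" so that the reconstruction can be checked against the input; the function names and toy sizes are hypothetical.

```python
import math

def hann(window):
    """Periodic Hann window; with a 50% hop its shifted copies sum to 1."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * n / window) for n in range(window)]

def overlap_add(segments, starts, total_len, eps=1e-8):
    """Window each segment, add at its position, divide by accumulated weights."""
    window = len(segments[0])
    w = hann(window)
    acc = [0.0] * total_len
    norm = [0.0] * total_len
    for seg, s in zip(segments, starts):
        for n in range(window):
            acc[s + n] += w[n] * seg[n]
            norm[s + n] += w[n]
    # eps only matters where no window has weight (inside the padding here).
    return [a / max(nw, eps) for a, nw in zip(acc, norm)]

def crop(x, pad):
    """Remove the reflection padding added before slicing."""
    return x[pad:len(x) - pad]

# Identity "model": the output segments equal the input segments.
padded = [math.sin(0.3 * n) for n in range(24)]      # 16 samples + 4 padding each side
starts = list(range(0, 17, 4))                       # W = 8, hop = 4
segments = [padded[s:s + 8] for s in starts]
rebuilt = crop(overlap_add(segments, starts, len(padded)), 4)
```

With the identity model, `rebuilt` matches the unpadded signal to floating-point precision, confirming that the division by accumulated window weights removes the amplitude modulation introduced by windowing.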
[0042] To further improve consistency across multiple leads and leverage complementary information to enhance denoising, this invention introduces an optional interlead weighted fusion mechanism during the inference phase. This mechanism is applied in the overlapping addition step: for the same time point, multiple predicted values from different leads (within their respective overlap windows) are not only weighted by a window function, but also subject to secondary weighting based on lead quality. During fusion, higher-quality leads or leads with high consistency with other leads are assigned greater weight. This adaptive weighted fusion can correct for severe interference in a single lead by utilizing information from other higher-quality leads, thereby achieving information complementarity at the lead level and enhancing the reliability of the overall ECG recording system's denoising results.
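The source does not give a formula for the lead weights, so the following is only one possible sketch: each lead's weight is taken as its quality score multiplied by its mean absolute Pearson correlation with the other leads, then normalized to sum to one. The function names, the choice of Pearson correlation, and the toy leads are assumptions, and how the weights are applied inside the overlap-add step is not shown.

```python
import math
import random

def pearson(a, b):
    """Pearson correlation coefficient; returns 0.0 for a constant input."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    num = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    den = math.sqrt(sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b))
    return num / den if den > 0 else 0.0

def lead_weights(leads, quality):
    """Weight_i proportional to quality_i * mean |corr(lead_i, lead_j)|, j != i.

    Requires at least two leads.
    """
    raw = []
    for i, lead_i in enumerate(leads):
        others = [abs(pearson(lead_i, lead_j))
                  for j, lead_j in enumerate(leads) if j != i]
        raw.append(quality[i] * sum(others) / len(others))
    total = sum(raw)
    if total <= 0:
        return [1.0 / len(leads)] * len(leads)   # fall back to uniform weights
    return [r / total for r in raw]

# Two mutually consistent leads and one noise-dominated lead, equal quality scores.
rng = random.Random(0)
lead_a = [math.sin(2 * math.pi * k / 50) for k in range(200)]
lead_b = [0.5 * v for v in lead_a]
lead_c = [rng.gauss(0.0, 1.0) for _ in range(200)]
w = lead_weights([lead_a, lead_b, lead_c], quality=[1.0, 1.0, 1.0])
```

In this toy case the noise-dominated lead receives the smallest weight even though all quality scores are equal, because it correlates poorly with the other two.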
[0043] Working Principle: The method is based on an end-to-end deep learning framework comprising two phases: training and inference. During training, a dataset of "noisy-clean" ECG signal pairs containing various noise types is first constructed and then preprocessed in a standardized way. The core is the construction of a multi-scale one-dimensional convolutional reconstruction network, which simultaneously extracts sharp transients and slowly changing features from ECG signals through parallel local convolution and dilated convolution branches. Network training is guided by a morphology-aware composite loss function that integrates multiple constraints, including time-domain fidelity, multi-scale time-frequency consistency, derivative constraints, frequency band energy control, and adaptive signal-to-noise ratio gating. Model selection does not rely on a single loss value; instead, early stopping and optimal weight selection are driven by a comprehensive score, calculated on the validation set, that integrates multiple performance dimensions.
[0044] During the inference phase, the signal to be processed undergoes the same preprocessing as in training and is then fed to the trained model through a sliding window. The model's output segments are stitched together by windowing and overlap-add, and the fusion can optionally be refined using a confidence-based weighting mechanism. Finally, a high-fidelity, continuous denoised ECG signal is output after denormalization. Through this data-driven approach, the method adaptively balances the suppression of complex noise against the preservation of key waveform morphology.
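The normalization and denormalization that bracket the inference pipeline can be sketched as below, assuming a z-score using the mean MU and standard deviation STD saved from the training-set label segments. The function names and the small `eps` term (added to avoid division by zero) are assumptions; the sliding window uses a stride of half the window length, as in the 50% overlap described earlier.

```python
import math

def fit_stats(clean_segments):
    """Mean and standard deviation over all samples of the training label segments."""
    values = [v for seg in clean_segments for v in seg]
    mu = sum(values) / len(values)
    var = sum((v - mu) ** 2 for v in values) / len(values)
    return mu, math.sqrt(var)

def normalize(x, mu, std, eps=1e-8):
    """Apply the training-time z-score to a signal."""
    return [(v - mu) / (std + eps) for v in x]

def denormalize(y, mu, std, eps=1e-8):
    """Invert normalize() to return to the original amplitude scale."""
    return [v * (std + eps) + mu for v in y]

def sliding_windows(x, w_len):
    """Segments of length w_len taken with a stride of w_len // 2."""
    hop = w_len // 2
    return [x[s:s + w_len] for s in range(0, len(x) - w_len + 1, hop)]
```

Reusing the saved training statistics at inference time, rather than recomputing them per recording, keeps the model input on the scale it was trained on.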
[0045] Although embodiments of the invention have been shown and described, those skilled in the art will understand that various changes, modifications, substitutions, and alterations can be made to these embodiments without departing from the principles and spirit of the invention, the scope of which is defined by the appended claims and their equivalents.
Claims
1. A method for denoising electrocardiogram (ECG) signals based on multi-scale deep learning, characterized in that it includes a training phase and an inference phase, the training phase including the following steps: Step S101: Obtain near-noise-free ECG data as a reference signal and construct a dataset from it; the dataset includes a training set, a validation set, and a test set; the construction process includes adding noise to the near-noise-free ECG data to generate noisy ECG data as the model input, while the original near-noise-free ECG data serves as the labels for model training; Step S102: Preprocess the dataset, including unifying the sampling rate to the target sampling rate Fs, slicing the data with a window length of W and a stride of H, and normalizing with the mean MU and standard deviation STD calculated from the training set label data; the normalization formula is: $XN_i = (X_i - MU) / STD$, $i = 1, \dots, W$; where $X_i$ is a single raw data point in a single segment, $XN_i$ is the normalized data, MU and STD are the mean and standard deviation calculated from the label data segments of the training set, respectively, and W is the window length; Step S103: Construct a multi-scale one-dimensional convolutional reconstruction network; Step S104: Define the morphology-aware composite loss function; Step S105: Train the network using the training set and calculate the comprehensive score (Score) on the validation set; the calculation formula is: $Score = \sum_i w_i s_i$; where $s_i$ is the direction-unified and normalized index score and $w_i$ is the corresponding weight; Step S106: Implement an early stopping strategy based on the comprehensive score: stop training when there is no improvement for P consecutive rounds, and save the optimal model weights, where P is a preset threshold; Step S107: Evaluate model performance using the test set.
2. The ECG signal denoising method based on multi-scale deep learning according to claim 1, characterized in that, in step S103, the reconstruction network includes: an input layer that uses one-dimensional convolution for preliminary feature extraction, with little or no normalization; a multi-branch backbone that includes local convolution branches and dilated convolution branches with different dilation rates to cover different time scales; an encoder-decoder structure that fuses shallow fine-grained features with deep semantic features through skip connections; and an output layer that uses one-dimensional convolution to regress to the time-domain waveform, without normalization and with at most a linear activation.
3. The ECG signal denoising method based on multi-scale deep learning according to claim 1, characterized in that, in step S104, the expression for the composite loss function is: $L = w_t L_{time} + w_s L_{stft} + w_d L_{diff} + w_{dd} L_{dd} + w_b L_{band} + w_m L_{mid} + w_l L_{low} + w_u L_{ultra} + w_{tr} L_{trend} + w_p L_{peak} + w_c L_{cons} + w_{mean} L_{mean}$; where $L_{time}$ is the time-domain reconstruction error, $L_{stft}$ is the multi-scale short-time Fourier transform consistency loss, $L_{diff}$ and $L_{dd}$ are the first-order and second-order difference consistency losses, respectively, $L_{band}$ and $L_{mid}$ are the frequency band residual energy constraint and the intermediate frequency band protection loss, respectively, $L_{low}$ and $L_{ultra}$ are the corresponding constraints for the low-frequency and ultra-low-frequency bands, respectively, $L_{trend}$ is the trend consistency loss, $L_{peak}$ is the adaptively weighted loss constructed on the heartbeat peak-valley neighborhood, $L_{cons}$ is the gated consistency loss based on the input signal-to-noise ratio, $L_{mean}$ is the mean shift constraint, and $w_t$, $w_s$, $w_d$, $w_b$, $w_m$, $w_{tr}$, $w_l$, $w_u$, $w_p$, $w_c$, $w_{mean}$, and $w_{dd}$ are the weights of the respective terms.
4. The ECG signal denoising method based on multi-scale deep learning according to claim 1, characterized in that the inference phase includes the following steps: Step S201: Perform the same preprocessing on the ECG signal to be processed as in the training phase; Step S202: Construct the input segments by padding the signal with symmetric reflection and slicing it with window length W and stride W/2; Step S203: Load the optimal model and run inference to obtain the denoised segments; Step S204: Apply windowing and overlap-add to the output segments, then perform inverse normalization to obtain the final denoised signal; the inverse normalization formula is: $y = y_c \cdot STD + MU$; where $y$ is the final denoised signal, $y_c$ is the reconstruction result, and MU and STD are the mean and standard deviation of the training set label segments saved during model training, respectively.
5. The ECG signal denoising method based on multi-scale deep learning according to claim 2, characterized in that, in the multi-branch backbone, the dilation rate of the dilated convolution is configurable so as to cover wide QRS complexes and long-term baseline-drift structures.
6. The ECG signal denoising method based on multi-scale deep learning according to claim 3, characterized in that, in the composite loss function, $L_{stft}$ is calculated from the differences between short-time Fourier transforms obtained with multiple sets of window length and step size parameters; $L_{band}$ is calculated from the residual power deviation, with configurable band boundaries; and the gating functions used in $L_{cons}$ include Sigmoid and piecewise linear functions.
7. The ECG signal denoising method based on multi-scale deep learning according to claim 1, characterized in that the calculation of the comprehensive score includes the following steps: unify the direction of all indicators; perform scale normalization on the validation set, the normalization formula being: $s_i = (m_i - \mu_i) / \sigma_i$; where $m_i$ is the current indicator value, and $\mu_i$ and $\sigma_i$ are the mean and standard deviation of the indicator's historical distribution on the validation set, respectively; and obtain the comprehensive score as the weighted sum of the normalized indicators.
8. The ECG signal denoising method based on multi-scale deep learning according to claim 4, characterized in that the windowed overlap-add process includes the following steps: multiply each output segment by a Hanning window or another window function; perform overlap-add and divide by the accumulated window weights; and remove the padding at the beginning and end so that the result is aligned with the original time axis.
9. The ECG signal denoising method based on multi-scale deep learning according to claim 1, characterized in that, in the construction of the dataset, the noise condition applied to a clean ECG signal is at least one of the following: transient noise, continuous noise, a combination of transient and continuous noise, or no added noise.
10. The ECG signal denoising method based on multi-scale deep learning according to claim 1, characterized in that the method supports multi-lead ECG signal processing, and in the inference phase, cross-lead weighted fusion can be performed based on lead quality scores and on consistency measured by cross-correlation between leads.