Chromatographic signal denoising method, system, and electronic device
By combining discrete wavelet decomposition and multi-scale convolutional neural networks, the problems of noise suppression and peak shape preservation of chromatographic signals over a wide signal-to-noise ratio range are solved, achieving efficient chromatographic signal denoising and improving signal accuracy and reliability.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- INST OF AUTOMATION CHINESE ACAD OF SCI
- Filing Date
- 2026-03-24
- Publication Date
- 2026-06-19
AI Technical Summary
Existing chromatographic signal denoising methods struggle to balance noise suppression and peak shape preservation across a wide signal-to-noise ratio range, easily leading to excessive signal smoothing, detail distortion, and loss of subtle peak characteristics.
Discrete wavelet decomposition is used to convert the chromatographic signal to the wavelet domain. Multi-scale convolutional neural networks are used to extract and fuse the scale coefficient components at different scales. Feature reconstruction is performed through multi-scale convolutional neural networks, and finally, the denoised chromatographic signal is obtained through inverse wavelet transform.
It significantly improves the model's generalization ability and non-stationary noise suppression ability over a wide signal-to-noise ratio range, solves the problems of excessive peak smoothing and artifacts in traditional methods, and achieves high-fidelity chromatographic signal reconstruction.
Figure CN122241006A_ABST
Description
Technical Field
[0001] This invention relates to the field of chromatographic signal processing technology, and in particular to a method, system and electronic device for chromatographic signal denoising.
Background Technology
[0002] Chromatography plays a vital role in quantitative analysis fields such as environmental monitoring, pharmaceutical analysis, and petrochemicals. However, during the acquisition, transmission, and processing of chromatographic signals, they are inevitably affected by multi-source random noise such as thermal noise, electrical noise, and matrix interference, leading to baseline jitter, peak distortion, and a decrease in signal-to-noise ratio. This severely restricts the accuracy and reliability of analytical and quantitative analysis of complex samples.
[0003] Currently, denoising of chromatographic signals mainly relies on traditional signal processing methods and deep learning-based methods, but both have certain limitations in practical applications. Firstly, traditional filter methods (such as Savitzky-Golay filtering and Fourier transform low-pass filtering) often lead to over-smoothing of peak shapes and loss of detail when processing asymmetric peaks or low signal-to-noise ratio signals. Secondly, traditional wavelet transform and empirical mode decomposition (EMD) methods are prone to introducing distortion (such as pseudo-Gibbs phenomena) or are sensitive to endpoint effects in complex noise environments, resulting in insufficient overall robustness.
[0004] Thirdly, while some improved traditional methods that incorporate optimization algorithms have raised the signal-to-noise ratio to some extent, they usually rely heavily on manual parameter adjustment and converge slowly under strong noise or multi-peak overlap conditions, making it difficult for them to meet the needs of high-throughput analysis.
[0005] Furthermore, deep learning denoising models introduced in recent years (such as conventional one-dimensional convolutional neural networks) typically process time-domain signals directly. While these models perform reasonably well in low-noise environments, they are prone to over-smoothing in high-noise or mixed-noise scenarios, leading to the loss of subtle peak shapes and a decrease in diagnostic accuracy. Their generalization ability over a wide signal-to-noise ratio range is also significantly insufficient.
Summary of the Invention
[0006] This invention provides a method, system, and electronic device for denoising chromatographic signals, which addresses the shortcomings of existing chromatographic signal denoising methods that struggle to balance noise suppression and peak shape preservation over a wide signal-to-noise ratio range, easily leading to excessive signal smoothing, detail distortion, and loss of subtle peak features.
[0007] This invention provides a method for denoising chromatographic signals, comprising:
obtaining the chromatographic signal to be denoised;
performing discrete wavelet decomposition on the chromatographic signal to be denoised to obtain multiple scale coefficient components at different scales;
inputting each scale coefficient component into a pre-trained multi-scale convolutional neural network to obtain the prediction coefficients at each scale output by the multi-scale convolutional neural network, wherein the multi-scale convolutional neural network is used to extract features from each scale coefficient component, fuse the extracted features of all scales, and reconstruct the features of each scale from the fused features to obtain the reconstructed prediction coefficients at each scale; and
performing inverse wavelet transform on the prediction coefficients at each scale to obtain the denoised chromatographic signal.
[0008] According to the chromatographic signal denoising method provided by the present invention, the step of performing discrete wavelet decomposition on the chromatographic signal to be denoised to obtain multiple scale coefficient components at different scales includes:
performing multi-level decomposition on the chromatographic signal to be denoised using the discrete wavelet transform to obtain the approximation coefficients of the last-level decomposition and the detail coefficients generated by each level of decomposition; and
using the approximation coefficients and the detail coefficients generated by each level of decomposition as the multiple scale coefficient components at different scales.
[0009] According to the chromatographic signal denoising method provided by the present invention, the multi-scale convolutional neural network includes multiple scale feature encoders, a multi-scale feature fusion layer, and multiple output head sub-networks, wherein the multiple scale feature encoders and the multiple output head sub-networks correspond one-to-one with the multiple scales after discrete wavelet decomposition. The step of inputting each scale coefficient component into a pre-trained multi-scale convolutional neural network to obtain the prediction coefficients at each scale output by the multi-scale convolutional neural network includes:
inputting each scale coefficient component into the scale feature encoder of the corresponding scale to extract features from it and obtain the scale coding features corresponding to each scale coefficient component;
concatenating the extracted coding features of all scales, and fusing the concatenated features through the multi-scale feature fusion layer to obtain fused features;
inputting the fused features into each output head sub-network for feature reconstruction, each output head sub-network reconstructing initial prediction coefficients of uniform length; and
compressing each initial prediction coefficient in the channel dimension and truncating or padding it in the time dimension to restore the length of the coefficient component at the original corresponding scale, thus obtaining the prediction coefficients at each scale.
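The final length-restoration step described above can be sketched as follows. This is a minimal illustration only: the function names (`squeeze_channel`, `fit_length`, `restore_scales`) are hypothetical, and it assumes that each output head emits a single channel, that truncation drops trailing samples, and that padding appends zeros at the end. The source does not state these details.

```python
def squeeze_channel(feature_map):
    """Drop the channel dimension of a [1][T] head output (assumed single channel)."""
    if len(feature_map) != 1:
        raise ValueError("expected exactly one output channel")
    return list(feature_map[0])


def fit_length(values, target_len, pad_value=0.0):
    """Truncate trailing samples or append padding so len(result) == target_len."""
    if len(values) >= target_len:
        return values[:target_len]
    return values + [pad_value] * (target_len - len(values))


def restore_scales(head_outputs, original_lengths):
    """Map uniform-length head outputs back to the per-scale coefficient lengths."""
    return [
        fit_length(squeeze_channel(out), n)
        for out, n in zip(head_outputs, original_lengths)
    ]


# Four heads, each producing one channel of uniform length 8 (toy values).
uniform_outputs = [[[float(i) for i in range(8)]] for _ in range(4)]
original_lengths = [4, 4, 8, 16]  # lengths of the coefficient components per scale
restored = restore_scales(uniform_outputs, original_lengths)
restored_lengths = [len(r) for r in restored]
```

Here the first two scales are truncated, the third is unchanged, and the fourth is zero-padded, so every prediction coefficient array matches the length the inverse transform expects.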
[0010] According to the chromatographic signal denoising method provided by the present invention, each scale feature encoder consists of two layers of one-dimensional convolution and a ReLU activation function. The parameters of the scale encoders at different scales are independent of each other. The multi-scale feature fusion layer includes three cascaded layers of one-dimensional convolution. The parameters of each output head sub-network are not shared.
[0011] According to the chromatographic signal denoising method provided by the present invention, the step of determining the target wavelet basis function used in the discrete wavelet decomposition and the inverse wavelet transform includes:
under a unified model structure and training strategy, performing denoising tests on multiple candidate wavelet basis functions using the test set to obtain the denoised signal corresponding to each candidate wavelet basis function;
calculating the evaluation indices of each candidate wavelet basis function based on the denoised signal and the clean signal in the test set, the evaluation indices including the signal-to-noise ratio improvement and the root mean square error; and
determining the candidate wavelet basis function that maximizes the signal-to-noise ratio improvement and minimizes the root mean square error as the target wavelet basis function.
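The selection procedure above can be illustrated with a small sketch. The SNR and RMSE formulas below are the standard definitions, which the source does not spell out. The candidate names and toy signals are placeholders, not results from the patent. Ranking by SNR improvement first and using RMSE only as a tie-break is an assumption, since the source does not say how a disagreement between the two indices would be resolved.

```python
import math


def rmse(estimate, clean):
    """Root mean square error between an estimate and the clean reference."""
    return math.sqrt(sum((e - c) ** 2 for e, c in zip(estimate, clean)) / len(clean))


def snr_db(estimate, clean):
    """SNR in dB, treating (estimate - clean) as the residual noise."""
    signal_power = sum(c * c for c in clean)
    error_power = sum((e - c) ** 2 for e, c in zip(estimate, clean))
    return 10.0 * math.log10(signal_power / error_power)


def snr_improvement(noisy, denoised, clean):
    """SNR after denoising minus SNR before denoising, in dB."""
    return snr_db(denoised, clean) - snr_db(noisy, clean)


def select_wavelet(results):
    """results maps name -> (snr_improvement_db, rmse); higher SNR gain wins,
    lower RMSE breaks ties (assumed rule)."""
    return max(results, key=lambda name: (results[name][0], -results[name][1]))


# Toy data standing in for one test-set signal and two hypothetical candidates.
clean = [0.0, 1.0, 2.0, 1.0, 0.0]
noisy = [0.2, 0.8, 2.3, 1.1, -0.1]
denoised_by_candidate = {
    "candidate_a": [0.1, 0.9, 2.1, 1.0, 0.0],
    "candidate_b": [0.2, 0.8, 2.2, 1.1, -0.1],
}
results = {
    name: (snr_improvement(noisy, d, clean), rmse(d, clean))
    for name, d in denoised_by_candidate.items()
}
best = select_wavelet(results)
```

In practice the two indices would be averaged over the whole test set for each candidate before the comparison.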
[0012] According to the chromatographic signal denoising method provided by the present invention, the training steps of the multi-scale convolutional neural network include:
constructing a dataset containing both pure chromatographic signals and noisy chromatographic signals;
performing discrete wavelet decomposition on the noisy chromatographic signal to obtain the noisy coefficient components at different scales, and performing discrete wavelet decomposition on the pure chromatographic signal to obtain the pure coefficient components at different scales;
inputting the noisy coefficient components at each scale into the initial network to obtain the reconstructed prediction coefficients at each scale output by the initial network; and
iterating the parameters of the initial network based on the difference between the reconstructed prediction coefficients and the pure coefficient components to obtain the multi-scale convolutional neural network.
[0013] According to the chromatographic signal denoising method provided by the present invention, the construction of a dataset containing pure chromatographic signals and noisy chromatographic signals includes:
randomly generating multiple asymmetric Gaussian peaks and superimposing them to obtain the pure chromatographic signal;
adding Gaussian white noise to the pure chromatographic signal to generate the noisy chromatographic signal; and
constructing the dataset based on the pure chromatographic signal and the noisy chromatographic signal.
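A minimal sketch of this dataset construction is given below. It models an asymmetric Gaussian peak as a bi-Gaussian with different left and right widths, which is one common choice and an assumption here. The parameter ranges (peak count, heights, widths, SNR from 0 to 20 dB) and all function names are likewise illustrative and not taken from the source.

```python
import math
import random


def asymmetric_gaussian(t, center, height, sigma_left, sigma_right):
    """Bi-Gaussian peak: different widths on the two sides of the apex (assumed model)."""
    sigma = sigma_left if t < center else sigma_right
    return height * math.exp(-0.5 * ((t - center) / sigma) ** 2)


def make_clean_signal(n_points, n_peaks, rng):
    """Superimpose randomly parameterised asymmetric peaks on a zero baseline."""
    signal = [0.0] * n_points
    for _ in range(n_peaks):
        center = rng.uniform(0.1 * n_points, 0.9 * n_points)
        height = rng.uniform(0.2, 1.0)
        sigma_left = rng.uniform(2.0, 6.0)
        sigma_right = sigma_left * rng.uniform(1.0, 3.0)  # tailing peak
        for t in range(n_points):
            signal[t] += asymmetric_gaussian(t, center, height, sigma_left, sigma_right)
    return signal


def add_white_noise(clean, snr_db, rng):
    """Add zero-mean Gaussian white noise scaled to a target SNR in dB."""
    power = sum(v * v for v in clean) / len(clean)
    noise_std = math.sqrt(power / (10.0 ** (snr_db / 10.0)))
    return [v + rng.gauss(0.0, noise_std) for v in clean]


rng = random.Random(0)  # fixed seed so the sketch is reproducible
dataset = []
for _ in range(8):
    clean = make_clean_signal(256, n_peaks=rng.randint(2, 5), rng=rng)
    noisy = add_white_noise(clean, snr_db=rng.uniform(0.0, 20.0), rng=rng)
    dataset.append((clean, noisy))
```

Drawing the SNR at random per sample is one way to expose the network to a wide signal-to-noise ratio range during training.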
[0014] According to the chromatographic signal denoising method provided by the present invention, the step of iterating the parameters of the initial network based on the difference between the reconstructed prediction coefficients and the pure coefficient components includes:
calculating the error between the reconstructed prediction coefficients and the pure coefficient components separately at each scale, and summing the errors of all scales to obtain the total loss; and
jointly adjusting the parameters of the initial network based on the total loss, the parameters including the learnable weights and biases of the scale feature encoders, the multi-scale feature fusion layer, and the output head sub-networks in the initial network.
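The total loss described above can be written as a short sketch. The source only speaks of "errors" at each scale, so the use of mean squared error per scale is an assumption, and the function names are illustrative.

```python
def mse(pred, target):
    """Mean squared error between two coefficient arrays of equal length."""
    if len(pred) != len(target):
        raise ValueError("prediction and target lengths differ")
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def total_loss(pred_coeffs, clean_coeffs):
    """Sum of the per-scale errors; each scale contributes one term."""
    if len(pred_coeffs) != len(clean_coeffs):
        raise ValueError("number of scales differs")
    return sum(mse(p, c) for p, c in zip(pred_coeffs, clean_coeffs))


# Two toy scales with different lengths.
pred_coeffs = [[1.0, 2.0], [0.0, 0.0, 1.0, 1.0]]
clean_coeffs = [[1.0, 1.0], [0.0, 0.0, 0.0, 1.0]]
loss = total_loss(pred_coeffs, clean_coeffs)  # 0.5 + 0.25
```

Because a single scalar is back-propagated, the encoders, the fusion layer, and the output heads are all updated jointly.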
[0015] The present invention also provides a chromatographic signal denoising system, comprising:
a signal acquisition module, used to acquire the chromatographic signal to be denoised;
a wavelet decomposition module, used to perform discrete wavelet decomposition on the chromatographic signal to be denoised and obtain multiple scale coefficient components at different scales;
a multi-scale network denoising module, used to input the coefficient components of each scale into a pre-trained multi-scale convolutional neural network to obtain the prediction coefficients at each scale output by the multi-scale convolutional neural network, wherein the multi-scale convolutional neural network is used to extract features from each scale coefficient component, fuse the extracted features of all scales, and reconstruct the features of each scale from the fused features to obtain the reconstructed prediction coefficients at each scale; and
a signal reconstruction module, used to perform inverse wavelet transform on the prediction coefficients at each scale to obtain the denoised chromatographic signal.
[0016] The present invention also provides an electronic device, including a memory, a processor, and a computer program stored in the memory and running on the processor, wherein the processor executes the computer program to implement the chromatographic signal denoising method as described above.
[0017] The present invention also provides a non-transitory computer-readable storage medium having a computer program stored thereon, which, when executed by a processor, implements the chromatographic signal denoising method as described above.
[0018] The present invention also provides a computer program product, including a computer program that, when executed by a processor, implements the chromatographic signal denoising method as described above.
[0019] In the chromatographic signal denoising method, system, and electronic device provided by this invention, the chromatographic signal to be denoised is first transformed to the wavelet domain through discrete wavelet decomposition to obtain multiple scale coefficient components at different scales. Using the multi-resolution analysis characteristics of wavelets, the mixed effective signal and complex noise are separated and decoupled at different frequency scales, which reduces the difficulty of subsequent network processing. These scale coefficient components are then input into a pre-trained multi-scale convolutional neural network, which performs independent feature extraction and reconstruction for each scale coefficient component. This independent processing allows the model to treat different frequency bands differently, for example by focusing on preserving the main peak contour in the low-frequency part and on removing random noise in the high-frequency part. At the same time, the network fuses the features of all scales between extraction and reconstruction, ensuring the coherence of global contextual information and overcoming the artifacts and over-smoothing that traditional denoising methods tend to cause by severing the connections between frequency bands. Finally, the denoised chromatographic signal is restored by performing inverse wavelet transform on the reconstructed prediction coefficients at each scale.
Attached Figure Description
[0020] To more clearly illustrate the technical solutions in this invention or related technologies, the accompanying drawings used in the description of the embodiments or related technologies will be briefly introduced below. Obviously, the accompanying drawings described below are some embodiments of this invention. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.
[0021] Figure 1 is a schematic flowchart of the chromatographic signal denoising method provided by the present invention;
Figure 2 is a schematic diagram of the structure of the multi-scale convolutional neural network provided by the present invention;
Figure 3 is a schematic diagram of the process for determining wavelet basis functions provided by the present invention;
Figure 4 is a schematic diagram comparing the noise reduction performance of the Symlets wavelet family provided by the present invention;
Figure 5 is a schematic diagram of the training process of the multi-scale convolutional neural network provided by the present invention;
Figure 6 is a schematic diagram of the chromatographic dataset provided by the present invention;
Figure 7 is a flowchart illustrating the chromatographic signal denoising method based on a wavelet domain multi-scale convolutional neural network provided by the present invention;
Figure 8 is a schematic diagram of the actual signal denoising results provided by the present invention;
Figure 9 is a schematic diagram of the chromatographic signal denoising system provided by the present invention;
Figure 10 is a schematic diagram of the structure of the electronic device provided by the present invention.
Detailed Implementation
[0022] To make the objectives, technical solutions, and advantages of this invention clearer, the technical solutions of this invention will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of this invention. All other embodiments obtained by those skilled in the art based on the embodiments of this invention without creative effort are within the scope of protection of this invention.
[0023] Chromatographic techniques, such as ultra-high performance liquid chromatography (UHPLC) and gas chromatography (GC), play a crucial role in separation science and high-precision quantitative analysis in fields such as environmental monitoring, pharmaceutical analysis, and petrochemicals. However, chromatographic signals are inevitably affected by multi-source random noise during acquisition, transmission, and processing, including thermal noise, electrical noise, matrix interference, and instrument drift. This noise can cause baseline jitter, peak distortion, and a decrease in signal-to-noise ratio, severely limiting the accuracy and reliability of peak signal detection, quantitative analysis, and the resolution of complex samples. Therefore, noise suppression and peak shape preservation are key preprocessing steps for improving the quality of chromatographic data.
[0024] In recent years, with the improvement of analytical instrument resolution and the increase in sample complexity, the demand for chromatographic signal denoising algorithms has become increasingly urgent to ensure the extraction of weak peak signals and the fidelity of the overall chromatogram. Currently, existing technical solutions for denoising chromatographic signals mainly include traditional signal processing methods and deep learning-based methods, but these methods all have certain shortcomings in practical applications.
[0025] First, traditional chromatographic signal denoising methods mainly rely on signal processing techniques such as filters and wavelet transforms. Early classic denoising methods (such as Savitzky-Golay filtering, Fourier transform low-pass filtering, etc.) can effectively remove high-frequency noise, but often lead to excessive smoothing of peak shapes and loss of details, especially when processing asymmetric peaks or low signal-to-noise ratio signals.
[0026] Secondly, wavelet transform, as a multi-resolution analysis tool, is widely used in separation science for noise suppression and peak overlap detection. However, traditional wavelet thresholding methods are prone to introducing pseudo-Gibbs phenomena (distortion) or constant bias, leading to distortion of the reconstructed signal. While denoising methods based on empirical mode decomposition (EMD) can adaptively decompose signals, they are too sensitive to mode mixing and endpoint effects, and their robustness is severely insufficient in complex noise environments.
[0027] In recent years, some methods combining optimization algorithms (such as improved wavelet threshold functions combined with genetic particle swarm optimization) have improved the signal-to-noise ratio, but they still rely heavily on manual parameter adjustment. Furthermore, their optimization converges extremely slowly under strong noise or multi-peak overlap conditions, and they cannot fully adapt to noise at different scales. Overall, they still struggle to handle non-stationary noise and complex peak shapes, making it difficult to meet the requirements of high-throughput analysis.
[0028] Furthermore, with the development of deep learning technology, deep learning models such as Convolutional Neural Networks (CNNs) have been introduced into the field of chromatographic denoising. However, existing CNN models typically process time-domain signals directly. They perform reasonably well in low-noise environments, but in high-noise or mixed-noise scenarios they often struggle to balance noise removal and peak shape preservation, easily producing over-smoothing and artifacts that lead to the loss of subtle peak shapes. These limitations highlight the insufficient generalization ability of existing models over a wide signal-to-noise ratio range and their inability to meet the demand for fine-grained capture of multi-scale noise features.
[0029] To address this, the present invention provides a chromatographic signal denoising method based on a wavelet domain multi-scale convolutional neural network. By combining discrete wavelet transform with deep learning, it achieves effective noise suppression and peak shape preservation of ultra-high performance liquid chromatography signals, thereby overcoming the aforementioned defects.
[0030] It should be noted that the execution entity of the chromatographic signal denoising method provided by this invention can be a computing device. This computing device can be an electronic device with data processing capabilities, such as a personal computer, a local or cloud server, a network device, or a chromatographic data processing workstation directly integrated into a chromatographic analysis instrument (such as UHPLC, GC, etc.). This execution entity has a corresponding processor and memory, capable of running deep learning frameworks and signal processing algorithms, thereby achieving end-to-end automated preprocessing and high-precision denoising of the input noisy chromatographic signal. It should be understood that all actions involving the acquisition of signals, information, or data in this invention are performed in accordance with the relevant data protection laws and regulations of the country where the invention is located, and with authorization from the owner of the corresponding device.
[0031] Figure 1 is a schematic flowchart of the chromatographic signal denoising method provided by the present invention. As shown in Figure 1, the method includes:
Step 110: Obtain the chromatographic signal to be denoised.
[0032] Specifically, in separation science applications such as pharmaceutical analysis, environmental monitoring, and petrochemicals, chromatographs are inevitably affected by a combination of random noises from multiple sources, including thermal noise, electrical noise, matrix interference, and instrument drift, during signal acquisition, transmission, and processing. These noises can cause baseline jitter, peak distortion, and a significant decrease in signal-to-noise ratio in chromatographic signals. Therefore, the chromatographic signal to be denoised in this step is typically one-dimensional time-series data containing the aforementioned non-stationary noises. This signal not only contains true chromatographic peaks reflecting the composition and concentration of substances (usually exhibiting significant asymmetry) but also incorporates complex background noise over a wide frequency band.
[0033] Step 120: Perform discrete wavelet decomposition on the chromatographic signal to be denoised to obtain multiple scale coefficient components of different scales.
[0034] Specifically, Discrete Wavelet Transform (DWT) is a mathematical signal processing tool with multi-resolution analysis capabilities. Its core principle is to transform a one-dimensional chromatographic signal into the wavelet domain by convolving and downsampling the original time-domain signal with basis functions at different scales (i.e., frequencies). In this process, the originally complex, intertwined signal is physically broken down and decoupled into multiple signal components of different frequency bands, i.e., multiple scale coefficient components at different scales. These coefficient components typically include approximation coefficients and detail coefficients. Approximation coefficients represent the low-frequency components of the signal, such as extremely low-frequency baseline trends and the main chromatographic peak profile; detail coefficients represent the high-frequency components of the signal, typically containing most random noise and subtle edges.
[0035] For example, when performing discrete wavelet decomposition on a chromatographic signal to be denoised, the signal can be decomposed into multiple layers using iterative decomposition. The approximation coefficients of the last layer and the detail coefficients of all layers are retained, and these coefficients are used together as the scale coefficient components at the aforementioned multiple different scales. Through this physical separation, the effective signal and noise mixed together can be separated at different frequency scales, thereby reducing the difficulty of subsequent feature extraction and noise recognition by deep neural networks.
[0036] Step 130: Input each scale coefficient component into a pre-trained multi-scale convolutional neural network to obtain the scale prediction coefficients output by the multi-scale convolutional neural network. The multi-scale convolutional neural network is used to extract features from each scale coefficient component, fuse the extracted scale features, and reconstruct the features at each scale after fusion to obtain the reconstructed scale prediction coefficients.
[0037] Specifically, this step aims to refine the denoising of the wavelet domain coefficients using a deep learning model. The pre-trained Multi-Scale Convolutional Neural Network (MSCNN) is a deep learning model in this embodiment of the invention that achieves both peak shape fidelity and noise suppression.
[0038] Instead of using a single feature extraction structure for mixed processing of the multiple scale components obtained after wavelet decomposition, the network constructs a feature extraction and reconstruction engine that combines multi-scale independent extraction with cross-scale deep fusion. First, the network extracts features independently for each scale coefficient component. This is equivalent to assigning independent encoders to different frequency bands, enabling the network to adopt differentiated processing strategies. For example, when processing low-frequency components, the network focuses on preserving the asymmetric chromatographic peak shape, while when processing high-frequency components, it focuses on accurately removing random noise.
[0039] Secondly, after independently extracting deep features, the network will perform deep interaction and fusion of features at various scales. This cross-scale global context information integration can ensure the coherence during reconstruction and effectively avoid the artifacts (pseudo-Gibbs phenomenon) and over-smoothing problems that are easily generated in traditional wavelet threshold denoising due to the severing of the connection between various frequency bands.
[0040] Finally, the network performs multi-scale independent feature reconstruction on the fused unified features, and outputs denoised prediction coefficients restored to the corresponding scales, completing the nonlinear mapping from noisy wavelet coefficients to clean wavelet coefficients.
[0041] Step 140: Perform wavelet inverse transform on the prediction coefficients at each scale to obtain the denoised chromatographic signal.
[0042] Specifically, the Inverse Discrete Wavelet Transform (IDWT) is the reverse reconstruction process of the aforementioned discrete wavelet decomposition. As the final step in the entire end-to-end chromatographic signal denoising system, it upsamples and convolves the denoising prediction coefficients at multiple scales output by the network, resynthesizing the processed wavelet domain coefficients into the final one-dimensional time-domain chromatographic signal. This step enables the denoising system to directly receive noisy chromatographic sequences and output smooth, continuous, and high-fidelity pure chromatograms, facilitating subsequent automatic preprocessing of chromatographic data and high-precision quantitative analysis.
[0043] The method provided in this invention integrates the multi-resolution decoupling characteristics of the discrete wavelet transform with the deep nonlinear feature extraction capability of multi-scale convolutional neural networks. It constructs an architecture in the wavelet domain that features multi-scale independent encoding, cross-scale deep fusion, and independent decoding. This not only improves the model's generalization ability and non-stationary noise suppression capability over a wide signal-to-noise ratio range, but also addresses the subtle peak loss, excessive peak smoothing, and artifacts that traditional methods tend to cause when denoising under strong noise or complex multi-peak overlap. It thereby achieves a high level of peak shape fidelity and robustness.
[0044] Based on the above embodiments, step 120 specifically includes: Step 121: The chromatographic signal to be denoised is decomposed into multiple layers using discrete wavelet transform to obtain the approximation coefficients of the last layer decomposition and the detail coefficients generated by each layer decomposition. Step 122: The approximation coefficients and the detail coefficients generated by each level of decomposition are used as the scale coefficient components of the multiple different scales.
[0045] Specifically, in practical applications of the Discrete Wavelet Transform (DWT), signal decomposition is typically a cascaded (iterative) process. Each decomposition divides the input signal into a low-frequency approximation component and a high-frequency detail component. The approximation coefficients represent the low-frequency components of the signal, primarily corresponding to the extremely low-frequency baseline trend and the overall profile of the main chromatographic peak in chromatographic signals. The detail coefficients, on the other hand, represent the high-frequency components, usually containing most of the random noise and some subtle signal edge features. So that the original signal can be reconstructed and a thorough multi-resolution analysis achieved, each decomposition level further decomposes only the approximation coefficients obtained from the previous level, while the detail coefficients generated at each level are retained and not decomposed further.
[0046] For example, if the number of decomposition levels is set to 4, the first-level decomposition splits the original noisy signal (i.e., the chromatographic signal to be denoised) into a set of first-level approximation coefficients (denoted $A_1$) and a set of first-level detail coefficients (denoted $D_1$). The second-level decomposition is applied only to the first-level approximation coefficients $A_1$, yielding a set of second-level approximation coefficients (denoted $A_2$) and a set of second-level detail coefficients (denoted $D_2$); at this point the first-level detail coefficients $D_1$ are retained. The third and fourth levels proceed in the same way, each further decomposing the approximation coefficients from the previous level. After the fourth-level decomposition, a final set of fourth-level approximation coefficients (denoted $A_4$) is obtained, together with the four sets of detail coefficients generated during the whole decomposition (levels 1 to 4: $D_1$, $D_2$, $D_3$, $D_4$).
[0047] After the above multi-level cascaded decomposition, the original one-dimensional time-domain chromatographic signal is completely transformed into the wavelet domain, forming a set of coefficients that contains features of multiple frequency bands at different resolutions. Taking a 4-level decomposition as an example, the final extracted set includes the approximation coefficients of the 4th level (1 set) and the detail coefficients of the 1st to 4th levels (4 sets), 5 scale components in total:

$$\{A_4,\ D_4,\ D_3,\ D_2,\ D_1\}$$

where $A_4$ denotes the 4th-level approximation coefficients and $D_1$ to $D_4$ denote the detail coefficients of levels 1 to 4. The approximation coefficients retained from the last level, together with the detail coefficients generated at every level, constitute the scale coefficient components at multiple different scales that are fed to the neural network. They form a complete mathematical representation of the original signal, so no information is lost in the transformation.
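The cascaded decomposition described above can be sketched in a few lines. This is a minimal illustration only: it swaps in the Haar wavelet because its filters fit in one line, whereas the embodiment uses Symlets wavelets, and the helper names (`haar_dwt`, `wavedec`, `waverec`) and the toy signal are assumptions made for this example. The output ordering `[A_4, D_4, D_3, D_2, D_1]` mirrors the convention of PyWavelets' `wavedec`.

```python
import math

SQRT2 = math.sqrt(2.0)


def haar_dwt(signal):
    """One decomposition level: split into approximation and detail halves."""
    if len(signal) % 2:
        raise ValueError("this sketch requires an even-length input")
    pairs = [(signal[i], signal[i + 1]) for i in range(0, len(signal), 2)]
    approx = [(a + b) / SQRT2 for a, b in pairs]   # low-frequency part
    detail = [(a - b) / SQRT2 for a, b in pairs]   # high-frequency part
    return approx, detail


def haar_idwt(approx, detail):
    """Invert one level by interleaving the reconstructed sample pairs."""
    out = []
    for a, d in zip(approx, detail):
        out.extend(((a + d) / SQRT2, (a - d) / SQRT2))
    return out


def wavedec(signal, levels):
    """Cascade: only the approximation is decomposed again at each level.

    Returns [A_levels, D_levels, ..., D_1].
    """
    approx, details = list(signal), []
    for _ in range(levels):
        approx, detail = haar_dwt(approx)
        details.append(detail)          # D_1 is stored first
    return [approx] + details[::-1]


def waverec(coeffs):
    """Rebuild the signal from [A_levels, D_levels, ..., D_1]."""
    approx = coeffs[0]
    for detail in coeffs[1:]:
        approx = haar_idwt(approx, detail)
    return approx


# Toy 64-sample signal (hypothetical data, not a chromatogram).
signal = [math.sin(0.05 * n) + 0.1 * ((n * 37) % 11 - 5) for n in range(64)]
coeffs = wavedec(signal, levels=4)
lengths = [len(c) for c in coeffs]      # lengths of A_4, D_4, D_3, D_2, D_1
restored = waverec(coeffs)
max_error = max(abs(a - b) for a, b in zip(signal, restored))
```

With Haar and a length divisible by 16, each level exactly halves the length, and the five components reconstruct the input up to rounding error. Longer filters such as sym11 generally produce coefficient lengths that are not exact halves because of boundary extension, which is one reason the per-scale lengths have to be tracked explicitly.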
[0048] The method provided in this invention employs a multi-layered cascaded discrete wavelet transform to extract the last layer's approximation coefficients and the detail coefficients of each layer as multi-scale coefficient components. This fully utilizes the multi-resolution analysis characteristics of wavelet transform, enabling fine decoupling of the mixed effective main peak signal and random noise at different frequency scales at the physical level. This not only preserves complete frequency domain information for subsequent signal reconstruction but also reduces the difficulty of feature extraction and noise recognition in subsequent deep neural networks, allowing the network to perform differentiated processing for features in different frequency bands.
[0049] Based on any of the above embodiments, Figure 2 is a schematic diagram of the structure of the multi-scale convolutional neural network provided by the present invention. As shown in Figure 2, the multi-scale convolutional neural network includes multiple scale feature encoders, a multi-scale feature fusion layer, and multiple output head sub-networks (i.e., the scale-specific decoders shown in Figure 2), wherein the scale feature encoders and the output head sub-networks each correspond one-to-one with the scales produced by discrete wavelet decomposition. Each scale feature encoder consists of two layers of one-dimensional convolution and a ReLU activation function, and the parameters of the scale feature encoders at different scales are independent of each other. The multi-scale feature fusion layer consists of three cascaded layers of one-dimensional convolution. The parameters of the output head sub-networks are not shared.
[0050] Specifically, the multi-scale convolutional neural network provided in this embodiment of the invention constructs a feature extraction and reconstruction engine that integrates multi-scale independent extraction and cross-scale deep fusion. Assuming the preceding wavelet decomposition has 4 levels, the corresponding number of scales is 5, namely 1 low-frequency approximation scale and 4 high-frequency detail scales (scales 1 to 5 shown in Figure 2). Correspondingly, 5 independent parallel scale feature encoders and 5 independent parallel output head sub-networks (i.e., decoders 1 to 5 shown in Figure 2) are configured. This one-to-one architecture overcomes the limitations of traditional one-dimensional convolutional networks in processing time-domain signals, enabling the network to process the extracted features of different frequency bands in a differentiated manner.
[0051] Accordingly, the forward computation and feature processing steps of the multi-scale convolutional neural network (i.e., step 130) specifically include:
Step 131: Input each scale coefficient component into the scale feature encoder of the corresponding scale to extract features from each scale coefficient component and obtain the scale encoding features corresponding to each scale coefficient component.
[0052] Specifically, in this step, each scale feature encoder consists of two layers of one-dimensional convolution (Conv1D) and a ReLU activation function, and the parameters of the scale feature encoders at different scales are independent of each other. The physical meaning of chromatographic signals differs across scales: for example, the low-frequency approximation coefficients mainly characterize the true main chromatographic peak profile, while the high-frequency detail coefficients mainly contain random noise. Independent encoders with mutually independent parameters therefore allow the network to adaptively learn entirely different feature extraction strategies. For the low-frequency part, the network parameters focus on learning how to faithfully preserve the morphology of asymmetric chromatographic peaks; for the high-frequency part, they focus on learning how to strip away noise. Through the two-layer nonlinear mapping of its scale feature encoder, the wavelet coefficients of each scale are transformed into deep, high-dimensional feature maps (i.e., scale encoding features), and all feature maps are aligned to a uniform maximum length (e.g., $L_{\max}$) at this stage.
[0053] Step 132: The extracted coded features at each scale are concatenated, and the concatenated features are fused through the multi-scale feature fusion layer to obtain fused features.
[0054] Specifically, to break down the information silos formed by encoding each frequency scale independently, the network first concatenates the five independently extracted scale encoding features along the channel dimension to form a unified global feature map (i.e., the concatenated features, also called the features before multi-scale fusion):

$$F_{\text{cat}} = \mathrm{Concat}\left(F_1, F_2, F_3, F_4, F_5\right)$$

where $F_i$ represents the encoding features of the $i$-th scale after alignment to the uniform maximum length, and $B$ is the batch size, i.e., the number of samples input into the neural network at one time, which is the first dimension of each feature tensor.
[0055] Then, the concatenated features $F_{\text{cat}}$ are deeply fused by three cascaded 1×1 convolutional layers (i.e., the three 1×1 Conv modules shown in Figure 2) while the temporal resolution is maintained. The features after the fusion layer (i.e., the fused features $F_{\text{fuse}}$) are represented as:

$$F_{\text{fuse}} = \mathrm{Conv}_{1\times1}^{(3)}\left(\mathrm{Conv}_{1\times1}^{(2)}\left(\mathrm{Conv}_{1\times1}^{(1)}\left(F_{\text{cat}}\right)\right)\right)$$

The three 1×1 convolutional layers in the multi-scale feature fusion layer maintain the temporal resolution and are primarily used to achieve channel-level compression and cross-scale deep fusion. Through this deep interaction, the network integrates the frequency relationships of the global context, so that information from different frequency bands can reference each other, which provides coherent support for subsequent reconstruction.
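The property that a 1×1 convolution mixes channels without touching the time axis can be illustrated with a small dependency-free sketch. Everything concrete here is an assumption made for illustration: the channel counts, the random weights, the single-sample (no batch) layout, and the absence of an activation between fusion layers, which the text does not specify.

```python
import random


def concat_channels(features):
    """Concatenate per-scale feature maps along the channel axis.

    Each feature map is a list of channels; each channel is a list of values
    that has already been aligned to the same length.
    """
    length = len(features[0][0])
    merged = []
    for fmap in features:
        for channel in fmap:
            if len(channel) != length:
                raise ValueError("all feature maps must share the same length")
            merged.append(channel)
    return merged


def conv1x1(x, weight, bias):
    """Pointwise (kernel size 1) convolution.

    Every output channel is a weighted sum of the input channels, computed
    independently at each time step, so the sequence length never changes.
    """
    length = len(x[0])
    return [
        [b + sum(w * x[c][t] for c, w in enumerate(row)) for t in range(length)]
        for row, b in zip(weight, bias)
    ]


def random_layer(rng, in_ch, out_ch):
    """Hypothetical untrained layer: random weights, zero bias."""
    weight = [[rng.uniform(-0.5, 0.5) for _ in range(in_ch)] for _ in range(out_ch)]
    return weight, [0.0] * out_ch


rng = random.Random(0)
num_scales, ch_per_scale, l_max = 5, 4, 16          # hypothetical sizes
encoded = [
    [[rng.gauss(0.0, 1.0) for _ in range(l_max)] for _ in range(ch_per_scale)]
    for _ in range(num_scales)
]

f_cat = concat_channels(encoded)                    # 5 * 4 = 20 channels
channel_plan = [20, 16, 8, 8]                       # hypothetical widths of the 3 layers
fused = f_cat
for in_ch, out_ch in zip(channel_plan, channel_plan[1:]):
    weight, bias = random_layer(rng, in_ch, out_ch)
    fused = conv1x1(fused, weight, bias)
```

After the three layers, the channel count has been compressed from 20 to 8 while every channel still has `l_max` time steps, which is the sense in which the fusion preserves temporal resolution.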
[0056] Step 133: Input the fused features into each output head sub-network for feature reconstruction, and each output head sub-network reconstructs an initial prediction coefficient of uniform length.
[0057] Specifically, in the output decoding stage, the fused global features $F_{\text{fuse}}$ are fed simultaneously to multiple output head sub-networks that do not share parameters with one another. For example, for the five scales mentioned above, each scale is assigned an independent output head sub-network (i.e., decoder) in the output stage. These five independent sub-networks receive the same fused features, but because their internal parameters are not shared, each can apply its own output strategy according to the reconstruction requirements of its frequency band. After feature reconstruction by the convolutional layers within each output head sub-network, five sets of initial prediction coefficients with a uniform maximum length are generated, all of which are denoised coefficients. The initial prediction coefficients for each scale are represented as follows:

$$\hat{Y}_i = \mathrm{Head}_i\left(F_{\text{fuse}}\right), \quad i = 1, \dots, 5$$

Step 134: Compress each initial prediction coefficient in the channel dimension and truncate or pad it in the time dimension to restore the original coefficient component length of the corresponding scale, thus obtaining the prediction coefficients for each scale.
[0058] Specifically, the preceding network layers standardized the length of the feature sequences of each branch to simplify concatenation and fusion. In actual discrete wavelet decomposition (DWT), however, downsampling causes the lengths of the original wavelet coefficients at different scales (i.e., $L_i$) to differ. Therefore, in this step, a dedicated adjustment operation (the CropOrPad operation) is applied at the network output: first, the channel dimension is compressed back to the single channel required by the coefficients; then, along the time axis, the sequence is truncated (removing the excess part) or padded according to the actual length required at each scale, restoring the original wavelet coefficient length of the corresponding scale. This yields prediction coefficients at each scale that can be used directly for the inverse wavelet transform (IDWT). The prediction coefficients $\hat{c}_i$ at each scale can be represented as:

$$\hat{c}_i = \mathrm{CropOrPad}\left(\hat{Y}_i,\ L_i\right), \quad i = 1, \dots, 5$$

where $\hat{Y}_i$ denotes the initial prediction coefficients of the $i$-th scale after channel compression. Finally, the inverse wavelet transform synthesizes these denoised prediction coefficients at each scale into the final denoised chromatographic signal.
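The length-restoring step can be sketched as a small helper. This is an illustrative sketch rather than the patented implementation: the text does not state the padding value or on which side padding is applied, so zero padding at the end is assumed here, the channel compression is taken as already done, and the per-scale lengths are hypothetical.

```python
def crop_or_pad(values, target_len, pad_value=0.0):
    """Return a copy of `values` with exactly `target_len` entries.

    Longer inputs are truncated at the end; shorter inputs are padded at the
    end with `pad_value` (assumed to be zero in this sketch).
    """
    if target_len < 0:
        raise ValueError("target_len must be non-negative")
    if len(values) >= target_len:
        return list(values[:target_len])
    return list(values) + [pad_value] * (target_len - len(values))


# Hypothetical example: four branches share a uniform length of 8, but the
# original wavelet coefficients at each scale had different lengths.
l_max = 8
original_lengths = [3, 3, 5, 8]
initial_predictions = [[float(i + 1)] * l_max for i in range(len(original_lengths))]

restored = [
    crop_or_pad(pred, length)
    for pred, length in zip(initial_predictions, original_lengths)
]
```

Each restored list then has exactly the length that the inverse transform expects for its scale.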
[0059] The method provided in this invention, by independently configuring convolutional encoders and output decoders with non-shared parameters for different wavelet scales, endows the network with highly adaptive local frequency band processing capabilities. This enables the network model to both maintain high fidelity for effective low-frequency chromatographic peak shapes and effectively suppress complex high-frequency baseline noise. Simultaneously, through a multi-scale feature fusion layer composed of cascaded 1×1 convolutional layers, a cross-scale global contextual relationship is established in the intermediate stage of independent processing. This overcomes the pseudo-Gibbs distortion phenomenon that is easily induced by severing the correlation between frequency bands in traditional wavelet threshold denoising. Furthermore, by using truncation and padding post-processing, seamless connection of the inverse wavelet transform is ensured, improving the smoothness and peak shape accuracy of the final denoised signal.
[0060] Based on any of the above embodiments, Figure 3 is a schematic diagram of the wavelet basis function determination process provided by the present invention. As shown in Figure 3, the steps for determining the target wavelet basis function used in the discrete wavelet decomposition and the inverse wavelet transform include:
Step 310: Under a unified model structure and training strategy, the test set is used to perform denoising tests on multiple candidate wavelet basis functions to obtain the denoised signal corresponding to each candidate wavelet basis function.
[0061] It is important to note that the choice of wavelet basis functions is crucial to the performance of Discrete Wavelet Decomposition (DWT) and Inverse Wavelet Transform (IDWT), serving as the basis for signal domain transformation. Different wavelet bases have different support lengths and symmetries, which directly determine their effectiveness in breaking down complex chromatographic signals and physically dispersing (decoupling) them into multi-resolution frequency bands. Since real chromatographic signals typically contain complex asymmetric peaks and random noise, core hyperparameters can be optimized through scientific experiments to maximize the model's denoising upper limit and ensure peak shape fidelity.
[0062] Specifically, in this embodiment of the invention, the system examines multiple candidate wavelet basis functions (i.e., sym2 to sym11, a total of 10 wavelet bases) with different support lengths and symmetries in the Symlets wavelet family. To ensure the fairness of the comparison, the system, under a unified network model structure (such as the multi-scale convolutional neural network described in the above embodiment) and training strategy, uses a test set containing synthesized ultra-high performance liquid chromatography (UHPLC) samples to perform denoising tests on these 10 candidate wavelet basis functions respectively, obtaining the denoised signal corresponding to each candidate wavelet basis function after processing.
[0063] Specifically, the denoising test process for each candidate wavelet basis function is as follows. First, using the candidate wavelet basis function currently under consideration (e.g., sym2), a forward discrete wavelet decomposition (DWT) is performed on each noisy chromatographic signal in the test set (e.g., 100 synthetic ultra-high performance liquid chromatography samples covering a wide signal-to-noise ratio range of 1 to 35 dB). Through convolution and downsampling, the corresponding multi-scale noisy coefficient components (i.e., low-frequency approximation coefficients and high-frequency detail coefficients) are extracted from the one-dimensional time-domain noisy signal. Then, these multi-scale noisy coefficient components are input into the multi-scale convolutional neural network with the unified structure for forward inference. After being processed sequentially by the network's scale feature encoders, multi-scale feature fusion layer, and output head sub-networks, the denoised prediction coefficients for each scale are output. Finally, the same candidate wavelet basis function (e.g., sym2) is used again to perform the inverse discrete wavelet transform (IDWT) on the denoised prediction coefficients output by the network for each scale. Through upsampling and convolution, the coefficients are reconstructed and synthesized into the final denoised signal in the time domain.
[0064] Through the complete closed-loop reasoning process of decomposition, network prediction, and reconstruction described above, the upper limit of the actual denoising effect that can be achieved when the specific candidate wavelet basis function is used as the signal domain transformation basis and is combined with the deep learning network architecture constructed in this invention can be truly and accurately reflected.
[0065] Step 320: Based on the denoised signal and the clean signal in the test set, calculate the evaluation index of each candidate wavelet basis function. The evaluation index includes the signal-to-noise ratio improvement and the root mean square error.
[0066] Specifically, for each candidate wavelet basis function, the system compares the denoised signal it generates with the corresponding clean signal in the test set (i.e., the reference signal without added Gaussian white noise) point by point to quantitatively evaluate its denoising performance. This embodiment of the invention employs two core evaluation metrics. One is the signal-to-noise ratio improvement ($\Delta\mathrm{SNR}$), which measures the model's ability to suppress noise over a wide signal-to-noise ratio range. The other is the root mean square error (RMSE), which directly measures the point-to-point deviation between the denoised signal and the original pure signal, thereby reflecting the model's fidelity in preserving asymmetric chromatographic peak shapes.
[0067] Specifically, for a given candidate wavelet basis function, let the corresponding estimated signal (i.e., the denoised signal) be $\hat{x}(t_n)$ and the corresponding true pure signal be $x(t_n)$. The estimation error is defined as:

$$e(t_n) = \hat{x}(t_n) - x(t_n)$$

where $e(t_n)$ is the estimation error at time $t_n$, $\hat{x}(t_n)$ is the estimated signal value after denoising at time $t_n$, $x(t_n)$ is the original true pure signal value at time $t_n$, $t_n$ is the time corresponding to the $n$-th sampling point, and $n$ is the sampling point index of the discrete signal (i.e., the sequence number of the data point, used to traverse each point in the signal sequence).
[0068] Therefore, the signal-to-noise ratio (SNR) is calculated as:

$$\mathrm{SNR} = 10 \log_{10} \frac{\sum_{n=1}^{N} x(t_n)^2}{\sum_{n=1}^{N} e(t_n)^2}$$

where $N$ is the total number of sampling points in the chromatographic signal sequence (i.e., the total number of data points in a chromatographic signal, for example, 8000 data points per signal); the numerator represents the signal power, and the denominator represents the error power.
[0069] Given a noisy signal $y(t_n)$ and the denoising result $\hat{x}(t_n)$, the SNR improvement is defined as:

$$\Delta\mathrm{SNR} = \mathrm{SNR}_{\text{out}} - \mathrm{SNR}_{\text{in}}$$

where $\mathrm{SNR}_{\text{out}}$ is the SNR of the denoising result $\hat{x}(t_n)$ and $\mathrm{SNR}_{\text{in}}$ is the SNR of the noisy signal, obtained from the same formula with $y(t_n)$ in place of $\hat{x}(t_n)$. Meanwhile, the root mean square error directly measures the point-to-point deviation between the denoised signal and the original clean signal, and is calculated as:

$$\mathrm{RMSE} = \sqrt{\frac{1}{N} \sum_{n=1}^{N} \left(\hat{x}(t_n) - x(t_n)\right)^2}$$

With these formulas, the signal-to-noise ratio improvement measures the model's ability to suppress noise over a wide signal-to-noise ratio range, while the root mean square error directly reflects the degree to which the denoised signal retains the original chromatographic peak shape.
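The two evaluation indices can be computed with a few lines of code. The sketch below assumes the usual definitions (SNR as ten times the base-10 logarithm of signal power over error power, SNR improvement as output SNR minus input SNR, and RMSE as the root of the mean squared deviation); the function names and the toy signals are hypothetical and are not measured results.

```python
import math


def snr_db(clean, estimate):
    """SNR in dB: signal power over the power of (estimate - clean)."""
    signal_power = sum(x * x for x in clean)
    error_power = sum((e - x) ** 2 for x, e in zip(clean, estimate))
    if error_power == 0.0:
        return math.inf
    return 10.0 * math.log10(signal_power / error_power)


def snr_improvement_db(clean, noisy, denoised):
    """SNR of the denoised signal minus SNR of the noisy input."""
    return snr_db(clean, denoised) - snr_db(clean, noisy)


def rmse(clean, estimate):
    """Root mean square point-to-point deviation from the clean signal."""
    squared = sum((e - x) ** 2 for x, e in zip(clean, estimate))
    return math.sqrt(squared / len(clean))


# Toy signals (hypothetical): the denoised error is 10x smaller in amplitude
# than the noisy error, which corresponds to a 20 dB improvement.
clean = [0.0, 1.0, 2.0, 1.0, 0.0, 1.0, 2.0, 1.0]
noisy = [x + (0.2 if i % 2 == 0 else -0.2) for i, x in enumerate(clean)]
denoised = [x + (0.02 if i % 2 == 0 else -0.02) for i, x in enumerate(clean)]

delta_snr = snr_improvement_db(clean, noisy, denoised)
error = rmse(clean, denoised)
```

Because the error amplitude shrinks by a factor of 10, the error power shrinks by a factor of 100, which is where the 20 dB figure in the comment comes from.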
[0070] Step 330: The candidate wavelet basis function that maximizes the signal-to-noise ratio improvement and minimizes the root mean square error is determined as the target wavelet basis function.
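A selection rule of this kind could be sketched as follows. The metric values and candidate names below are placeholders invented for the example, not results from the test described here, and raising an error when no single candidate wins on both indices is an assumption, since the text does not define a tie-break rule.

```python
def select_wavelet(metrics):
    """Pick the candidate with the largest SNR improvement and smallest RMSE.

    `metrics` maps a candidate name to (mean_snr_improvement_db, mean_rmse).
    """
    best_by_snr = max(metrics, key=lambda name: metrics[name][0])
    best_by_rmse = min(metrics, key=lambda name: metrics[name][1])
    if best_by_snr != best_by_rmse:
        # Assumption: the source does not say how to resolve disagreement.
        raise ValueError("no single candidate is best on both metrics")
    return best_by_snr


# Placeholder numbers for illustration only.
placeholder_metrics = {
    "cand_a": (18.0, 0.040),
    "cand_b": (21.5, 0.028),
    "cand_c": (20.1, 0.031),
}
target = select_wavelet(placeholder_metrics)
```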
[0071] Specifically, Figure 4 is a comparative diagram of the denoising performance of the Symlets wavelet family provided by this invention. As shown in Figure 4, the bar chart in the upper left corner (i.e., Figure 4(a)) compares the average signal-to-noise ratio improvement ($\Delta\mathrm{SNR}$) obtained with different wavelet bases. The higher the value, the stronger the model's ability to restore the noisy signal to a clean state. As can be seen from Figure 4(a), sym11, on the far right, achieved the highest value in the entire test group, namely 23.84 dB, which indicates that sym11 provides the strongest noise suppression.
[0072] The bar chart in the upper right corner (i.e., Figure 4(b)) compares the average root mean square error (RMSE), which reflects the point-to-point deviation between the denoised signal and the pure signal. The smaller the value, the closer the denoised signal is to the true signal, meaning better fidelity of the chromatographic peak and less distortion. As can be seen from Figure 4(b), sym11, on the far right, achieved the lowest error value in the entire group, at 2.085 × 10⁻². This indicates that using sym11 as the wavelet basis filters out noise while preserving the original morphology of asymmetric chromatographic peaks to the greatest extent and avoiding excessive smoothing.
[0073] The box plot at the bottom (i.e., Figure 4(c)) shows in detail the overall performance distribution of each wavelet basis on the 100 test samples. In Figure 4(c), each small dot represents a single test sample. The boxes represent the median and the upper and lower quartiles of the data, the whiskers represent the range of the non-outlier data, and the scattered dots represent outliers. Because the test set spans a wide range from extremely strong noise (1 dB) to extremely weak noise (35 dB), the performance of every wavelet basis fluctuates (i.e., scattered points appear). In comparison, however, the box for sym11 sits higher overall, indicating that it not only has a high average score but also delivers stable and robust denoising performance across complex noise of various intensities.
[0074] This fully demonstrates that, compared to other wavelet bases, the morphological features of sym11 have the highest degree of fit and matching with the true asymmetric chromatographic peak shape. Therefore, the system ultimately determines sym11, which has the best overall performance, as the target wavelet basis function, and applies it consistently throughout all subsequent model training and inference processes in discrete wavelet decomposition and inverse wavelet transform, significantly reducing the difficulty for neural networks in feature extraction and noise recognition.
[0075] The method provided in this invention comprehensively and objectively evaluates various candidate wavelet basis functions with different support lengths and symmetries under a unified training and testing environment, and selects the target wavelet basis function (such as sym11) that best matches the morphological characteristics of complex asymmetric chromatographic peaks. This not only provides the most suitable conversion basis for the physical separation (decoupling) of the front-end time and frequency domains, significantly reducing the difficulty of high and low frequency feature extraction and recognition in subsequent deep learning networks, but also essentially establishes a good upper limit performance for the denoising system, achieving a significant improvement in signal-to-noise ratio and extremely low root mean square error in reconstruction over a wide signal-to-noise ratio range, ensuring high-precision fidelity of weak peaks and overlapping peaks in a strong noise environment.
[0076] Based on any of the above embodiments, Figure 5 is a schematic diagram of the training process of the multi-scale convolutional neural network provided by the present invention. As shown in Figure 5, the training steps of the multi-scale convolutional neural network include:
Step 510: Construct a dataset containing both pure chromatographic signals and noisy chromatographic signals.
[0077] It should be noted that training deep learning models requires large-scale data with clean, real-world labels. However, in real-world chromatographic analysis instrument acquisition processes, thermal noise, electrical noise, and baseline drift are unavoidable, making it impossible to directly obtain absolutely noise-free chromatographic signals as real-world labels for deep learning network training. To address this issue, this invention employs a data synthesis strategy that closely approximates the physical characteristics of ultra-high performance liquid chromatography (UHPLC).
[0078] Specifically, the constructed dataset can contain multiple (e.g., 500) simulated chromatographic signals. The simulated chromatographic signals are generated by superimposing 2 to 6 asymmetric Gaussian peaks. Each chromatographic signal contains 8,000 data points, corresponding to a time range of 0 to 20 minutes, a sampling frequency of 6.67 Hz, and a signal-to-noise ratio covering a range of 1 to 35 dB.
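The stated sampling frequency follows directly from the number of data points and the time range. As a worked check, with $N$ the number of points and $T$ the duration (symbols introduced here for illustration):

```latex
f_s = \frac{N}{T}
    = \frac{8000}{20\ \text{min} \times 60\ \text{s/min}}
    = \frac{8000}{1200\ \text{s}}
    \approx 6.67\ \text{Hz},
\qquad
\Delta t = \frac{1}{f_s} \approx 0.15\ \text{s}
```

Each data point therefore represents roughly 0.15 s of the 20-minute chromatogram.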
[0079] The specific implementation process of dataset construction is described below. Step 510 specifically includes:
Step 511: Randomly generate multiple asymmetric Gaussian peaks and superimpose the generated asymmetric Gaussian peaks to obtain the pure chromatographic signal;
Step 512: Add Gaussian white noise to the pure chromatographic signal to generate the noisy chromatographic signal;
Step 513: Based on the pure chromatographic signal and the noisy chromatographic signal, construct the dataset.
[0080] Figure 6 is a schematic diagram of the chromatographic dataset provided by the present invention. As shown in Figure 6, during generation the system randomly selects the number of chromatographic peaks (e.g., 2 to 6), the retention time of the peak starting point, the peak height, and the peak width, thereby simulating complex morphologies such as multi-peak overlap and severe asymmetry commonly found in real chromatograms. As shown by the Origin curve in Figure 6, the synthesized pure signal exhibits ideal chromatographic peak characteristics: it is extremely clear and free of fluctuations.
[0081] Specifically, the dataset contains 500 clean signals $x(t)$, each generated by superimposing 2 to 6 asymmetric Gaussian peaks according to:

$$x(t) = \sum_{k=1}^{K} G\left(t;\ t_k,\ A_k,\ \sigma_k\right)$$

where $G(\cdot)$ denotes a single asymmetric Gaussian peak; $K \in \{2, \dots, 6\}$ indicates that the number of peaks is randomly selected between 2 and 6; $t_k$ represents the retention time of the starting point of the $k$-th peak; $A_k \in [0.5, 5]$ indicates that the peak height is randomly selected between 0.5 and 5; and $\sigma_k$ is the peak width parameter of the constructed chromatographic peak.
[0082] The noisy signals $y$ are of two types. The first type, $y_1$, is formed by adding Gaussian white noise $\varepsilon$ to the pure signal $x$, with the signal-to-noise ratio (SNR) randomly selected from [1, 10] dB. The second type, $y_2$, is constructed in the same way as $y_1$, but with the SNR randomly selected from [10, 35] dB.
[0083] Specifically, noisy chromatographic signals can be generated by adding Gaussian white noise to pure chromatographic signals. To enable the model to generalize to handle noise of different intensities, random intensities of Gaussian white noise are added to the system, precisely controlling the signal-to-noise ratio (SNR) of the generated data to cover a wide range of 1–35 dB.
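The synthesis of one paired sample can be sketched as below. Several details are assumptions made for this illustration and are labeled in the comments: the two-sided form of the asymmetric Gaussian, the use of the apex position rather than the peak starting point, the ranges for peak position and width, and the rescaling of the noise so that the realized SNR matches the target exactly. Only the peak count (2 to 6), the height range (0.5 to 5), the 8,000 points over 0 to 20 minutes, and the SNR ranges come from the text.

```python
import math
import random


def asymmetric_gaussian(t, center, height, sigma_left, sigma_right):
    """Gaussian whose width differs on either side of its apex (assumed form)."""
    sigma = sigma_left if t < center else sigma_right
    return height * math.exp(-((t - center) ** 2) / (2.0 * sigma ** 2))


def make_clean_signal(rng, n_points=8000, duration_min=20.0):
    """Superimpose 2 to 6 random asymmetric peaks on a 0-20 min time axis."""
    times = [duration_min * i / (n_points - 1) for i in range(n_points)]
    n_peaks = rng.randint(2, 6)               # inclusive on both ends
    peaks = [
        (
            rng.uniform(2.0, 18.0),   # apex position in minutes (hypothetical range)
            rng.uniform(0.5, 5.0),    # peak height range stated in the text
            rng.uniform(0.05, 0.30),  # left width in minutes (hypothetical)
            rng.uniform(0.05, 0.60),  # right width in minutes (hypothetical)
        )
        for _ in range(n_peaks)
    ]
    signal = [sum(asymmetric_gaussian(t, *p) for p in peaks) for t in times]
    return signal, n_peaks


def add_white_noise(rng, clean, snr_db):
    """Add Gaussian white noise rescaled so the realized SNR equals snr_db."""
    noise = [rng.gauss(0.0, 1.0) for _ in clean]
    signal_power = sum(x * x for x in clean) / len(clean)
    noise_power = sum(v * v for v in noise) / len(noise)
    scale = math.sqrt(signal_power / (noise_power * 10.0 ** (snr_db / 10.0)))
    return [x + scale * v for x, v in zip(clean, noise)]


def measured_snr_db(clean, noisy):
    """SNR in dB of a noisy signal relative to its clean reference."""
    signal_power = sum(x * x for x in clean)
    noise_power = sum((y - x) ** 2 for x, y in zip(clean, noisy))
    return 10.0 * math.log10(signal_power / noise_power)


rng = random.Random(42)
clean, n_peaks = make_clean_signal(rng)
target_snr_db = rng.uniform(1.0, 10.0)   # low-SNR type; the other type uses [10, 35] dB
noisy = add_white_noise(rng, clean, target_snr_db)
```

Repeating the last three lines with different seeds and with both SNR ranges would yield the paired clean and noisy signals from which a dataset can be assembled.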
[0084] As shown in Figure 6, the Noise curves in Figure 6(a), (b), (c), and (d) all represent noisy chromatographic signals after the addition of Gaussian white noise. The signal-to-noise ratio of the noisy chromatographic signal is 1 dB in Figure 6(a), 10 dB in Figure 6(b), 20 dB in Figure 6(c), and 35 dB in Figure 6(d). It can be seen from Figure 6 that under extremely strong noise (such as the chromatogram with 1 dB noise in Figure 6(a)), the original effective chromatographic peaks are almost completely submerged by dense high-frequency noise, while under medium and weak noise (such as the chromatograms with 20 dB noise in Figure 6(c) or 35 dB noise in Figure 6(d)), the noise amplitude is small but the noise still adheres to the peak edges and the baseline. This method of adding noise allows the system to obtain paired training samples over an extremely wide signal-to-noise ratio range, which is very difficult to control in real-world acquisition.
[0085] Finally, based on the generated pure and noisy chromatographic signals, a dataset can be constructed. The large number of simulated signals generated in pairs (i.e., the dataset) are divided into training, validation, and test sets, allowing the model to fully learn diverse data distribution characteristics during the training phase, thus possessing good generalization ability.
[0086] Step 520: Perform discrete wavelet decomposition on the noisy chromatographic signal to obtain the noisy coefficient components at different scales corresponding to the noisy chromatographic signal; perform discrete wavelet decomposition on the pure chromatographic signal to obtain the pure coefficient components at different scales corresponding to the pure chromatographic signal.
Step 530: Input the noisy coefficient components of each scale into the initial network to obtain the reconstructed prediction coefficients of each scale output by the initial network;
Step 540: Based on the difference between the reconstructed prediction coefficients and the pure coefficient components, perform parameter iteration on the initial network to obtain the multi-scale convolutional neural network.
[0087] Specifically, after preparing the dataset, the forward decomposition and prediction stage begins: the noisy chromatographic signal is decomposed using discrete wavelet decomposition to obtain noisy coefficient components at different scales, and the clean chromatographic signal is also decomposed using discrete wavelet decomposition to obtain clean coefficient components at different scales. Here, the previously selected target wavelet basis function (such as sym11) is used for multi-level decomposition. The noisy coefficient components serve as the actual input data to the network, while the clean coefficient components serve as the true label that the network output strives to approximate. Subsequently, the noisy coefficient components at each scale are input into the initial network to obtain the reconstructed prediction coefficients at each scale output by the initial network.
[0088] The next crucial stage is error calculation and parameter update. Based on the difference between the reconstructed prediction coefficients and the pure coefficient components, the parameters of the initial network are iterated to obtain the multi-scale convolutional neural network. Step 540 specifically includes:
Step 541: Calculate the error between the reconstructed prediction coefficients and the pure coefficient components at each scale, and sum the errors at each scale to obtain the total loss;
Step 542: Based on the total loss, jointly adjust the parameters of the initial network. The parameters include the learnable weights and biases of the feature encoders at each scale, the multi-scale feature fusion layer, and the output head sub-networks in the initial network.
[0089] Specifically, the L1 loss function can be used to calculate the absolute error between the reconstructed prediction coefficients and the corresponding pure coefficient components at each scale (such as the five independent scales described in the previous examples). The errors across all scales are then summed to obtain the total loss. The loss function is calculated as follows:

$$\mathcal{L}_{\text{total}} = \sum_{i=1}^{C} \mathcal{L}_i$$

where $C = 5$ is the number of scales, $B$ is the batch size (e.g., 8) over which each $\mathcal{L}_i$ is computed, and $\mathcal{L}_i$ is the local L1 loss between the reconstructed prediction coefficients and the true pure coefficients at the $i$-th scale.
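The per-scale loss and its summation can be sketched in plain Python. The normalization is an assumption: each local loss is taken here as the mean absolute error over all samples in the batch and all coefficients of that scale (the same convention as a mean-reduced L1 loss), since the text does not spell out how the local loss is normalized. The function names and toy values are hypothetical.

```python
def l1_loss(pred_batch, target_batch):
    """Mean absolute error over a batch of equal-length coefficient vectors."""
    total, count = 0.0, 0
    for pred, target in zip(pred_batch, target_batch):
        if len(pred) != len(target):
            raise ValueError("prediction and target lengths differ")
        total += sum(abs(p - t) for p, t in zip(pred, target))
        count += len(pred)
    return total / count


def total_loss(pred_by_scale, target_by_scale):
    """Sum of the local L1 losses over all wavelet scales."""
    return sum(
        l1_loss(pred, target)
        for pred, target in zip(pred_by_scale, target_by_scale)
    )


# Toy example with 2 scales and a batch of 2 (the embodiment uses 5 scales).
pred_by_scale = [
    [[1.0, 2.0], [0.0, 0.0]],                      # scale 1, length 2
    [[1.0, 1.0, 1.0, 1.0], [2.0, 2.0, 2.0, 2.0]],  # scale 2, length 4
]
target_by_scale = [
    [[1.5, 2.0], [0.0, 0.5]],
    [[1.0, 1.0, 1.0, 0.0], [2.0, 2.0, 2.0, 2.0]],
]
loss = total_loss(pred_by_scale, target_by_scale)
```

In the toy data, scale 1 contributes 1.0 / 4 = 0.25 and scale 2 contributes 1.0 / 8 = 0.125, so the summed loss is 0.375. Because each scale contributes its own term, an error confined to one frequency band shows up in exactly one summand.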
[0090] The coefficient errors are calculated directly at the five independent scales in the wavelet domain, rather than after an inverse transform back to the time domain, for the following reasons. First, this achieves decoupled supervision and precise multi-scale error correction. Low-frequency components contain the true chromatographic peak profiles, while high-frequency components contain random noise. If the five sets of coefficients were first fused into a one-dimensional time-domain signal through the inverse wavelet transform (IDWT) before the error was calculated, the errors from the five frequency bands would be mixed together again, and during backpropagation the network would find it difficult to determine which frequency band has a reconstruction problem. Calculating and summing the errors directly at the five independent wavelet scales forces the network to optimize specifically for high-frequency denoising and low-frequency fidelity. Second, this shortens the gradient propagation path. The network directly outputs wavelet coefficients, so calculating the loss at this point allows the gradient to propagate immediately back to the convolutional layers to update the parameters, which makes optimization efficient. Furthermore, L1 loss is chosen instead of L2 loss because L1 is more robust to outliers and better protects the sharp edges of asymmetric chromatographic peaks from excessive smoothing.
[0091] Finally, the system uses the backpropagation algorithm combined with an optimizer (such as AdamW with a cosine annealing scheduler) to jointly update the weights of all layers within the network based on the total loss of global summation. It should be understood that the shape and mathematical parameters of the wavelet basis functions are fixed and do not participate in updates during training; the network learns how to accurately remove noise from noisy coefficients by adjusting its internal convolutional parameters.
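For reference, one common form of a cosine annealing learning-rate schedule is shown below. The symbols are introduced here for illustration and are not taken from the text: $\eta_{\max}$ and $\eta_{\min}$ are the initial and final learning rates, $t$ is the current step or epoch, and $T$ is the length of the schedule. The text does not state which variant or which hyperparameter values are used.

```latex
\eta_t = \eta_{\min} + \frac{1}{2}\left(\eta_{\max} - \eta_{\min}\right)
         \left(1 + \cos\frac{\pi t}{T}\right),
\qquad t = 0, 1, \dots, T
```

The rate starts at $\eta_{\max}$ for $t = 0$ and decays smoothly to $\eta_{\min}$ at $t = T$.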
[0092] The method provided in this invention provides the prerequisite for the model to handle complex non-stationary noise by constructing a paired synthetic dataset that covers an extremely wide signal-to-noise ratio range and highly simulates real asymmetric overlapping peaks. At the same time, it abandons the traditional global time-domain supervision and adopts a strategy of independently calculating and jointly optimizing the L1 reconstruction error on multiple discrete wavelet scales. This not only significantly shortens the gradient path of backpropagation to improve convergence efficiency, but also establishes independent decoupled supervision signals for low-frequency morphology preservation and high-frequency noise stripping at the physical level. This effectively avoids the over-smoothing and peak distortion that are prone to occur in time-domain supervision, thereby training a chromatographic denoising model with extremely high robustness and high-precision quantitative potential.
[0093] Based on any of the above embodiments, Figure 7 is a flowchart of the chromatographic signal denoising method based on a wavelet-domain multi-scale convolutional neural network provided by the present invention. As shown in Figure 7, the method includes the following steps. First, in the model preparation and training phase: Step S1: Construct a simulated ultra-high performance liquid chromatography dataset.
[0094] Specifically, because absolutely noise-free labels are difficult to obtain from real instrument acquisition, the system uses mathematical formulas to randomly superimpose multiple (e.g., 2-6) asymmetric Gaussian peaks and adds Gaussian white noise covering a wide signal-to-noise ratio range (e.g., 1-35 dB), generating a large number of paired clean and noisy signals that provide high-quality supervised data for deep learning. The dataset is divided into training, validation, and test sets.
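One way to generate such a clean/noisy pair is sketched below. It is an illustration under stated assumptions: the peak model is a bi-Gaussian (different widths on each side of the apex), SNR is taken as the power ratio 10·log10(Ps/Pn), and every parameter range other than the 2-6 peaks and 1-35 dB mentioned above is hypothetical.

```python
import numpy as np

def asymmetric_gaussian(t, center, height, sigma_left, sigma_right):
    """Bi-Gaussian peak (assumed peak model): separate widths left and right of the apex."""
    sigma = np.where(t < center, sigma_left, sigma_right)
    return height * np.exp(-0.5 * ((t - center) / sigma) ** 2)

def add_white_noise(clean, snr_db, rng):
    """Add zero-mean Gaussian noise scaled so that 10*log10(Ps/Pn) is about snr_db."""
    signal_power = np.mean(clean ** 2)
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    return clean + rng.normal(0.0, np.sqrt(noise_power), size=clean.shape)

def make_pair(rng, n_points=1024, duration_min=10.0):
    """Return (clean, noisy, snr_db) for one synthetic chromatogram."""
    t = np.linspace(0.0, duration_min, n_points)
    clean = np.zeros_like(t)
    for _ in range(rng.integers(2, 7)):  # 2 to 6 peaks inclusive
        clean += asymmetric_gaussian(
            t,
            center=rng.uniform(0.1 * duration_min, 0.9 * duration_min),
            height=rng.uniform(0.2, 1.0),         # hypothetical range
            sigma_left=rng.uniform(0.03, 0.15),   # hypothetical range
            sigma_right=rng.uniform(0.03, 0.30),  # hypothetical range
        )
    snr_db = rng.uniform(1.0, 35.0)
    return clean, add_white_noise(clean, snr_db, rng), snr_db

rng = np.random.default_rng(42)
clean, noisy, target_snr_db = make_pair(rng)
measured_snr_db = 10.0 * np.log10(np.mean(clean ** 2) / np.mean((noisy - clean) ** 2))
```

Calling `make_pair` repeatedly and splitting the results yields the training, validation, and test sets; `measured_snr_db` should land close to `target_snr_db` for each pair.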
[0095] Step S2: Build a multi-scale convolutional neural network model.
[0096] A deep learning architecture comprising multiple independent scale feature encoders, multi-scale feature fusion layers, and multiple independent output head sub-networks is constructed, laying the network structure foundation for a denoising strategy that decouples and processes multi-scale features independently in the wavelet domain.
[0097] Step S3: Perform discrete wavelet transform on the noisy chromatographic signal.
[0098] Specifically, before the discrete wavelet transform is applied to the noisy chromatographic signals, the wavelet basis function is selected by comparing denoising performance on the test set with the constructed network model. Under a unified model architecture and training strategy, multiple candidate basis functions from a wavelet family such as Symlets (for example, the sym2 to sym11 wavelet bases) are evaluated, and the wavelet basis function with the largest signal-to-noise ratio improvement and the smallest root mean square error (such as sym11) is selected as the fixed wavelet basis for subsequent decomposition and reconstruction.
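The selection rule can be sketched as below. The metric definitions (SNR improvement as output SNR minus input SNR, RMSE against the clean signal) are common conventions and are assumed here; the tie-breaking order (improvement first, then RMSE) and the candidate names `basis_a` to `basis_c` are hypothetical, and the "denoised" outputs are synthetic stand-ins rather than results of any trained model.

```python
import numpy as np

def snr_db(clean, estimate):
    """SNR of `estimate` relative to the clean reference, in dB."""
    return 10.0 * np.log10(np.sum(clean ** 2) / np.sum((clean - estimate) ** 2))

def rmse(clean, estimate):
    return float(np.sqrt(np.mean((clean - estimate) ** 2)))

def evaluate(clean, noisy, denoised):
    """Return (SNR improvement in dB, RMSE) for one candidate."""
    return float(snr_db(clean, denoised) - snr_db(clean, noisy)), rmse(clean, denoised)

def select_basis(scores):
    """Pick the candidate with the largest SNR improvement; break ties by lowest RMSE."""
    return min(scores, key=lambda name: (-scores[name][0], scores[name][1]))

rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 500)
clean = np.exp(-0.5 * ((t - 0.5) / 0.05) ** 2)
noise = rng.normal(0.0, 0.2, t.size)
noisy = clean + noise

# Stand-ins for the outputs of models trained with different candidate bases:
# each keeps a different fraction of the original noise.
outputs = {
    "basis_a": clean + 0.5 * noise,
    "basis_b": clean + 0.1 * noise,
    "basis_c": clean + 0.3 * noise,
}
scores = {name: evaluate(clean, noisy, out) for name, out in outputs.items()}
best = select_basis(scores)
```

In this toy setup the two criteria agree, so the choice is unambiguous; when they disagree, some explicit priority such as the one in `select_basis` is needed.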
[0099] Subsequently, the input noisy signal is decomposed through multiple cascaded levels using the selected wavelet basis, decoupling it into approximation coefficients characterizing the low-frequency profile and detail coefficients characterizing the high-frequency noise. For example, a discrete wavelet transform with 4 decomposition levels yields one set of approximation coefficients and four sets of detail coefficients.
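The structure of a 4-level decomposition can be illustrated with a hand-written transform. Note the substitution: the sketch uses the Haar wavelet because it fits in a few lines, whereas the text above selects a Symlet basis such as sym11; the function names are hypothetical.

```python
import numpy as np

def haar_dwt(x):
    """One level of orthonormal Haar analysis (stand-in for the Symlet basis in the text).

    Odd-length inputs are extended by repeating the last sample.
    """
    x = np.asarray(x, dtype=float)
    if x.size % 2:
        x = np.append(x, x[-1])
    approx = (x[0::2] + x[1::2]) / np.sqrt(2.0)  # low-pass branch
    detail = (x[0::2] - x[1::2]) / np.sqrt(2.0)  # high-pass branch
    return approx, detail

def wavedec_haar(x, level=4):
    """Cascade `level` decompositions; return [cA_level, cD_level, ..., cD_1]."""
    details = []
    approx = np.asarray(x, dtype=float)
    for _ in range(level):
        approx, detail = haar_dwt(approx)
        details.append(detail)
    return [approx] + details[::-1]

t = np.linspace(0.0, 1.0, 1024)
rng = np.random.default_rng(1)
signal = np.exp(-0.5 * ((t - 0.4) / 0.03) ** 2) + 0.05 * rng.normal(size=t.size)

coeffs = wavedec_haar(signal, level=4)
lengths = [c.size for c in coeffs]
```

With 4 levels the list holds five components: one approximation band followed by four detail bands from coarsest to finest, which are the five scales fed to the network.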
[0100] Step S4: Perform feature extraction, fusion, and reconstruction on the wavelet coefficients at each scale.
[0101] Specifically, the scale coefficient components obtained from the above decomposition are input into the network constructed in step S2. The network extracts deep features through independent encoders, and after cross-scale context interaction through the fusion layer, the predicted denoised wavelet coefficients are independently reconstructed by each output sub-network.
[0102] Step S5: Loss function calculation and model parameter update.
[0103] Specifically, the local L1 absolute error between the predicted coefficients and the true pure coefficients is directly calculated in the wavelet domain, and all learnable weights and biases within the network are jointly updated using the backpropagation algorithm. After multiple iterations, the model converges to the optimal state.
[0104] Secondly, in the practical application and inference stage: Step S7: Apply the trained model to denoise the real chromatographic signal.
[0105] Specifically, the real, noisy signal acquired by an actual chromatograph (such as an Agilent 1260 Infinity II UHPLC) is input into the system. The system first performs discrete wavelet decomposition (DWT) on the signal, then outputs the denoised prediction coefficients through a trained multi-scale convolutional neural network, and finally performs inverse wavelet transform (IDWT) to restore the time-domain signal.
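The inference flow (DWT, network prediction, IDWT) can be sketched end to end as follows. Several parts are assumptions: the Haar wavelet again stands in for the selected Symlet basis, edge padding to a multiple of 2^level is one possible boundary treatment, and `identity_model` is a placeholder for the trained network used only to show that the transform pair reconstructs the input.

```python
import numpy as np

SQRT2 = np.sqrt(2.0)

def haar_decompose(x, level):
    """Multi-level Haar DWT; length must be divisible by 2**level."""
    approx, details = np.asarray(x, dtype=float), []
    for _ in range(level):
        even, odd = approx[0::2], approx[1::2]
        approx = (even + odd) / SQRT2
        details.append((even - odd) / SQRT2)
    return [approx] + details[::-1]

def haar_reconstruct(coeffs):
    """Inverse of haar_decompose: rebuild from [cA_n, cD_n, ..., cD_1]."""
    approx = coeffs[0]
    for detail in coeffs[1:]:
        out = np.empty(2 * approx.size)
        out[0::2] = (approx + detail) / SQRT2
        out[1::2] = (approx - detail) / SQRT2
        approx = out
    return approx

def denoise(signal, model, level=4):
    """DWT -> model on the coefficient list -> IDWT, cropped to the input length."""
    signal = np.asarray(signal, dtype=float)
    pad = (-signal.size) % (2 ** level)
    padded = np.pad(signal, (0, pad), mode="edge")
    predicted = model(haar_decompose(padded, level))
    return haar_reconstruct(predicted)[: signal.size]

def identity_model(coeffs):
    """Placeholder for the trained network: returns the coefficients unchanged."""
    return coeffs

rng = np.random.default_rng(0)
signal = rng.normal(size=8101)
restored = denoise(signal, identity_model)
```

Replacing `identity_model` with a callable that maps the five noisy coefficient arrays to five predicted arrays of the same lengths gives the deployment path described above.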
[0106] Specifically, actual sample data acquired using an Agilent 1260 Infinity II ultra-high performance liquid chromatograph (UHPLC) was used for testing. The samples were fragrance mixtures, with methanol as the solvent. A ZORBAX Eclipse XDB-C18 column was used, with gradient elution using water (containing 0.1% formic acid) / methanol as the mobile phase. The flow rate was 1.0 mL/min, the column temperature was 30 °C, the injection volume was 2.0 μL, and the detection wavelength was 290 nm. Chromatographic data were acquired using a Chromeleon 7.3.1 chromatography workstation, with a sampling time range of 0–13.5 min and a sampling frequency of 10 Hz, resulting in 8101 data points. Noise was added to this signal using the same method used to construct the noisy signals in the dataset, and the resulting noisy signal was used as the input for denoising verification.
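The point count follows directly from the acquisition settings, and the same arithmetic shows how long each coefficient band would be. The second function is an assumption-laden illustration: it takes a filter length of 22 taps for sym11 (a Symlet of order N has 2N taps) and the length rule floor((n + filter_len - 1) / 2) that applies to non-periodic boundary extension; the source does not state which boundary mode was used.

```python
def sample_count(duration_min, rate_hz):
    """Number of samples on the closed interval [0, duration], both endpoints included."""
    return round(duration_min * 60 * rate_hz) + 1

def dwt_coeff_lengths(n, filter_len, level):
    """Coefficient length after each level, assuming non-periodic boundary extension."""
    lengths = []
    for _ in range(level):
        n = (n + filter_len - 1) // 2
        lengths.append(n)
    return lengths

n_points = sample_count(13.5, 10)                    # 13.5 min at 10 Hz
level_lengths = dwt_coeff_lengths(n_points, 22, 4)   # levels 1..4, assumed sym11 filter
```

Under these assumptions the five components have different lengths (the level-4 approximation and detail bands share the shortest one), which is why the network outputs must be truncated or padded back to each scale's original length before the inverse transform.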
[0107] Figure 8 is a schematic diagram of the real-signal denoising results provided by the present invention. As shown in Figure 8, the red curve represents Original + Noise (i.e., the original signal with added noise), whose baseline exhibits severe spikes and fluctuations, with the effective signal masked by background noise. The blue curve represents Denoised, i.e., the signal after denoising with the Multi-Scale Convolutional Neural Network (MSCNN) of this invention. The magnified view shows that the method provided by this embodiment not only turns the noisy, fluctuating baseline into a smooth, continuous curve, thereby suppressing background noise; more importantly, the processed blue curve closely matches the effective peak shape of the original signal, with no significant changes in peak height, peak width, or asymmetry. This indicates that this embodiment filters out high-frequency noise while preserving the low-frequency characteristics of the main chromatographic peaks, avoiding both the pseudo-Gibbs phenomenon (spurious peaks) common in traditional wavelet thresholding methods and the over-smoothing (peak clipping) caused by traditional filtering methods.
[0108] The technical effects of the method provided in this embodiment of the invention are as follows: (1) it significantly improves the signal-to-noise ratio over a wide signal-to-noise ratio range, averaging 23.84 dB with a root mean square error of 0.021; (2) its denoising performance is better than that of traditional methods such as Savitzky-Golay smoothing, fast Fourier transform filtering, empirical mode decomposition, and wavelet thresholding; (3) it is robust under strong noise and complex peak shape conditions, and reduces the peak shape distortion caused by high-frequency noise removal; (4) it is applicable to pharmaceutical analysis, environmental monitoring, and other fields, providing a reliable preprocessing method.
[0109] The chromatographic signal denoising system provided by the present invention is described below. The system described below and the chromatographic signal denoising method described above may be read in conjunction with each other.
[0110] Figure 9 is a schematic diagram of the chromatographic signal denoising system provided by the present invention. As shown in Figure 9, the system includes: a signal acquisition module 910, used to acquire the chromatographic signal to be denoised; a wavelet decomposition module 920, used to perform discrete wavelet decomposition on the chromatographic signal to be denoised and obtain multiple scale coefficient components at different scales; a multi-scale network denoising module 930, used to input the coefficient components of each scale into a pre-trained multi-scale convolutional neural network to obtain the prediction coefficients of each scale output by the multi-scale convolutional neural network, wherein the multi-scale convolutional neural network is used to extract features from each scale coefficient component, fuse the extracted features of each scale, and reconstruct the features of each scale from the fused features to obtain the reconstructed prediction coefficients of each scale; and a signal reconstruction module 940, used to perform inverse wavelet transform on the prediction coefficients at each scale to obtain the denoised chromatographic signal.
[0111] The system provided in this invention, for the chromatographic signal to be denoised, first transforms it to the wavelet domain through discrete wavelet decomposition to obtain multiple scale coefficient components at different scales. Utilizing the multi-resolution analysis characteristics of wavelets, the effective signal and complex noise mixed together are physically separated and decoupled at different frequency scales, thus significantly reducing the difficulty of subsequent network processing. Subsequently, these scale coefficient components at different scales are input into a pre-trained multi-scale convolutional neural network. This network performs independent feature extraction and reconstruction for each scale coefficient component. This independent processing mechanism allows the model to process features correspondingly for different frequency bands; for example, it focuses on preserving the main peak contour in the low-frequency part and removing random noise in the high-frequency part. Simultaneously, the network deeply fuses features at each scale during independent processing, ensuring the coherence of global contextual information and effectively overcoming the artifacts and over-smoothing problems easily caused by severing frequency band connections in traditional denoising methods. Finally, by performing inverse wavelet transform on the reconstructed prediction coefficients at each scale, the denoised chromatographic signal is accurately restored.
[0112] Based on any of the above embodiments, the wavelet decomposition module is specifically used for: The discrete wavelet transform is used to perform multi-level decomposition on the chromatographic signal to be denoised, and the approximation coefficients of the last level decomposition and the detail coefficients generated by each level decomposition are obtained. The approximation coefficients and the detail coefficients generated by each level of decomposition are used as the scale coefficient components of the multiple different scales.
[0113] Based on any of the above embodiments, the multi-scale convolutional neural network includes multiple scale feature encoders, a multi-scale feature fusion layer, and multiple output head sub-networks, wherein the multiple scale feature encoders and the multiple output head sub-networks correspond one-to-one with the multiple scales after discrete wavelet decomposition. The multi-scale network denoising module is specifically used for: Each scale coefficient component is input into the scale feature encoder of the corresponding scale to extract features from each scale coefficient component and obtain the scale coding features corresponding to each scale coefficient component. The extracted coded features at each scale are concatenated, and the concatenated features are fused through the multi-scale feature fusion layer to obtain fused features; The fused features are input into each output head sub-network for feature reconstruction, and each output head sub-network reconstructs initial prediction coefficients of uniform length. Each initial prediction coefficient is compressed in the channel dimension and truncated or padded in the time dimension to restore the coefficient component length to the original corresponding scale, thus obtaining the prediction coefficients for each scale.
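The final post-processing step (removing the channel dimension, then truncating or padding along time) can be sketched as below. The function names, the zero value used for padding, and the toy lengths are assumptions; the source only states that each output is compressed in the channel dimension and truncated or padded to its original length.

```python
import numpy as np

def fit_length(coeff, target_len):
    """Truncate or zero-pad along the last (time) axis to exactly target_len samples."""
    current = coeff.shape[-1]
    if current >= target_len:
        return coeff[..., :target_len]
    pad_width = [(0, 0)] * (coeff.ndim - 1) + [(0, target_len - current)]
    return np.pad(coeff, pad_width)  # zero padding is an assumption

def restore_scale(head_output, target_len):
    """(1, uniform_len) head output -> 1-D coefficient vector of length target_len."""
    return fit_length(np.squeeze(head_output, axis=0), target_len)

# Toy example: every head emits 8 samples; the original scales had 5, 8, and 11.
target_lens = [5, 8, 11]
head_outputs = [np.arange(8.0).reshape(1, 8) for _ in target_lens]
restored = [restore_scale(h, n) for h, n in zip(head_outputs, target_lens)]
```

After this step each predicted component has the same length as the corresponding component produced by the decomposition, so the list can be passed directly to the inverse wavelet transform.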
[0114] Based on any of the above embodiments, each scale feature encoder consists of two layers of one-dimensional convolution and a ReLU activation function. The parameters of the scale encoders at different scales are independent of each other. The multi-scale feature fusion layer includes three cascaded layers of one-dimensional convolution. The parameters of each output head sub-network are not shared.
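A forward pass through this layout can be sketched with plain NumPy. This is an untrained illustration under several assumptions not stated in the source: kernel size 3, 8 hidden channels, ReLU after every encoder and fusion convolution, a single convolution per output head, and inputs already brought to a uniform length so that features can be concatenated.

```python
import numpy as np

def conv1d_same(x, weight, bias):
    """Cross-correlation with zero padding. x: (C_in, L); weight: (C_out, C_in, K), K odd."""
    k = weight.shape[-1]
    padded = np.pad(x, ((0, 0), (k // 2, k // 2)))
    windows = np.lib.stride_tricks.sliding_window_view(padded, k, axis=1)  # (C_in, L, K)
    return np.einsum("clk,ock->ol", windows, weight) + bias[:, None]

def relu(x):
    return np.maximum(x, 0.0)

def init_conv(rng, c_in, c_out, k=3):
    """Random (untrained) weights and zero bias for one convolution layer."""
    return rng.normal(0.0, 0.1, (c_out, c_in, k)), np.zeros(c_out)

class ScaleEncoder:
    """Two 1-D convolutions with ReLU; one independent instance per scale."""
    def __init__(self, rng, hidden):
        self.conv1 = init_conv(rng, 1, hidden)
        self.conv2 = init_conv(rng, hidden, hidden)

    def __call__(self, coeff):
        h = relu(conv1d_same(coeff[None, :], *self.conv1))
        return relu(conv1d_same(h, *self.conv2))

class MultiScaleNet:
    def __init__(self, n_scales=5, hidden=8, seed=0):
        rng = np.random.default_rng(seed)
        self.encoders = [ScaleEncoder(rng, hidden) for _ in range(n_scales)]
        fused = n_scales * hidden
        self.fusion = [init_conv(rng, fused, fused) for _ in range(3)]     # three cascaded convs
        self.heads = [init_conv(rng, fused, 1) for _ in range(n_scales)]  # unshared heads

    def __call__(self, coeffs):
        feats = np.concatenate([enc(c) for enc, c in zip(self.encoders, coeffs)], axis=0)
        for weight, bias in self.fusion:
            feats = relu(conv1d_same(feats, weight, bias))
        return [conv1d_same(feats, w, b)[0] for w, b in self.heads]

net = MultiScaleNet()
rng = np.random.default_rng(1)
inputs = [rng.normal(size=64) for _ in range(5)]  # five scales at a uniform toy length
outputs = net(inputs)
```

Each encoder and each head holds its own weight arrays, which mirrors the statement that parameters are independent across scales, while the shared fusion stack is where cross-scale context is exchanged.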
[0115] Based on any of the above embodiments, the system further includes a wavelet basis function determination module, which is used for: Under a unified model structure and training strategy, the test set is used to perform denoising tests on multiple candidate wavelet basis functions to obtain the denoised signals corresponding to each candidate wavelet basis function. Based on the denoised signal and the clean signal in the test set, the evaluation index of each candidate wavelet basis function is calculated, and the evaluation index includes the signal-to-noise ratio improvement and the root mean square error. The candidate wavelet basis function that maximizes the signal-to-noise ratio improvement and minimizes the root mean square error is determined as the target wavelet basis function used in the discrete wavelet decomposition and the inverse wavelet transform.
[0116] Based on any of the above embodiments, the system further includes a model training module, the model training module comprising: a dataset construction unit, used to construct a dataset containing both pure chromatographic signals and noisy chromatographic signals; a wavelet decomposition unit, used to perform discrete wavelet decomposition on the noisy chromatographic signal to obtain noisy coefficient components of different scales corresponding to the noisy chromatographic signal, and to perform discrete wavelet decomposition on the pure chromatographic signal to obtain pure coefficient components of different scales corresponding to the pure chromatographic signal; a coefficient reconstruction unit, used to input the noisy coefficient components at each scale into the initial network to obtain the reconstructed prediction coefficients at each scale output by the initial network; and a parameter iteration unit, used to perform parameter iteration on the initial network based on the difference between the reconstructed prediction coefficients and the pure coefficient components to obtain the multi-scale convolutional neural network.
[0117] Based on any of the above embodiments, the dataset construction unit is specifically used for: Multiple asymmetric Gaussian peaks are randomly generated, and the generated multiple asymmetric Gaussian peaks are superimposed to obtain the pure chromatographic signal; Gaussian white noise is added to the pure chromatographic signal to generate the noisy chromatographic signal; The dataset is constructed based on the pure chromatographic signal and the noisy chromatographic signal.
[0118] Based on any of the above embodiments, the parameter iteration unit is specifically used for: The errors between the reconstructed prediction coefficients and the pure coefficient components at each scale are calculated separately, and the errors at each scale are summed to obtain the total loss. Based on the total loss, the parameters of the initial network are jointly adjusted, including the learnable weights and biases of the feature encoders at each scale, the multi-scale feature fusion layer, and the output head sub-networks in the initial network.
[0119] Figure 10 is a schematic diagram of the physical structure of an example electronic device. As shown in Figure 10, the electronic device may include: a processor 1010, a communication interface 1020, a memory 1030, and a communication bus 1040, wherein the processor 1010, the communication interface 1020, and the memory 1030 communicate with each other through the communication bus 1040. The processor 1010 can call the logic instructions in the memory 1030 to execute a chromatographic signal denoising method, which includes: acquiring the chromatographic signal to be denoised; performing discrete wavelet decomposition on the chromatographic signal to be denoised to obtain multiple scale coefficient components at different scales; inputting each scale coefficient component into a pre-trained multi-scale convolutional neural network to obtain the prediction coefficients at each scale output by the multi-scale convolutional neural network, wherein the multi-scale convolutional neural network is used to extract features from each scale coefficient component, fuse the extracted features at each scale, and reconstruct the features at each scale from the fused features to obtain the reconstructed prediction coefficients at each scale; and performing inverse wavelet transform on the prediction coefficients at each scale to obtain the denoised chromatographic signal.
[0120] Furthermore, the logical instructions in the aforementioned memory 1030 can be implemented as software functional units and, when sold or used as independent products, can be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention, or the part that contributes to related technologies, or a portion of the technical solution, can be embodied in the form of a software product. This computer software product is stored in a storage medium and includes several instructions to cause a computer device (which may be a personal computer, server, or network device, etc.) to execute all or part of the steps of the methods described in the various embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as USB flash drives, portable hard drives, read-only memory (ROM), random access memory (RAM), magnetic disks, or optical disks.
[0121] On the other hand, the present invention also provides a computer program product, which includes a computer program that can be stored on a non-transitory computer-readable storage medium. When the computer program is executed by a processor, the computer can execute the chromatographic signal denoising method provided by the above methods. The method includes: acquiring a chromatographic signal to be denoised; performing discrete wavelet decomposition on the chromatographic signal to be denoised to obtain multiple scale coefficient components of different scales; inputting each scale coefficient component into a pre-trained multi-scale convolutional neural network to obtain prediction coefficients of each scale output by the multi-scale convolutional neural network, wherein the multi-scale convolutional neural network is used to extract features from each scale coefficient component, fuse the extracted features of each scale, and reconstruct the features of each scale on the fused features to obtain the reconstructed prediction coefficients of each scale; and performing inverse wavelet transform on the prediction coefficients of each scale to obtain the denoised chromatographic signal.
[0122] In another aspect, the present invention also provides a non-transitory computer-readable storage medium storing a computer program thereon, which, when executed by a processor, implements a chromatographic signal denoising method provided by the above methods. The method includes: acquiring a chromatographic signal to be denoised; performing discrete wavelet decomposition on the chromatographic signal to be denoised to obtain multiple scale coefficient components at different scales; inputting each scale coefficient component into a pre-trained multi-scale convolutional neural network to obtain prediction coefficients at each scale output by the multi-scale convolutional neural network, wherein the multi-scale convolutional neural network is used to extract features from each scale coefficient component, fuse the extracted features at each scale, and reconstruct the fused features at each scale to obtain the reconstructed prediction coefficients at each scale; and performing inverse wavelet transform on the prediction coefficients at each scale to obtain the denoised chromatographic signal.
[0123] The device embodiments described above are merely illustrative. The units described as separate components may or may not be physically separate. The components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the modules can be selected to achieve the purpose of this embodiment according to actual needs. Those skilled in the art can understand and implement this without any creative effort.
[0124] Through the above description of the embodiments, those skilled in the art can clearly understand that each embodiment can be implemented by means of software plus necessary general-purpose hardware platforms, and of course, it can also be implemented by hardware. Based on this understanding, the above technical solutions, in essence or the parts that contribute to the related technology, can be embodied in the form of software products. This computer software product can be stored in a computer-readable storage medium, such as ROM / RAM, magnetic disk, optical disk, etc., and includes several instructions to cause a computer device (which may be a personal computer, server, or network device, etc.) to execute the methods described in the various embodiments or some parts of the embodiments.
[0125] Finally, it should be noted that the above embodiments are only used to illustrate the technical solutions of the present invention, and not to limit them; although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that modifications can still be made to the technical solutions described in the foregoing embodiments, or equivalent substitutions can be made to some of the technical features; and these modifications or substitutions do not cause the essence of the corresponding technical solutions to deviate from the spirit and scope of the technical solutions of the embodiments of the present invention.
Claims
1. A method for denoising chromatographic signals, characterized in that it comprises: obtaining the chromatographic signal to be denoised; performing discrete wavelet decomposition on the chromatographic signal to be denoised to obtain multiple scale coefficient components at different scales; inputting each scale coefficient component into a pre-trained multi-scale convolutional neural network to obtain the prediction coefficients at each scale output by the multi-scale convolutional neural network, wherein the multi-scale convolutional neural network is used to extract features from each scale coefficient component, fuse the extracted features at each scale, and reconstruct the features at each scale from the fused features to obtain the reconstructed prediction coefficients at each scale; and performing inverse wavelet transform on the prediction coefficients at each scale to obtain the denoised chromatographic signal.
2. The chromatographic signal denoising method according to claim 1, characterized in that, The discrete wavelet decomposition of the chromatographic signal to be denoised yields multiple scale coefficient components at different scales, including: The discrete wavelet transform is used to perform multi-level decomposition on the chromatographic signal to be denoised, and the approximation coefficients of the last level decomposition and the detail coefficients generated by each level decomposition are obtained. The approximation coefficients and the detail coefficients generated by each level of decomposition are used as the scale coefficient components of the multiple different scales.
3. The chromatographic signal denoising method according to claim 1, characterized in that, The multi-scale convolutional neural network includes multiple scale feature encoders, a multi-scale feature fusion layer, and multiple output head sub-networks, wherein the multiple scale feature encoders and the multiple output head sub-networks correspond one-to-one with the multiple scales after discrete wavelet decomposition. The step of inputting each scale coefficient component into a pre-trained multi-scale convolutional neural network to obtain the scale prediction coefficients output by the multi-scale convolutional neural network includes: Each scale coefficient component is input into the scale feature encoder of the corresponding scale to extract features from each scale coefficient component and obtain the scale coding features corresponding to each scale coefficient component. The extracted coded features at each scale are concatenated, and the concatenated features are fused through the multi-scale feature fusion layer to obtain fused features; The fused features are input into each output head sub-network for feature reconstruction, and each output head sub-network reconstructs initial prediction coefficients of uniform length. Each initial prediction coefficient is compressed in the channel dimension and truncated or padded in the time dimension to restore the coefficient component length to the original corresponding scale, thus obtaining the prediction coefficients for each scale.
4. The chromatographic signal denoising method according to claim 3, characterized in that, Each scale feature encoder consists of two layers of one-dimensional convolution and a ReLU activation function. The parameters of the scale encoders at different scales are independent of each other. The multi-scale feature fusion layer includes three cascaded layers of one-dimensional convolution. The parameters of each output head sub-network are not shared.
5. The chromatographic signal denoising method according to claim 1, characterized in that, The steps for determining the target wavelet basis function used in the discrete wavelet decomposition and the inverse wavelet transform include: Under a unified model structure and training strategy, the test set is used to perform denoising tests on multiple candidate wavelet basis functions to obtain the denoised signals corresponding to each candidate wavelet basis function. Based on the denoised signal and the clean signal in the test set, the evaluation index of each candidate wavelet basis function is calculated, and the evaluation index includes the signal-to-noise ratio improvement and the root mean square error. The candidate wavelet basis function that maximizes the signal-to-noise ratio improvement and minimizes the root mean square error is determined as the target wavelet basis function.
6. The chromatographic signal denoising method according to any one of claims 1 to 5, characterized in that, The training steps of the multi-scale convolutional neural network include: Construct a dataset containing both pure chromatographic signals and noisy chromatographic signals; Discrete wavelet decomposition is performed on the noisy chromatographic signal to obtain the noisy coefficient components at different scales corresponding to the noisy chromatographic signal, and discrete wavelet decomposition is performed on the pure chromatographic signal to obtain the pure coefficient components at different scales corresponding to the pure chromatographic signal. The noisy coefficient components at each scale are input into the initial network to obtain the reconstruction prediction coefficients at each scale output by the initial network. Based on the difference between the reconstructed prediction coefficients and the pure coefficient components, the parameters of the initial network are iterated to obtain the multi-scale convolutional neural network.
7. The chromatographic signal denoising method according to claim 6, characterized in that, The construction of the dataset, which includes both pure and noisy chromatographic signals, includes: Multiple asymmetric Gaussian peaks are randomly generated, and the generated multiple asymmetric Gaussian peaks are superimposed to obtain the pure chromatographic signal; Gaussian white noise is added to the pure chromatographic signal to generate the noisy chromatographic signal; The dataset is constructed based on the pure chromatographic signal and the noisy chromatographic signal.
8. The chromatographic signal denoising method according to claim 6, characterized in that, The step of iterating the parameters of the initial network based on the difference between the reconstructed prediction coefficients and the pure coefficient components includes: The errors between the reconstructed prediction coefficients and the pure coefficient components at each scale are calculated separately, and the errors at each scale are summed to obtain the total loss. Based on the total loss, the parameters of the initial network are jointly adjusted, including the learnable weights and biases of the feature encoders at each scale, the multi-scale feature fusion layer, and the output head sub-networks in the initial network.
9. A chromatographic signal denoising system, characterized in that it comprises: a signal acquisition module, used to acquire the chromatographic signal to be denoised; a wavelet decomposition module, used to perform discrete wavelet decomposition on the chromatographic signal to be denoised and obtain multiple scale coefficient components at different scales; a multi-scale network denoising module, used to input the coefficient components of each scale into a pre-trained multi-scale convolutional neural network to obtain the prediction coefficients of each scale output by the multi-scale convolutional neural network, wherein the multi-scale convolutional neural network is used to extract features from each scale coefficient component, fuse the extracted features of each scale, and reconstruct the features of each scale from the fused features to obtain the reconstructed prediction coefficients of each scale; and a signal reconstruction module, used to perform inverse wavelet transform on the prediction coefficients at each scale to obtain the denoised chromatographic signal.
10. An electronic device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that, when the processor executes the computer program, it implements the chromatographic signal denoising method as described in any one of claims 1 to 8.