A peak recognition method and device based on a deep learning model and a medium
By using a peak identification method based on a deep learning model, the problems of noise misjudgment and low signal-to-noise ratio in spectral peak identification are solved, achieving high accuracy and high efficiency in peak identification, which is applicable to data processing such as gas chromatography, liquid chromatography, and mass spectrometry.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- ANHUI WAYEE SCI & TECH CO LTD
- Filing Date
- 2026-05-21
- Publication Date
- 2026-06-19
AI Technical Summary
Existing technologies suffer from noise misjudgment, low processing timeliness and accuracy in spectral peak identification, making it difficult to meet the high real-time and high accuracy requirements of modern analysis, especially when faced with low signal-to-noise ratio and complex waveforms.
A peak identification method based on a deep learning model is adopted. By acquiring and labeling a one-dimensional dataset, a one-dimensional convolutional deep learning model is constructed. Combining training samples and the verification process, the start and end points of peaks are identified, and negative peaks are flipped and raised to enhance the model's noise resistance.
It improves the accuracy and noise resistance of spectral peak identification, can identify complex situations such as peak clusters, has strong applicability, reduces computing power overhead, and improves the robustness and stability of computation.
Figure CN122241241A_ABST
Abstract
Description
Technical Field
[0001] This invention relates to the field of data processing in analytical instruments, and in particular to a peak identification method, apparatus, and medium based on a deep learning model.
Background Technology
[0002] In the field of analytical instruments, whether it is the chromatographic waveform output by gas chromatography (GC) or liquid chromatography (LC), or the waveform of mass spectrometry, absorption spectroscopy, or X-ray spectroscopy, peak identification is the core link in data processing, and its identification accuracy directly determines the reliability of the final qualitative and quantitative analysis results.
[0003] In existing technologies, conventional spectral peak determination often relies on differentiation methods, specifically extracting extreme points or inflection points using first or second derivatives. However, this approach has significant drawbacks: when the signal-to-noise ratio is low, additional preprocessing steps such as filtering and resampling are necessary for noise reduction; when dealing with complex waveforms like peak clusters, the determination of negative peaks suffers from logical ambiguity; furthermore, for scenarios with drastic data fluctuations, multiple auxiliary criteria are often required to define the start and end points of peaks. These methods are not only cumbersome and prone to misclassifying noise as spectral peaks (i.e., generating false peaks), but also suffer from low processing timeliness and accuracy, failing to meet the demands of modern analysis for high real-time performance and accuracy.
[0004] In recent years, with the evolution of machine learning technologies such as deep learning, data-driven automatic peak detection schemes have been widely explored. These methods use a large number of spectral waveforms and their corresponding real labels (such as peak position, area, height, and other feature values) as a training set for supervised learning, pre-constructing a peak detection model. In practical applications, the waveform of the spectrum to be tested is input into this model, and the detection result and feature parameters can be directly output through inference calculations, as exemplified by Chinese patents CN111373256B and CN115997219B.
[0005] Chinese patent application CN119959437A discloses a method for detecting negative peaks, which identifies negative peaks by acquiring spectral data and extracting the peak apex, start point, and end point based on the first derivative. This patent differs from the technical approach used in this solution.
[0006] Chinese patent CN111767790B discloses a method for identifying negative peaks in chromatography based on convolutional neural networks. This method employs a convolutional neural network-based approach, performing preprocessing, convolution, downsampling, and fully connected operations on the chromatogram, combined with first and second derivatives, to calculate the characteristic points of the peaks and identify their start, end, and apex. This patent adds auxiliary judgment to the convolutional neural network, increasing the computational workload, which differs from the technical approach used in this solution.
Summary of the Invention
[0007] To address the shortcomings of the prior art, the present invention provides a peak identification method, apparatus, and medium based on a deep learning model.
[0008] The technical solution adopted by this invention to solve its technical problem is as follows: A peak identification method based on a deep learning model includes:
obtaining the raw data, which is a one-dimensional dataset of data point indices and response values;
labeling the peak types and the peak start and end indexes of the raw data, and, based on the labeling content and data source, removing invalid regions to obtain training samples or data to be processed, wherein the peak types are positive peaks and negative peaks;
constructing a one-dimensional convolutional deep learning model based on the training samples and training it to obtain the trained model;
obtaining, based on the trained model, the basic start and end points and the maximum peak index length of the data to be processed;
verifying whether the response value of each basic start and end point is a minimum value within the data range spanned by the maximum peak index length, and whether the distance between that point and the current peak is the shortest, and selecting the index corresponding to the minimum value that meets the verification requirements as the correction start and end point.
[0009] Furthermore, the invalid region is defined as M data points out of N consecutive original data points whose response values all fall within the invalid range, where the specific values of M, N, and the invalid range depend on the data source, and M < N.
[0010] Furthermore, if negative peaks exist, before or after removing invalid regions, the process includes: selecting positive peak data and flipping it to a negative peak; using the minimum response value of the current data to raise the overall response value of the current data so that the response values of the raised data are all greater than zero, and using the raised data as training samples or data to be processed.
[0011] Furthermore, selecting positive peak data and flipping it into a negative peak includes: given that the current peak function is f(x), the starting index of the current peak is a, and the ending index of the current peak is b, the equation of the line connecting the start and end points of the current peak is l(x) = f(a) + ((f(b) - f(a)) / (b - a)) × (x - a); using this line as the baseline, the flipped peak function is g(x) = 2 × l(x) - f(x); the flipped peak function is the flipped negative peak data.
[0012] Furthermore, the raised data is obtained by subtracting the minimum response value of the current data from each response value of the current data and adding the compensation error value; alternatively, a square root or logarithmic operation is applied to that sum, and the normalized result is used as the raised data.
[0013] Furthermore, after removing invalid regions, or after removing invalid regions and processing negative peak data, the process also includes: splicing the remaining data, removing invalid regions between the two peaks of the spliced data, and obtaining augmented data as training samples or data to be processed.
[0014] Furthermore, the deep learning model can be any of the YOLO model, SSD model, or other deep learning models.
[0015] Furthermore, the training model also includes a transformer structure.
[0016] Based on the same inventive concept, this application also proposes a peak recognition device based on a deep learning model, comprising:
a data processing module, which acquires the raw data, a one-dimensional dataset of data point indices and response values; labels the peak types and the peak start and end indexes of the raw data; and, based on the labeling content and data source, removes invalid regions to obtain training samples or data to be processed, wherein the peak types are positive peaks and negative peaks;
a model building module, which constructs a one-dimensional convolutional deep learning model based on the training samples and trains it to obtain the trained model;
a correction module, which obtains, based on the trained model, the basic start and end points and the maximum peak index length of the data to be processed; verifies whether the response value of each basic start and end point is a minimum value within the data range spanned by the maximum peak index length, and whether the distance between that point and the current peak is the shortest; and selects the index corresponding to the minimum value that meets the verification requirements as the correction start and end point.
[0017] Based on the same inventive concept, this application also proposes a computer storage medium storing a computer program, which, when executed by a processor, implements a peak identification method based on a deep learning model as described above.
[0018] The beneficial effects of this invention are reflected in: The peak identification method proposed in this application is a data-driven algorithm. By training a model, it can identify the basic start and end points of peaks, and by combining correction processing, it can obtain accurate peak positions. Because the training model contains a large number of samples, the peak identification method proposed in this application can obtain peak points in complex situations such as peak clusters, demonstrating strong compatibility.
[0019] The peak identification method proposed in this application has strong noise resistance. Compared with traditional methods, whose mathematical definitions cannot accurately cover the multiple scenarios that arise when detecting negative peaks, it has strong applicability.
[0020] In the peak identification method proposed in this application, by flipping the positive peak data and raising the negative peak data as a whole, the model can combat high noise data during the learning process, while learning enough negative peak features to improve the accuracy of calculation.
[0021] This application adds a compensation error value when raising the data to avoid numerical anomalies such as division by zero or overflow, thereby improving the robustness of the calculation while maintaining the original calculation accuracy.
[0022] This application achieves computational stability, accelerates data convergence efficiency, and reduces computational overhead by performing square root or logarithmic operations and normalization.
[0023] This application proposes bidirectional verification of the basic start and end points, and cross-validation to determine the correction start and end points, thereby improving data accuracy.
Attached Figure Description
[0024] Figure 1 is a flowchart illustrating the peak identification method proposed in this application; Figure 2 is a data detection result of this application; Figure 3 is a data detection result of a first-order algorithm; Figure 4 is another data detection result of this application; Figure 5 is another data detection result of the first-order algorithm.
Detailed Implementation
[0025] The present application will now be described in further detail with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and not intended to limit it. Furthermore, it should be noted that, for ease of description, only the parts relevant to the invention are shown in the accompanying drawings.
[0026] It should be noted that, unless otherwise specified, the embodiments and features described in this application can be combined with each other.
[0027] As shown in Figure 1, this application proposes a peak identification method based on a deep learning model, as follows:
obtaining the raw data, which is a one-dimensional dataset of time points and response values;
labeling the peak types and the peak start and end indexes of the raw data, and, based on the labeling content and data source, removing invalid regions to obtain training samples or data to be processed, wherein the peak types are positive peaks and negative peaks;
constructing a one-dimensional convolutional deep learning model based on the training samples and training it to obtain the trained model;
obtaining, based on the trained model, the basic start and end points and the maximum peak index length of the data to be processed;
verifying whether the response value of each basic start and end point is a minimum value within the data range spanned by the maximum peak index length, and whether the distance between that point and the current peak is the shortest, and selecting the index corresponding to the minimum value that meets the verification requirements as the correction start and end point.
[0028] In this application, the deep learning model can be the YOLO model, the SSD (Single Shot MultiBox Detector) model, or other models.
[0029] The peak identification process in this application is as follows: Obtain the raw data, which is a one-dimensional dataset of data point indices and response values.
[0030] Preferably, the source of the raw data can be chromatographic data output from gas chromatography or liquid chromatography, mass spectrometry data, or other spectral data.
[0031] Depending on the data source, the composition of its one-dimensional data varies, which is specifically reflected in the different content represented by the index.
[0032] The data objects processed in this application are one-dimensional data, including output time points and response values, with a one-to-one correspondence between time points and response values. Alternatively, the one-dimensional data can be wavelengths and response values, with a one-to-one correspondence between wavelengths and response values.
[0033] Because the data sources are different, their indices are different. They can be mass-to-charge ratio or other content (such as time point, wavelength, etc.), but the values corresponding to the indices are all response values of signal strength.
[0034] The response value depends on the type of detector. For a UV detector, the ordinate is absorbance; for a mass spectrometer detector, the ordinate is ion abundance (or ion intensity).
[0035] Preferably, it can also be a fluorescence detector, an electrochemical detector, or other detectors with different signal intensities.
[0036] Label the peak types and peak start and end indexes of the original data. Based on the labeling content and data source, remove invalid regions and obtain training samples or data to be processed. The peak types are positive peaks and negative peaks.
[0037] The invalid region is defined as M data points out of N consecutive original data points whose response values all fall within the invalid range. The specific values of M, N, and the invalid range depend on the data source, and M < N.
[0038] Preferably, either manual or automatic annotation can be used. This application does not limit the specific method of data annotation; any annotation method can be used to obtain the peak type and peak start and end index of the original data.
[0039] The start and end indexes of a peak are the start and end indices of the current peak.
[0040] The start and end indexes of the peaks can be determined through extreme value calculations, or other calculation methods can be used.
[0041] Among them, the peak type can be defined as a positive peak or a negative peak by calculating the slope of the data between the peak's start point and end point and the peak's apex, and by determining the sign of the two slopes.
[0042] If the slope between the peak start point and the peak apex is positive, and the slope between the peak end point and the peak apex is negative, then the current peak is defined as a positive peak; if the slope between the peak start point and the peak apex is negative, and the slope between the peak end point and the peak apex is positive, then the current peak is defined as a negative peak; or the definitions of positive and negative peaks can be interchanged.
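The slope-sign rule above can be sketched as follows. This is a minimal illustration, not the applicant's implementation: the function name `classify_peak`, the use of chord slopes between the three labeled points, and the encoding 0 = positive, 1 = negative (borrowed from the example encoding given later in the text) are assumptions.

```python
def classify_peak(values, start, apex, end):
    """Return 0 for a positive peak, 1 for a negative peak, None otherwise.

    Assumes start < apex < end and that the apex index is already known.
    """
    # Slope of the straight line from the start point to the apex.
    start_slope = (values[apex] - values[start]) / (apex - start)
    # Slope of the straight line from the apex to the end point.
    end_slope = (values[end] - values[apex]) / (end - apex)
    if start_slope > 0 and end_slope < 0:
        return 0  # rises to the apex, then falls: positive peak
    if start_slope < 0 and end_slope > 0:
        return 1  # falls to the apex, then rises: negative peak
    return None   # slope signs do not match either definition


print(classify_peak([0.0, 1.0, 3.0, 1.0, 0.0], 0, 2, 4))
print(classify_peak([0.0, -1.0, -3.0, -1.0, 0.0], 0, 2, 4))
```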
[0043] Alternatively, other calculation methods can be used to determine the peak type.
[0044] In addition to determining the peak start and end indexes and peak type through calculation, as mentioned above, you can also manually annotate and define the content.
[0045] Alternatively, calculation can be performed first, followed by manual annotation, to reduce calculation errors.
[0046] The annotation format is "peak type, start index, end index", or "start index, end index, peak type", or "peak type, end index, start index", or other formats, which are not limited in this application.
[0047] In addition, the representation of peak types can be defined, such as defining positive peaks as 0 and negative peaks as 1. Or other numerical or symbolic representations can be used, which are not limited here.
[0048] For example, "0, 100, 200" means that the starting index of the current peak is 100, the ending index is 200, and the current peak is a positive peak.
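As an illustration of the first annotation format, the following sketch parses a label line into its three fields. The names `PeakLabel` and `parse_label` are hypothetical, and the sketch assumes the "peak type, start index, end index" field order with integer values.

```python
from typing import NamedTuple


class PeakLabel(NamedTuple):
    peak_type: int  # 0 = positive, 1 = negative (example encoding from the text)
    start: int      # index where the peak starts
    end: int        # index where the peak ends


def parse_label(line: str) -> PeakLabel:
    """Parse a label written as 'peak type, start index, end index'."""
    peak_type, start, end = (int(field.strip()) for field in line.split(","))
    if start >= end:
        raise ValueError("start index must be smaller than end index")
    return PeakLabel(peak_type, start, end)


label = parse_label("0, 100, 200")
print(label.peak_type, label.start, label.end)
```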
[0049] Depending on the data source, the composition of its one-dimensional data varies, which is specifically reflected in the different content represented by the index.
[0050] The data point index may be the wavelength, time point, mass-to-charge ratio, or other content in the one-dimensional data, and each index corresponds to one response value.
[0051] Depending on the data source—whether it's chromatographic data, mass spectrometric data, or other types of data—the peak positions (i.e., the start and end points of the peaks), peak values, and the spacing between peaks will differ. Furthermore, even within the same type of data, these characteristics can vary depending on the detection target. Therefore, this application cannot provide specific values corresponding to the invalid range.
[0052] Taking chromatographic data as an example, the response values of 300 consecutive data points all fall within the invalid range when the maximum of those 300 response values is less than the first invalid value and the minimum of those 300 response values is greater than the second invalid value.
[0053] The invalid range is defined as [second invalid value, first invalid value], where the second invalid value is less than or equal to the first invalid value. The difference between the first and second invalid values depends on the object being detected. Furthermore, the difference between the first and second invalid values within the invalid range is less than a difference threshold to avoid misjudging low response value peaks as falling into the invalid range.
[0054] For example, the 200 data points in the middle of 300 data points are considered invalid regions, i.e., N=300, M=200.
[0055] Preferably, this application uses the M data points in the middle of the N consecutive data points that fall within the invalid range as the invalid region, to avoid accidentally deleting a peak start or peak end when deleting the invalid region.
[0056] Preferably, data in other locations within the invalid range can also be set as invalid regions.
[0057] For example, define the 70th to 150th data points or the 100th to 260th data points as invalid regions out of 300 data points. The specific location and number (M) of invalid data points are determined based on the actual situation.
[0058] The 300 and 200 data points mentioned here are just examples; they can also be set to 400, 500, or other values.
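The invalid-region rule can be sketched as a sliding-window scan. This is a minimal sketch under stated assumptions: the function names are hypothetical, windows are taken as non-overlapping, the M removed points are centred in each window of N points, and `low` and `high` stand for the second and first invalid values.

```python
def find_invalid_regions(values, n, m, low, high):
    """Return (start, stop) pairs, stop exclusive, marking invalid regions."""
    if not 0 < m < n:
        raise ValueError("require 0 < M < N")
    regions = []
    i = 0
    while i + n <= len(values):
        window = values[i:i + n]
        # All N points lie strictly inside the invalid range.
        if low < min(window) and max(window) < high:
            offset = (n - m) // 2  # keep a margin on both sides of the window
            regions.append((i + offset, i + offset + m))
            i += n  # assumption: windows do not overlap
        else:
            i += 1
    return regions


def remove_regions(values, regions):
    """Drop every index covered by one of the regions."""
    drop = set()
    for start, stop in regions:
        drop.update(range(start, stop))
    return [v for idx, v in enumerate(values) if idx not in drop]


data = [0.0] * 6 + [5.0] + [0.0] * 2
regions = find_invalid_regions(data, n=6, m=2, low=-1.0, high=1.0)
print(regions, len(remove_regions(data, regions)))
```

Centring the removed points leaves (N - M) / 2 baseline points on each side, which matches the stated goal of not deleting a peak start or peak end by accident.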
[0059] At this point, the one-dimensional data after removing invalid regions can be used as training samples or data to be processed.
[0060] Preferably, if the data heights differ greatly after removing invalid regions, a square root or logarithmic operation can be applied to the data after removing invalid regions, and the normalized result can be used as a training sample or data to be processed.
[0061] Due to differences in data sources, negative peaks may exist. Therefore, it is necessary to process negative peak data to increase the accuracy of data processing.
[0062] The processing of negative peak data can be done before or after removing invalid regions.
[0063] If a negative peak exists, before or after removing invalid regions, the process includes: selecting positive peak data and flipping it to a negative peak; using the minimum response value of the current data to raise the overall response value of the current data so that the response values of the raised data are all greater than zero, and using the raised data as training samples or data to be processed.
[0064] Specifically, the minimum response value of the current data is subtracted from each response value in the current data and the compensation error value is added, which raises the current data as a whole so that the response values of the raised data are all greater than zero.
[0065] The current data includes both the negative peak data obtained by flipping and the data that has not been flipped.
[0066] Since negative peak data processing and invalid region removal can be performed in either order, the unflipped data is either the unflipped part of the remaining data after the invalid regions have been removed, or the unflipped part of the original data before the invalid regions are removed.
[0067] Depending on the order of the data processing, the current data therefore consists of the flipped negative peak data together with either the unflipped data remaining after invalid region removal or the unflipped data in the original data.
[0068] The raised data is obtained by subtracting the minimum response value of the current data from each response value in the current data and adding the compensation error value; alternatively, a square root or logarithmic operation is applied to that sum, and the normalized result is used as the raised data.
[0069] The specific processing procedure for negative peak data is as follows: In order to provide deep learning models with enough training data featuring negative peaks, this application proposes to flip some positive peak data into negative peaks to expand the number of negative peaks.
[0070] Given that the current peak function is f(x), the starting index of the current peak is a, and the ending index of the current peak is b, then the equation of the line connecting the starting and ending indices of the current peak is l(x) = f(a) + ((f(b) - f(a)) / (b - a)) × (x - a); using this equation as the baseline, the flipped peak function is g(x) = 2 × l(x) - f(x). The flipped peak function represents the flipped negative peak data.
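The flip about the chord baseline can be sketched directly from the two formulas. The function name `flip_peak` and the list-based data layout are assumptions made for illustration; only the points from index a to index b are changed.

```python
def flip_peak(values, a, b):
    """Reflect values[a..b] about the line through (a, f(a)) and (b, f(b))."""
    if b <= a:
        raise ValueError("end index must be larger than start index")
    fa, fb = values[a], values[b]
    flipped = list(values)
    for x in range(a, b + 1):
        baseline = fa + (fb - fa) / (b - a) * (x - a)  # l(x)
        flipped[x] = 2 * baseline - values[x]          # g(x) = 2 * l(x) - f(x)
    return flipped


print(flip_peak([1.0, 2.0, 4.0, 2.0, 1.0], 0, 4))
```

Because g(a) = f(a) and g(b) = f(b), the flipped segment joins the surrounding data without a jump at either end.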
[0071] Since one-dimensional data lacks rotation invariance, the data labels are changed after rotating the peak. Since stretching the peak would result in an excessive increase in data volume, it is preferable to use zero padding to unify the data length.
[0072] After the negative peak flipping, the method further includes: subtracting the minimum response value of the current data from each response value of the flipped negative peak data and the original data, and then adding the error value. The resulting sum is used as a training sample or data to be processed.
[0073] In this application, the current data is boosted by subtracting the minimum response value of the current data, so that the response values of the current data are all greater than zero. This enables the model to combat high-noise data during the learning process, while learning enough negative peak features to improve the accuracy of the calculation.
[0074] In this application, the error value is generally a very small value, such as 0.000001, in order to avoid numerical singularities such as division by zero or overflow, thereby improving the robustness of the calculation while maintaining the original calculation accuracy.
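The raising step can be sketched in a few lines. The function name `raise_above_zero` is hypothetical; the default error value 1e-6 is the example value given in the text.

```python
def raise_above_zero(values, eps=1e-6):
    """Shift the data so that every response value is strictly positive."""
    lowest = min(values)
    # Each value minus the minimum is >= 0; eps makes it strictly positive.
    return [v - lowest + eps for v in values]


raised = raise_above_zero([1.0, 0.0, -2.0])
print(raised)
```

Without the error value the smallest point would become exactly zero, which would break a later logarithm or a division by that value.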
[0075] In addition, due to differences in data sources, there may be issues with the response value being too high after negative peak expansion.
[0076] Due to the differences in data dimensions, the specific methods of data augmentation are also quite different. First, the input data cannot be directly normalized because of the large differences in height. Generally, it is necessary to take the square root or logarithm first, and then normalize the data.
[0077] Preferably, after the data obtained from the negative peak flipping has been raised as a whole, the method further includes: performing a square root or logarithmic operation on the summed values, and using the normalized results as training samples or data to be processed.
[0078] This application rationalizes the expanded response value by adding square root or logarithm operations after negative peak conversion and performing normalization processing, thereby making the data calculation after negative peak expansion more stable, accelerating the efficiency of data convergence, and reducing computing power consumption.
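The compression and normalization step might look like the sketch below. The text does not state which normalization is used, so min-max scaling to [0, 1] is an assumption here, as is the function name `compress_and_normalize`. The input is expected to be strictly positive, which the raising step guarantees.

```python
import math


def compress_and_normalize(values, mode="sqrt"):
    """Apply sqrt or log to positive values, then min-max scale to [0, 1]."""
    if mode == "sqrt":
        compressed = [math.sqrt(v) for v in values]
    elif mode == "log":
        compressed = [math.log(v) for v in values]
    else:
        raise ValueError("mode must be 'sqrt' or 'log'")
    lo, hi = min(compressed), max(compressed)
    if hi == lo:
        return [0.0] * len(compressed)  # constant signal: nothing to scale
    return [(c - lo) / (hi - lo) for c in compressed]


print(compress_and_normalize([1.0, 4.0, 9.0]))
print(compress_and_normalize([1.0, 10.0, 100.0], mode="log"))
```

Both operations shrink the gap between tall and short peaks before scaling, so a single dominant peak does not flatten the rest of the trace.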
[0079] To accelerate data processing efficiency, preferably, after completing data processing, the method further includes: splicing the remaining data, removing invalid regions between the two peaks of the spliced data, and obtaining augmented data as training samples or data to be processed.
[0080] Preferably, data processing consists of removing invalid regions and processing negative peak data; if no negative peaks exist, data processing consists of removing invalid regions only.
[0081] The specific processing procedure is as follows: In the one-dimensional data (i.e., the remaining data) after removing invalid regions, select the peak data to be spliced, calculate the difference between the peak end index in the first spliced data and the peak start index in the second spliced data. If the difference is greater than N, determine whether the response value of the one-dimensional data between the two spliced data (first spliced data and second spliced data) falls into the invalid range, and identify the invalid region; remove the invalid region between the spliced data, and obtain the augmented data as training samples or data to be processed.
[0082] If the difference is less than or equal to N, then the first spliced data and the second spliced data are directly spliced together, and the spliced data is used as the training sample or the data to be processed.
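The splicing rule can be sketched as follows. Several details are assumptions, since the text leaves them open: the function name `splice_peaks`, treating the whole gap between the two peaks as one candidate window, removing the M points at the centre of that gap, and keeping the gap unchanged when the peaks are spliced directly.

```python
def splice_peaks(values, first, second, n, m, low, high):
    """Join two peak segments, trimming a long invalid gap between them.

    first and second are (start, end) index pairs, inclusive, with
    first ending before second starts; requires 0 < m < n.
    """
    s1, e1 = first
    s2, e2 = second
    gap = values[e1 + 1:s2]
    # Only a gap longer than N that lies inside the invalid range is trimmed.
    if s2 - e1 > n and gap and low < min(gap) and max(gap) < high:
        keep = (len(gap) - m) // 2
        gap = gap[:keep] + gap[keep + m:]  # drop the M central points
    return values[s1:e1 + 1] + gap + values[s2:e2 + 1]


trace = [0.0, 5.0, 0.0] + [0.0] * 6 + [0.0, 7.0, 0.0]
print(splice_peaks(trace, (0, 2), (9, 11), n=5, m=4, low=-1.0, high=1.0))
```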
[0083] Preferably, if negative peaks exist, the remaining data also includes one-dimensional data after removing invalid regions and completing the negative peak data processing.
[0084] Based on the training samples, a one-dimensional convolutional deep learning model is constructed and trained on the data to obtain the training model.
[0085] In this application, the deep learning model is any one of the YOLO model, SSD model, or other deep learning models.
[0086] Preferably, the training model includes one-dimensional convolution.
[0087] Preferably, taking YOLO as an example of a deep learning model, YOLO is modified to adapt it to one-dimensional sequences.
[0088] This application takes the modified YOLOv3 network graph as an example, and modifies its convolutional module to a one-dimensional convolutional module (i.e., the Conv 1d module in the figure) to obtain a one-dimensional sequence feature extraction framework.
[0089] In this application, one-dimensional data (i.e., original data) is processed by ConvModule (convolution module) to extract local features of the data, and the stride (Stride, s=2) is controlled to achieve downsampling (spatial dimensionality reduction). Combined with the stacking of ResBlocks (residual blocks), more complex and abstract features are learned.
[0090] Based on the YOLOv3 framework, the combination of "ConvModule + ResBlock" is repeated multiple times to form a multi-level feature pyramid of the network. Each "convolution + residual" pair constitutes a feature extraction stage, allowing the YOLO framework to repeatedly extract and learn features to improve the noise resistance of the trained model.
[0091] The ConvModule encapsulates the following three layers of operations in sequence: The convolutional layer (Conv1d) performs core feature extraction (linear transformation); the BatchNorm layer is used to "normalize" the chaotic distribution of the convolutional layer output, making it more stable and easier for subsequent processing; the LeakyReLU layer applies non-linearity to the normalized data, allowing the network to learn more complex relationships, while improving the expressive power and training effect of the deep learning model by preserving a small gradient. LeakyReLU involves almost no complex exponential or multiplication / division operations, making it computationally fast.
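The three-layer sequence can be illustrated with a single-channel sketch in plain Python. It is a simplification for illustration and not the network described here: there are no learned weights or multiple channels, the normalization runs over one sequence instead of a batch and has no learned scale or shift, and the kernel values and the LeakyReLU slope of 0.1 are assumed.

```python
import math


def conv1d(x, kernel, stride=1, padding=0):
    """Single-channel 1D convolution (cross-correlation) with zero padding."""
    padded = [0.0] * padding + list(x) + [0.0] * padding
    k = len(kernel)
    return [sum(w * padded[i + j] for j, w in enumerate(kernel))
            for i in range(0, len(padded) - k + 1, stride)]


def normalize(x, eps=1e-5):
    """Zero-mean, unit-variance scaling; a stand-in for BatchNorm."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]


def leaky_relu(x, slope=0.1):
    """Keep positive values, scale negative values by a small slope."""
    return [v if v > 0 else slope * v for v in x]


def conv_module(x, kernel, stride=2, padding=1):
    """Conv1d, then normalization, then LeakyReLU, in that order."""
    return leaky_relu(normalize(conv1d(x, kernel, stride, padding)))


signal = [float(i) for i in range(8)]
print(len(conv_module(signal, [0.25, 0.5, 0.25])))
```

With kernel size 3, padding 1, and stride 2, an input of length 8 yields floor((8 + 2 - 3) / 2) + 1 = 4 outputs, which is the halving that the stride-2 downsampling refers to.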
[0092] Preferably, the YOLOv3 framework of this application includes a combination of ConvModules (ConBlock and ResBlock).
[0093] ConBlock contains five convolutional modules. By alternating the kernel size (k) and padding (p) of the convolutional modules, the integration of channel information and spatial feature extraction are decoupled. This reduces computational costs while increasing network depth and nonlinearity, enabling the deep learning model to perform refined feature learning and improve model performance.
[0094] ResBlock contains two convolutional modules, which have low computational cost, increase network depth and non-linearity, and precisely control channel dimensions.
[0095] The one-dimensional sequence feature extraction framework proposed in this application only needs to regress the length instead of the original box when regressing coordinates during the final detection process. This method is an end-to-end approach, directly outputting the start and end points of the peaks. Its detection object is chromatographic data (one-dimensional sequence), rather than two-dimensional chromatogram.
[0096] Preferably, the training model also includes a transformer structure to optimize the training model and improve computational accuracy.
[0097] Preferably, a deep learning model using only one-dimensional convolutions can be used, without the transformer structure.
[0098] Based on the trained model, obtain the basic start and end points of the data to be processed and the maximum length of the peak index.
[0099] Preferably, the trained model can output the basic start and end points of the data to be processed. Based on the start and end points, the index length between each peak can be determined, and thus the maximum peak index length can be determined.
[0100] Verify whether each basic start and end point is the minimum value of the response value in the data range with the maximum length of the peak index, and whether the basic start and end point is the shortest distance from the current peak. Select the index corresponding to the minimum value that meets the verification requirements as the correction start and end point.
[0101] Preferably, the distance between any base starting point and the current peak is an absolute value, without considering directionality, so that the distance with the smallest value is selected.
[0102] The coordinate points regressed by the training model (i.e., the basic start and end points) generally carry a point bias. This application proposes to correct the point bias by using the maximum length of the peak index, as follows. Let L be the maximum length of the detected peak index, and examine each peak in turn. For a basic start or end point lying to the left of the current peak, the index of the minimum response value closest to the peak vertex is determined within the interval between (the basic start point of the current peak - L) and the peak vertex of the current peak, and is used as the correction start point of the current peak.
[0103] Furthermore, for a basic start or end point lying to the right of the current peak, the index of the minimum response value closest to the peak vertex is determined within the interval between the peak vertex of the current peak and (the basic end point of the current peak + L), and is used as the correction end point of the current peak.
[0104] In the above correction process, the baseline is assumed to be a vertical line.
[0105] Preferably, the minimum response value closest to the peak within the interval in this application can be determined by comparing the response values one by one, by calculating the derivative, or by other methods. This application does not impose any limitations on this.
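The correction step described above can be sketched as follows, using the "compare the response values one by one" option. This is a minimal illustration under several assumptions: peaks are positive (negative peaks are taken to have been flipped and raised beforehand), the peak vertex is the largest response between the basic start and end points, and "the minimum closest to the vertex" is read as the first local minimum met when walking away from the vertex. All function names are hypothetical.

```python
def max_peak_length(peaks):
    """Maximum index length L over all (start, end) pairs output by the model."""
    return max(end - start for start, end in peaks)


def nearest_minimum(y, vertex, bound, step):
    """Walk from the vertex toward `bound` (step is -1 or +1) and stop at the
    first point whose next neighbour is not strictly lower, i.e. the local
    minimum closest to the vertex, or at `bound` if none is found earlier."""
    i = vertex
    while i != bound and y[i + step] < y[i]:
        i += step
    return i


def correct_peaks(y, peaks):
    """Return correction (start, end) pairs for the basic (start, end) pairs."""
    L = max_peak_length(peaks)
    corrected = []
    for start, end in peaks:
        # Assumed definition of the vertex: largest response inside the basic range.
        vertex = max(range(start, end + 1), key=lambda i: y[i])
        left_bound = max(0, start - L)           # (basic start point - L)
        right_bound = min(len(y) - 1, end + L)   # (basic end point + L)
        new_start = nearest_minimum(y, vertex, left_bound, -1)
        new_end = nearest_minimum(y, vertex, right_bound, +1)
        corrected.append((new_start, new_end))
    return corrected
```

For example, with responses `[0, 0, 1, 0.5, 2, 5, 9, 5, 2, 0.4, 1, 0, 0]` and a single basic pair `(4, 8)`, the sketch moves the start to index 3 and the end to index 9, the valleys nearest the vertex on each side. On noisy data the strict comparison may stop early on a flank, so a real implementation would likely smooth first or use the derivative-based option mentioned above.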
[0106] This application proposes bidirectional verification of the basic start and end points, and cross-validation to determine the correction start and end points, thereby improving data accuracy.
[0107] The peak identification method provided in this application is a data-driven algorithm. By training the model, the basic start and end points of the peak cluster can be obtained. Combined with the correction processing of the peak start point, the accurate peak position can be obtained.
[0108] The peak identification method proposed in this application has strong noise resistance. Compared with traditional methods, it has significant advantages in detecting negative peaks, because a mathematical definition of negative peaks, as relied on by traditional methods, cannot accurately cover the variety of scenarios encountered in practice.
[0109] The detection results of the first-order algorithm presented in this application are compared with the detection results of the peak recognition method in this application.
[0110] In this application, the first-order algorithm first employs Savitzky-Golay filtering, a smoothing algorithm based on local least squares polynomial fitting. A low-order polynomial is used to fit the data within a sliding window, and the value of the polynomial at the center point of the window is used as the output. This method effectively preserves the signal's shape, peak values, and other characteristics while denoising. Specifically, the sliding window width is 11 data points (an odd number), meaning that each time, 5 points before and after the current point are selected for first-order polynomial (linear) fitting. Then, the first derivative of the data is calculated to determine the vertex and start / endpoint positions.
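The comparison algorithm can be approximated in a few lines. The sketch below fits a first-order polynomial in an 11-point sliding window and takes its value at the current point, then locates vertices where the first derivative changes from positive to non-positive. The edge handling (truncated windows), the central-difference derivative, and the function names are assumptions; how the original algorithm derives start and end points from the derivative is not specified, so only vertex detection is shown.

```python
def savgol_linear(y, window=11):
    """Savitzky-Golay smoothing with a first-order polynomial.

    For each point, a least-squares line is fitted to the points inside the
    window and evaluated at that point. Near the edges the window is
    truncated (an assumption; the source does not describe edge handling).
    """
    half = window // 2
    n = len(y)
    out = []
    for i in range(n):
        lo, hi = max(0, i - half), min(n - 1, i + half)
        xs = range(lo, hi + 1)
        m = len(xs)
        x_mean = sum(xs) / m
        y_mean = sum(y[j] for j in xs) / m
        sxx = sum((j - x_mean) ** 2 for j in xs)
        sxy = sum((j - x_mean) * (y[j] - y_mean) for j in xs)
        slope = sxy / sxx if sxx else 0.0
        out.append(y_mean + slope * (i - x_mean))
    return out


def first_derivative(y):
    """Central differences inside the sequence, one-sided at the two ends."""
    n = len(y)
    d = [0.0] * n
    for i in range(1, n - 1):
        d[i] = (y[i + 1] - y[i - 1]) / 2.0
    if n > 1:
        d[0] = y[1] - y[0]
        d[-1] = y[-1] - y[-2]
    return d


def find_vertices(d):
    """Indices where the derivative goes from positive to non-positive."""
    return [i for i in range(1, len(d)) if d[i - 1] > 0 and d[i] <= 0]
```

On a symmetric triangular peak of 41 points with its apex at index 20, the smoothed derivative crosses zero at index 20, which is reported as the single vertex. Because a linear fit evaluated at the center of a symmetric window equals the window mean, this filter behaves like an 11-point moving average away from the edges, which helps explain why adjacent small peaks can be blurred together.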
[0111] As shown in Figures 2-5, this application provides two sets of data. Figures 2 and 3 correspond to one set of data, and Figures 4 and 5 correspond to the other. Figures 2 and 4 show the results of the peak identification method in this application; Figures 3 and 5 show the detection results of the first-order algorithm.
[0112] The horizontal axis of the two sets of data provided in this application is the data index value (the data here is a data collection after data processing and splicing. In order to facilitate subsequent data processing, the horizontal axis is a manually defined data index value to indicate the data order and has no other meaning).
[0113] In Figures 2 and 3, the response value on the ordinate is the emitted light intensity, in mV.
[0114] In Figures 4 and 5, the vertical axis represents the absorbance, measured in mA.
[0115] Comparing Figures 2 and 3, it can be clearly seen that the peak identification method proposed in this application can identify the start and end points of small peaks. The traditional first-order algorithm uses the same point as both the end point of the previous peak and the start point of the current peak; by comparison, the peak identification method proposed in this application has higher accuracy.
[0116] Comparing Figures 4 and 5, it is evident that the peak identification method proposed in this application can accurately identify the start and end points of peaks, clearly distinguish the start and end points from noise, and has anti-interference capabilities.
[0117] Based on the same inventive concept, this application also proposes a peak recognition device based on a deep learning model, comprising: a data processing module, which acquires the raw data, the raw data being a one-dimensional dataset of data point indices and response values, labels the peak types and the peak start and end indexes of the raw data, and, based on the labeling content and the data source, removes invalid regions to obtain training samples or data to be processed, the peak types being positive peaks and negative peaks; a model building module, which constructs a one-dimensional convolutional deep learning model based on the training samples and performs data training to obtain the training model; and a correction module, which, based on the training model, obtains the basic start and end points and the maximum length of the peak index of the data to be processed, verifies whether the response value of each basic start and end point is a minimum value in the data range of the maximum length of the peak index and whether the basic start and end point is at the shortest distance from the current peak, and then selects the index corresponding to the minimum value that meets the verification requirements as the correction start and end point.
[0118] Based on the same inventive concept, this application also proposes a computer storage medium storing a computer program, which, when executed by a processor, implements a peak identification method based on a deep learning model as described above.
[0119] The above description is merely a preferred embodiment of this application and an explanation of the technical principles employed. Those skilled in the art should understand that the scope of the invention involved in this application is not limited to technical solutions formed by specific combinations of the above-described technical features, but should also cover other technical solutions formed by arbitrary combinations of the above-described technical features or their equivalents without departing from the inventive concept. For example, technical solutions formed by substituting the above features with (but not limited to) technical features with similar functions disclosed in this application.
[0120] Apart from the technical features described in the specification, the other technical features are known to those skilled in the art. To highlight the innovative features of this invention, the other technical features will not be described in detail here.
Claims
1. A peak identification method based on a deep learning model, characterized by comprising: obtaining raw data, the raw data being a one-dimensional dataset of data point indices and response values; labeling the peak types and the peak start and end indexes of the raw data, and, based on the labeling content and the data source, removing invalid regions to obtain training samples or data to be processed, wherein the peak types are positive peaks and negative peaks; constructing a one-dimensional convolutional deep learning model based on the training samples and performing data training to obtain the training model; obtaining, based on the training model, the basic start and end points and the maximum length of the peak index of the data to be processed; and verifying whether the response value of each basic start and end point is a minimum value in the data range with the maximum length of the peak index, and whether the distance between the basic start and end point and the current peak is the shortest, and selecting the index corresponding to the minimum value that meets the verification requirements as the correction start and end point.
2. The peak identification method of claim 1, wherein the invalid region is defined as M data points out of N consecutive original data points whose response values all fall within the invalid range; where M, N, and the specific values of the invalid range depend on the data source, and M < N.
3. The peak identification method according to claim 1 or 2, characterized in that, if a negative peak exists, before or after removing invalid regions, the process includes: selecting positive peak data and flipping it to a negative peak; using the minimum response value of the current data to raise the overall response value of the current data so that the response values of the raised data are all greater than zero; and using the raised data as training samples or data to be processed.
4. The peak identification method of claim 3, wherein selecting positive peak data and flipping it to a negative peak includes: given that the current peak function is f(x), the starting index of the current peak is a, and the ending index of the current peak is b, the equation of the line connecting the starting and ending indices of the current peak is l(x) = f(a) + ((f(b) - f(a)) / (b - a)) × (x - a); using this line as the baseline, the flipped peak function is g(x) = 2 × l(x) - f(x); the flipped peak function is the flipped negative peak data.
5. The peak identification method of claim 3, wherein the raised data is obtained by subtracting the minimum response value of the current data from each response value of the current data and adding a compensation error value; or by taking the square root or the logarithm of that sum and normalizing the result.
6. The peak identification method of claim 2 or 5, wherein, After removing invalid regions, or after removing invalid regions and processing negative peak data, the process also includes: splicing the remaining data, removing invalid regions between the two peaks of the spliced data, and obtaining augmented data as training samples or data to be processed.
7. The peak identification method of claim 1, wherein, The deep learning model can be any of the YOLO model, SSD model, or other deep learning models.
8. The peak identification method of claim 7, wherein the training model also includes a transformer structure.
9. A peak recognition apparatus based on a deep learning model, characterized by comprising: a data processing module, which acquires the raw data, the raw data being a one-dimensional dataset of data point indices and response values, labels the peak types and the peak start and end indexes of the raw data, and, based on the labeling content and the data source, removes invalid regions to obtain training samples or data to be processed, the peak types being positive peaks and negative peaks; a model building module, which constructs a one-dimensional convolutional deep learning model based on the training samples and performs data training to obtain the training model; and a correction module, which, based on the training model, obtains the basic start and end points and the maximum length of the peak index of the data to be processed, verifies whether the response value of each basic start and end point is a minimum value in the data range with the maximum length of the peak index and whether the distance between the basic start and end point and the current peak is the shortest, and selects the index corresponding to the minimum value that meets the verification requirements as the correction start and end point.
10. A computer-readable storage medium having a computer program stored thereon, characterized in that, When the program is executed by the processor, it implements the peak identification method as described in any one of claims 1-8.
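For illustration only, the flip and lift operations recited in claims 4 and 5 can be sketched as below. The flip reflects the peak about the straight line through its start and end points, g(x) = 2 × l(x) - f(x), and the lift subtracts the minimum response and adds a compensation error value. The default compensation value, the choice to normalize the square-root variant by its maximum, and the function names are assumptions not stated in the claims; the logarithmic variant is omitted because its normalization is not specified.

```python
import math


def flip_peak(f, a, b):
    """Reflect f[a..b] about the line through (a, f[a]) and (b, f[b])."""
    if b <= a:
        raise ValueError("the end index b must be greater than the start index a")
    slope = (f[b] - f[a]) / (b - a)
    g = list(f)
    for x in range(a, b + 1):
        baseline = f[a] + slope * (x - a)   # l(x)
        g[x] = 2 * baseline - f[x]          # g(x) = 2 * l(x) - f(x)
    return g


def lift(y, eps=1e-6):
    """Shift the data so that every response value is greater than zero.

    eps plays the role of the compensation error value (assumed default).
    """
    m = min(y)
    return [v - m + eps for v in y]


def lift_sqrt_normalized(y, eps=1e-6):
    """Square-root variant, normalized by its maximum (assumed normalization)."""
    roots = [math.sqrt(v) for v in lift(y, eps)]
    top = max(roots)
    return [r / top for r in roots]
```

For example, flipping `[1, 1, 3, 5, 3, 1, 1]` between indices 1 and 5 gives `[1, 1, -1, -3, -1, 1, 1]`, and lifting that result with a compensation value of 0.5 gives `[4.5, 4.5, 2.5, 0.5, 2.5, 4.5, 4.5]`, so all responses are positive again.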