A terahertz spectroscopy detection method for new drugs

By employing terahertz time-domain spectroscopy and data processing methods, the problems of low sensitivity and insufficient accuracy in the detection of new drugs have been solved, enabling rapid and accurate drug identification that is suitable for on-site testing.

CN122193145A · Pending · Publication Date: 2026-06-12 · ZHENJIANG PUBLIC SECURITY BUREAU

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Applications(China)
Current Assignee / Owner
ZHENJIANG PUBLIC SECURITY BUREAU
Filing Date
2026-02-02
Publication Date
2026-06-12

AI Technical Summary

Technical Problem

Existing new drug detection technologies suffer from problems such as low sensitivity, insufficient accuracy, inability to achieve real-time on-site detection, susceptibility to interference, and inability to provide molecular structure information.

Method used

By employing terahertz time-domain spectroscopy, drug samples are precisely prepared and terahertz spectral detection is performed. Combined with data processing and characteristic spectral screening, rapid qualitative identification of new drugs can be achieved.

Benefits of technology

It enables rapid, accurate, and non-destructive detection of new drugs, allowing for the on-site identification of various new drugs without the need for complex chemical pretreatment.

✦ Generated by Eureka AI based on patent content.


Abstract

This invention discloses a terahertz spectroscopy detection method for novel drugs. First, thin-film samples of the drug are prepared at graded spin-coating speeds from 0 to 2500 rpm to analyze the influence of sample uniformity on the terahertz spectrum. Multi-dimensional spectral data in the 0.1-2 THz band are then collected with a terahertz spectroscopy detection device to establish the key detection parameters and data ranges. Data analysis shows that in the low-frequency band (0.1-0.5 THz), optical parameters such as sample absorbance and dielectric loss are sensitive to spin-coating speed: as the speed increases (up to 2500 rpm), the molecular aggregates in the sample become more dispersed, scattering weakens, and the characteristic peak amplitude decreases (the low-frequency absorbance at 2500 rpm is approximately 14% lower than the baseline value). In the mid-to-high frequency band (0.5-2 THz), the absorption characteristics are dominated by intrinsic molecular vibrations, and the spin-coating speed has only a weak effect on the positions of the characteristic peaks. The method enables rapid detection of both traditional and novel drugs with high accuracy and sensitivity and a wide range of applications, providing a powerful tool for combating drug crime.

Description

Technical Field

[0001] This invention relates to the field of drug detection technology, specifically a terahertz spectroscopy detection method for novel drugs.

Background Technology

[0002] In recent years, new types of drugs have emerged, characterized by their diversity, concealed forms, and rapid iteration of synthetic processes. Their main components are mostly synthetic psychoactive substances, such as fentanyl analogues, synthetic cannabinoids, and cathinones. These drugs are characterized by their high addictiveness, severe toxicity, and serious social harm.

[0003] The mainstream detection technologies for new drugs mainly include chromatography-mass spectrometry, immunochromatography, and Raman spectroscopy, but these technologies have obvious limitations in practical applications.

[0004] While gas chromatography-mass spectrometry (GC-MS) is sensitive and accurate, it requires complex sample pretreatment (such as extraction and derivatization) and relies on large laboratory equipment, so it cannot provide real-time on-site detection. Immunochromatography allows rapid on-site screening, but it is prone to cross-reactions with structurally similar drug analogues, which biases the results, and it yields only semi-quantitative results with no molecular structural information. Raman spectroscopy relies on molecular vibrational fingerprints, but dark, highly fluorescent, or powdered samples cause fluorescence and scattering interference that lowers the signal-to-noise ratio of the characteristic peaks and reduces detection accuracy.

[0005] Terahertz (THz) waves, electromagnetic waves with frequencies between 0.1 and 10 THz, possess unique molecular fingerprinting characteristics. Most polar molecules (including many novel drug components) exhibit specific vibrational / rotational energy level transitions in the terahertz band, corresponding to characteristic absorption peaks, enabling precise qualitative analysis of substances. In addition, terahertz photons have extremely low energy (only on the meV scale) and cause no ionizing damage to biological tissues, which gives the technique the advantage of non-destructive detection. Furthermore, terahertz time-domain spectroscopy (THz-TDS) can simultaneously acquire amplitude and phase information of samples, and combining this multi-dimensional spectral data further enhances detection accuracy and anti-interference capability.

Summary of the Invention

[0006] The purpose of this invention is to provide a terahertz spectral detection method for novel drugs, in order to solve the problems mentioned in the background art.

[0007] To achieve the above objectives, the present invention provides the following technical solution: a terahertz spectral detection method for novel drugs, comprising the following three stages:

[0008] The first stage, drug sample preparation, includes the following steps:

[0009] Step 1: Select a 1cm×1cm×0.5cm high-purity quartz glass slide as the substrate material. Clean it with acetone and anhydrous ethanol for 10 minutes each.

[0010] Step 2: Dry the surface of the glass slide with a constant temperature electric heating plate, and finally place it in a 120℃ oven for 30 minutes to ensure that the surface of the substrate is clean and free of contamination.

[0011] Step 3: Under constant temperature and humidity (23±1℃, RH40±5%), accurately transfer 15μL of drug sample solution using a calibrated 20μL micropipette, and slowly drop the solution onto the center of the pretreated quartz substrate using the hanging drop method.

[0012] Step 4: Immediately place the substrate of the coated sample onto the vacuum chuck of the spin coater and set the gradient rotation speed parameter to 500 rpm.

[0013] As a further embodiment of the present invention: the drug sample is a methanol solution of metomidate, isopropoxate, etomidate, metonitazene, or methyl fluoroacetate.

[0014] As a further embodiment of the present invention: the drug sample is a methanol solution of metomidate, isopropoxate, etomidate, metonitazene, or methyl fluoroacetate, with the concentration controlled in the range of 1-10 mg/mL, so as to ensure the signal response intensity when the terahertz pulse interacts with the sample.

[0015] As a further embodiment of the present invention: in step 3, the drug sample solution needs to be filtered through a 0.22μm organic filter membrane before being transferred to remove particulate impurities in the solution and avoid impurities interfering with the terahertz spectral signal.

[0016] As a further embodiment of the present invention: in step 4, the spin-coating speed of the spin coater is set to graded values of 0, 500, 1000, 1500, 2000, and 2500 rpm, with a duration of 30 s at each speed. The 0 rpm group serves as a control group and does not undergo spin coating. Spin coating allows the sample solution to form a uniform thin film on the substrate surface, improving the uniformity of terahertz pulse transmission / reflection detection.

[0017] In the second stage, the prepared drug sample is subjected to terahertz time-domain spectroscopy detection using a terahertz time-domain spectroscopy detection device. The device includes, in sequence: an ultrafast pulse laser 1, a beam splitter 2, a chopper 3, a time delay device 4, a lens 5, a terahertz radiation source 6, an off-axis parabolic mirror 7, a sample 8, a polarizer 9, a transmission detection device 10, a reflection detection device 11, a terahertz signal detection unit 12, a λ/4 waveplate 13, a Wollaston prism 14, a photodiode 15, a lock-in amplifier 16, and a chopper driver 17.

[0018] The testing process is as follows:

[0019] Step 1: Start the ultrafast pulse laser (1) to output a laser beam with a pulse width ≤ 100fs, which is then split into pump light and probe light by a beam splitter (2);

[0020] Step 2: Initialize system parameters, set the laser repetition frequency to 1-100kHz, adjust the relative position of the terahertz radiation source (6) and the transmission detection device (10) to ensure that the optical path coaxiality error is ≤0.1mm.

[0021] Step 3: Guide the pump light to be incident on the terahertz radiation source (6), and generate terahertz pulses through photoconductive effect or optical rectification effect. Adjust the pump light power to 50-500mW so that the terahertz pulse bandwidth covers 0.1-3THz.

[0022] Step 4: Use the time delay device (4) to adjust the optical path of the probe light so that the probe light and the terahertz pulse are synchronized in time at the terahertz signal detection unit (12), and the synchronization accuracy is controlled within ±50fs.

[0023] Step 5: Acquire the terahertz reference pulse signal in the sample-free state and record its time-domain waveform (including amplitude, phase and pulse width information) as the reference for subsequent data processing.

[0024] Step 6: Fix the sample (8) to be tested in the terahertz pulse transmission optical path, and use the transmission mode or the reflection mode to realize the interaction between the terahertz pulse and the sample (8). The transmission signal is obtained by the transmission detection device (10), and the reflection signal is obtained by the reflection detection device (11).

[0025] Step 7: Drive the time delay device (4) to perform linear scanning (scanning range 0-10ps, step size ≤10fs) and synchronously acquire the time domain signal of the terahertz pulse modulated by the sample;

[0026] Step 8: Perform multiple measurements on the same sample (≥3 times) to obtain the average value of the time domain signal, reduce random noise interference, and control the measurement time of each measurement within the range of 1-10 seconds.

[0027] Among them, the ultrafast pulse laser (1) mentioned in step 1 is a Ti:sapphire femtosecond laser with a center wavelength of 780-820nm and a laser beam spot diameter controlled at 1-3mm to ensure the energy stability of the pump light and probe light after subsequent beam splitting.

[0028] In step 2, when initializing system parameters, the ambient temperature should be adjusted to 23±2℃ and the relative humidity to 40%-60%, and the terahertz radiation source (6) and the transmission detection device (10) should be fixed by the anti-vibration platform to avoid the optical path coaxiality deviation from the set range caused by environmental vibration.

[0029] In step 3, the terahertz radiation source (6) uses a gallium arsenide photoconductive antenna. The angle of the pump light incident on the photoconductive antenna is adjusted to 45±5°. A neutral density filter is added to the pump light path to achieve step-wise adjustment of the pump light power and ensure that the terahertz pulse bandwidth stably covers 0.1-3THz.

[0030] The gallium arsenide photoconductive antenna has an electrode gap of 5-20 μm and an operating bias voltage of 5-30 V. By adjusting the bias voltage, the carrier acceleration efficiency is optimized, which increases the peak power of the terahertz pulse by 10%-20% and enhances the signal-to-noise ratio of subsequent signal detection.

[0031] In step 6, when fixing the sample to be tested, a quartz sample cell is used to hold the drug sample. The thickness of the sample cell is 0.5-2mm, and the surface of the sample cell is provided with an anti-reflection coating to reduce the reflection loss of terahertz pulses at the sample cell interface and improve signal acquisition efficiency.

[0032] In step 8, when performing multiple measurements on the same sample, a cyclic shift measurement method is adopted. Before each measurement, the sample is shifted 0.5-1mm along the optical path axis to avoid interference from local impurities or inhomogeneities of the sample on the measurement results, and further reduce the influence of random noise on the average value of the time domain signal.
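The delay-line settings in step 7 fix the sampling grid of the time-domain trace and, with it, the frequency axis obtained after the Fourier transform. The following Python sketch works through that arithmetic for the stated 0-10 ps range and 10 fs step. It assumes that no zero-padding or windowing is applied before the transform; the function name `scan_grid` and the dictionary keys are illustrative and are not part of the invention.

```python
def scan_grid(scan_range_fs: int, step_fs: int) -> dict:
    """Derive sampling quantities from a linear delay scan.

    Times are integers in femtoseconds; frequencies are returned in THz
    (1 / 1 fs = 1000 THz).
    """
    if scan_range_fs <= 0 or step_fs <= 0:
        raise ValueError("scan range and step must be positive")
    n_points = scan_range_fs // step_fs + 1     # both end points included
    resolution_thz = 1000.0 / scan_range_fs     # about 1 / (total scan window)
    nyquist_thz = 1000.0 / (2 * step_fs)        # 1 / (2 * sampling step)
    return {
        "n_points": n_points,
        "resolution_thz": resolution_thz,
        "nyquist_thz": nyquist_thz,
    }


# Values stated in step 7: 0-10 ps scan range, 10 fs step.
grid = scan_grid(scan_range_fs=10_000, step_fs=10)
print(grid)
```

Under these assumptions the scan gives 1001 delay points, a frequency spacing of about 0.1 THz, and a Nyquist limit of 50 THz, well above the 0.1-3 THz band of interest.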

[0033] The third stage involves data processing of the terahertz spectra of the tested samples, as detailed below:

[0034] First, the terahertz spectral data of the detected samples are preprocessed. Preprocessing methods include one or more of the following: standard normal variate (SNV) transformation, multiplicative scatter correction (MSC), Savitzky-Golay convolution smoothing, and wavelet transform. The frequency-domain spectral data (absorbance or refractive index as a function of frequency), acquired by the terahertz time-domain spectroscopy system and obtained through Fourier transform, are imported into data processing software (such as Origin, MATLAB, or Python). In this embodiment, Savitzky-Golay convolution smoothing is used for preprocessing. Specifically, for each data point in the spectral data sequence (e.g., the i-th point), the m points before and after it are taken to form a local window (window width W = 2m + 1; in this example W = 9, so m = 4). Within this window, a cubic polynomial of the form y = a0 + a1*x + a2*x^2 + a3*x^3 is fitted to the original data points by least squares. The fitted polynomial is then evaluated at the center point of the window (the i-th point), and this fitted value replaces the original, potentially noisy, value of the i-th data point. The window is then moved forward by one point (from the i-th point to the (i+1)-th point), and the process is repeated until the entire spectral data sequence has been traversed. After smoothing, as shown in Figures 2 to 6, high-frequency random spikes in the spectral curves are effectively suppressed and the curves become smoother. Meanwhile, the characteristic absorption peaks located at 0.3 THz, 0.8 THz, and 1.2 THz are clearly preserved, with no significant distortion of peak positions or shapes, laying a reliable data foundation for subsequent feature extraction and similarity calculation.
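The sliding-window fit described above can be written as a fixed convolution, because the least-squares cubic evaluated at the window center is a weighted sum of the nine window values. The Python sketch below uses the standard tabulated 9-point weights for that case. How the first and last four points are treated is not stated in the source, so leaving them unchanged is an assumption of this sketch, and the name `savgol9_smooth` is illustrative. In practice a library routine such as `scipy.signal.savgol_filter(y, window_length=9, polyorder=3)` could be used instead.

```python
# Standard 9-point Savitzky-Golay smoothing weights for a cubic
# (or quadratic) least-squares fit evaluated at the window centre.
SG9_WEIGHTS = [-21, 14, 39, 54, 59, 54, 39, 14, -21]
SG9_NORM = 231  # sum of the weights


def savgol9_smooth(y):
    """Smooth a sequence with a 9-point cubic Savitzky-Golay filter.

    Interior points are replaced by the fitted polynomial value at the
    window centre. The first and last 4 points are copied unchanged
    (an assumption of this sketch, not taken from the source).
    """
    m = len(SG9_WEIGHTS) // 2          # m = 4, window width W = 2m + 1 = 9
    out = list(y)
    for i in range(m, len(y) - m):
        window = y[i - m:i + m + 1]
        out[i] = sum(w * v for w, v in zip(SG9_WEIGHTS, window)) / SG9_NORM
    return out


# A cubic passes through unchanged, which is why peak shapes survive.
cubic = [0.5 * x**3 - 2.0 * x**2 + x + 3.0 for x in range(20)]
cubic_smoothed = savgol9_smooth(cubic)

# An isolated one-point spike on a flat baseline is strongly attenuated.
spiky = [0.0] * 21
spiky[10] = 1.0
spiky_smoothed = savgol9_smooth(spiky)
print(round(spiky_smoothed[10], 4))
```

The spike at index 10 is reduced from 1.0 to 59/231, while the cubic test signal is reproduced to within floating-point error.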

[0035] Secondly, the optimal characteristic spectra are selected through screening. The screening methods include one or more of the following: successive projections algorithm (SPA), least angle regression (LARS), competitive adaptive reweighted sampling (CARS), and support vector regression (SVR). This embodiment preferably uses support vector regression, applied recursively as follows.

Training set: preprocessed full-band terahertz spectral data from standard samples of various known drugs (metomidate, isopropoxate, etc.) are used as the training set. Each spectrum is a high-dimensional vector whose dimension equals the number of sampling points. A label is assigned to each spectrum (e.g., a label value of "1" for metomidate, and different label values for other substances or background).

Initial model: a radial basis function (RBF) is selected as the kernel function of the SVR model, the model parameters (such as the penalty factor C and the kernel width γ) are initialized, and the initial SVR model is trained using data from all spectral bands. The model aims to learn the complex mapping between spectral features and substance labels.

Assessing feature importance: after training, an SVR model (especially an RBF-based one) does not directly provide feature coefficients, but the importance of each frequency point (i.e., each spectral dimension) can be assessed through its support vectors or through metrics derived from the model weights in the feature space (such as an approximation using a linear kernel, or the contribution of each feature to the gradient of the decision function).

Sorting and elimination: all spectral features (frequency points) are sorted in ascending order of their importance scores, and the feature or group of features with the lowest scores is removed (e.g., the lowest 10% of features each time).

Reconstructing the dataset and retraining: the training dataset is rebuilt from the remaining subset of spectral features, and a new SVR model is trained on it.

Recursive loop: the "evaluate-sort-eliminate-retrain" process is repeated, reducing the size of the feature subset with each iteration. Throughout the recursive elimination, the performance of the SVR model on an independent validation set is monitored (e.g., coefficient of determination R², root mean square error RMSE). The recursion stops when the validation performance reaches its peak or begins to decline. The subset of characteristic spectra retained at that point is the optimal combination that preserves the model's best generalization ability.

For example, after screening, several narrow peaks located at 0.32 THz, 0.85 THz, and 1.21 THz were ultimately identified as the most critical characteristic spectra for identifying "metomidate". The selected key characteristic frequency points are then mapped back to the original spectrum (as shown in Figures 2-6), where they should correspond precisely to the significant absorption peaks marked in the figures. In subsequent detection of unknown samples, there is no need to compare the full spectrum; it is only necessary to extract the absorbance values at these key characteristic frequency points to form a low-dimensional feature vector, which is then input into a lightweight recognition model pre-trained on this feature subset for rapid comparison and judgment.
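The "evaluate-sort-eliminate-retrain" loop can be outlined in a few lines of Python. This is only a skeleton of the control flow: the two callbacks `importance_fn` and `validate_fn` are hypothetical placeholders for "train an SVR on this feature subset and score each frequency point" and "score the retrained model on the validation set". In a real pipeline they might wrap `sklearn.svm.SVR(kernel="rbf", C=..., gamma=...)`. The toy callbacks in the demo are synthetic stand-ins and involve no SVR at all.

```python
def recursive_feature_elimination(n_features, importance_fn, validate_fn,
                                  drop_fraction=0.10):
    """Greedy backward elimination of spectral features.

    importance_fn(subset) -> {feature index: importance score}
    validate_fn(subset)   -> validation score, higher is better
    Stops as soon as the validation score declines and returns the best
    subset seen so far together with its score.
    """
    current = list(range(n_features))
    best_subset = list(current)
    best_score = validate_fn(current)
    while len(current) > 1:
        scores = importance_fn(current)
        ranked = sorted(current, key=lambda i: scores[i])   # ascending order
        n_drop = max(1, int(len(current) * drop_fraction))  # lowest ~10%
        n_drop = min(n_drop, len(current) - 1)              # keep at least one
        current = sorted(ranked[n_drop:])
        score = validate_fn(current)
        if score < best_score:        # performance began to decline: stop
            break
        best_subset, best_score = list(current), score
    return best_subset, best_score


# Toy stand-ins: features 3, 8 and 12 are the only informative ones.
KEY = {3, 8, 12}


def toy_importance(subset):
    return {i: (1.0 if i in KEY else 0.001 * i) for i in subset}


def toy_validation(subset):
    # Rewards keeping informative features, penalises subset size.
    return sum(1 for i in subset if i in KEY) - 0.01 * len(subset)


selected, score = recursive_feature_elimination(20, toy_importance, toy_validation)
print(selected)  # [3, 8, 12]
```

Passing the model-specific parts in as callbacks keeps the loop independent of how feature importance is approximated for an RBF kernel, which the text leaves open.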

[0036] Finally, after obtaining the optimal characteristic reference spectrum and the characteristic spectrum to be tested, this invention calculates a similarity metric between the two. Assume that after feature screening, k key characteristic frequency points are determined (for example, for metomidate, k=5, corresponding to frequencies f1, f2, ..., f5). The absorbance values of the standard drug sample (reference sample) at these k key frequency points are assembled into a reference feature vector R=[A_R(f1),A_R(f2),...,A_R(fk)]; similarly, the absorbance values of the sample to be tested at the same k frequency points are assembled into a sample feature vector S=[A_S(f1),A_S(f2),...,A_S(fk)]. The vectors R and S represent the "fingerprint" encoding of the two samples being compared on the most critical spectral features. The Pearson correlation coefficient r is calculated according to the following formula:

[0037] r=Σ[(R_i-μ_R)*(S_i-μ_S)] / sqrt{Σ[(R_i-μ_R)^2]*Σ[(S_i-μ_S)^2]}

[0038] Where R_i and S_i are the i-th elements (i from 1 to k) of vectors R and S, respectively. μ_R and μ_S are the arithmetic mean of vectors R and S, respectively. Σ represents the summation over all k feature points.

[0039] The calculated r value is the similarity metric, ranging from -1 to +1. r = +1 indicates that the two spectral feature vectors follow exactly the same trend (perfect positive correlation), r = 0 indicates no linear correlation, and r = -1 indicates exactly opposite trends. In drug identification, this invention focuses on the degree of positive correlation. An empirical threshold (e.g., r ≥ 0.80) is set on the basis of test results from a large number of known samples, which effectively reduces false alarms while ensuring a high recognition rate. Judgment logic: if the calculated similarity metric r ≥ 0.80, the characteristic spectrum of the sample to be tested is judged to match that of the current standard drug sample closely, i.e., the sample is identified as this type of drug ("yes"). If r < 0.80, the result is a mismatch ("no").
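The formula in [0037] and the threshold rule in [0039] translate directly into code. The Python sketch below is a minimal illustration: the function names and the numeric absorbance vectors are made up for the example and are not measured data, and 0.80 is the example threshold quoted in the text.

```python
import math


def pearson_r(reference, sample):
    """Pearson correlation coefficient between two feature vectors R and S."""
    k = len(reference)
    if k != len(sample) or k < 2:
        raise ValueError("vectors must have the same length k >= 2")
    mu_r = sum(reference) / k
    mu_s = sum(sample) / k
    numerator = sum((r - mu_r) * (s - mu_s) for r, s in zip(reference, sample))
    denominator = math.sqrt(
        sum((r - mu_r) ** 2 for r in reference)
        * sum((s - mu_s) ** 2 for s in sample)
    )
    if denominator == 0.0:
        raise ValueError("correlation is undefined for a constant vector")
    return numerator / denominator


def is_match(reference, sample, threshold=0.80):
    """Apply the judgment logic: match when r >= threshold."""
    return pearson_r(reference, sample) >= threshold


# Illustrative absorbance values at k = 5 key frequency points (synthetic).
R = [0.42, 0.65, 0.31, 0.58, 0.27]
S_similar = [0.9 * a + 0.02 for a in R]       # same pattern, rescaled
S_different = [0.30, 0.28, 0.61, 0.25, 0.66]  # different pattern

print(is_match(R, S_similar), is_match(R, S_different))  # True False
```

Because r is unchanged by a uniform rescaling or offset of either vector, the decision depends on the relative pattern of the absorbance values at the key frequencies and not on their overall amplitude.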

[0040] The beneficial effects of this invention are:

[0041] Through precise substrate pretreatment and spin-coating of novel drug samples, combined with terahertz time-domain spectroscopy, this invention can acquire characteristic spectral information of samples in the 0.1-3 THz frequency band. Utilizing the distinct vibrational and rotational energy level transitions of different novel drug molecules in the terahertz band, it can achieve rapid qualitative identification of various novel drugs such as metomidate and isopropoxate. The detection process requires no complex chemical pretreatment and is non-destructive, fast, and highly specific, providing a new and effective technical means for rapid on-site screening of novel drugs.

Attached Figure Description

[0042] Figure 1 is a schematic diagram of the terahertz detection device used in the terahertz spectroscopy detection method for novel drugs.

[0043] Figure 2 shows the terahertz spectrum of metomidate: (a) before smoothing, (b) after smoothing.

[0044] Figure 3 shows the terahertz spectrum of isopropoxate: (a) before smoothing, (b) after smoothing.

[0045] Figure 4 shows the terahertz spectrum of etomidate: (a) before smoothing, (b) after smoothing.

[0046] Figure 5 shows the terahertz spectrum of metonitazene: (a) before smoothing, (b) after smoothing.

[0047] Figure 6 shows the terahertz spectrum of a methanol solution of methyl fluoroacetate: (a) before smoothing, (b) after smoothing.

[0048] In Figure 1: 1. Ultrafast pulsed laser; 2. Beam splitter; 3. Chopper; 4. Time delay device; 5. Lens; 6. Terahertz radiation source; 7. Off-axis parabolic mirror; 8. Sample; 9. Polarizer; 10. Transmission detection device; 11. Reflection detection device; 12. Terahertz signal detection unit; 13. λ/4 waveplate; 14. Wollaston prism; 15. Photodiode; 16. Lock-in amplifier; 17. Chopper driver.

Detailed Implementation

[0049] This invention focuses on five typical drugs (metomidate, isopropoxate, etomidate, metonitazene, and methyl fluoroacetate, each in methanol solution) and uses a precision spin-coating method to prepare thin-film samples of them. With the spin-coating speed as the controlled variable, graded from 0 to 2500 rpm, the influence of sample uniformity on the terahertz spectra is studied. Multi-dimensional spectral data in the 0.1-2 THz band are collected with a terahertz time-domain spectroscopy detection device to establish the key detection parameters and data ranges. Analysis of the experimental data shows that in the low-frequency range (0.1-0.5 THz), optical parameters such as sample absorbance and dielectric loss are sensitive to spin-coating speed: as the speed increases (up to 2500 rpm), the molecular aggregates in the sample become more dispersed, scattering weakens, and the amplitude of the characteristic peaks decreases (the low-frequency absorbance at 2500 rpm is approximately 14% lower than the baseline value). In the mid-to-high frequency range (0.5-2 THz), the absorption characteristics are dominated by intrinsic molecular vibrations, and the spin-coating speed has a negligible effect on the positions of the characteristic peaks. The method enables rapid detection of both traditional and new drugs with high accuracy, high sensitivity, and broad applicability, providing a powerful technical means for combating drug crime.

[0050] The present invention will be further described below with reference to the accompanying drawings and embodiments.

[0051] The implementation process of this invention includes three stages:

[0052] The first stage involves preparing drug samples, including the following steps:

[0053] Step 1: Select a 1cm×1cm×0.5cm high-purity quartz glass slide as the substrate material. Clean it sequentially with acetone and anhydrous ethanol for 10 minutes each.

[0054] Step 2: Dry the surface of the glass slide with a constant temperature electric heating plate, and finally place it in a 120℃ oven for 30 minutes to ensure that the surface of the substrate is clean and free of contamination.

[0055] Step 3: Under constant temperature and humidity (23±1℃, RH40±5%), accurately transfer 15μL of drug sample solution using a calibrated 20μL micropipette, and slowly drop the solution onto the center of the pretreated quartz substrate using the pendant drop method.

[0056] Step 4: Immediately place the substrate of the coated sample onto the vacuum chuck of the spin coater and set the gradient rotation speed parameter to 500 rpm.

[0057] Furthermore, methanol solutions of metomidate, isopropoxate, etomidate, metonitazene, or methyl fluoroacetate were used, with the concentration controlled within the range of 1-10 mg/mL, to ensure the signal response intensity when the terahertz pulse interacts with the sample.

[0058] Furthermore, the solution needs to be filtered through a 0.22μm organic filter membrane before being transferred to remove particulate impurities and avoid interference with the terahertz spectral signal.

[0059] The spin-coating speed of the spin coater was set to graded values of 0, 500, 1000, 1500, 2000, and 2500 rpm, with a duration of 30 s at each speed. The 0 rpm group served as a control group and did not undergo spin coating.

[0060] In the second stage, the terahertz detection device shown in Figure 1 is used to acquire the terahertz time-domain spectral information of the drug samples.

[0061] In this invention, the parameters of the terahertz time-domain spectroscopy device are: a laser center wavelength of 780-820 nm; a pulse width ≤100 fs; linear scanning of the time delay device over a range of 0-10 ps with a step size of 10 fs; a single measurement time of 50-60 s; an operating bias voltage of 10 V and an electrode gap of 10 μm for the photoconductive antenna; a terahertz pulse bandwidth covering the 0.1-2.2 THz band; and a lock-in amplifier time constant of 100 ms.

[0062] In this invention, the number of terahertz time-domain spectroscopy detections is preferably 5-10 times.

[0063] In this invention, when the number of terahertz time-domain spectroscopy detections is ≥5, the average of the raw spectra obtained is preferably used as the reference spectrum and as the spectrum to be tested.
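Averaging repeated scans and comparing the sample trace against the sample-free reference can be sketched as follows. The source does not give the formula used to turn the two traces into a spectrum, so taking the transmittance as the amplitude ratio |E_sample(f)| / |E_reference(f)| is an assumption of this sketch. The function names are illustrative and the pulse shapes are synthetic. The direct DFT keeps the example dependency-free; a real pipeline would more likely use an FFT routine such as `numpy.fft.rfft`.

```python
import cmath
import math


def average_scans(scans):
    """Point-by-point mean of repeated time-domain traces of equal length."""
    n = len(scans[0])
    if any(len(s) != n for s in scans):
        raise ValueError("all scans must have the same number of points")
    return [sum(column) / len(scans) for column in zip(*scans)]


def dft_magnitude(trace):
    """Magnitude spectrum for bins 0..N/2 via a direct O(N^2) DFT."""
    n = len(trace)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n)
                for t, x in enumerate(trace)))
        for k in range(n // 2 + 1)
    ]


def transmittance(sample_scans, reference_scans, floor=1e-9):
    """Amplitude ratio |E_sample(f)| / |E_reference(f)| per frequency bin.

    Bins where the reference magnitude is below `floor` are returned as None.
    """
    sample = dft_magnitude(average_scans(sample_scans))
    reference = dft_magnitude(average_scans(reference_scans))
    return [s / r if r > floor else None for s, r in zip(sample, reference)]


# Synthetic data for illustration only (not measured values).
reference_pulse = [math.exp(-((t - 16) / 3.0) ** 2) * math.sin(0.8 * (t - 16))
                   for t in range(32)]
offsets = [-0.02, -0.01, 0.0, 0.01, 0.02]   # mock scan-to-scan drift, 5 scans
reference_scans = [[x + d for x in reference_pulse] for d in offsets]
sample_scans = [[0.5 * x + d for x in reference_pulse] for d in offsets]

t_ratio = transmittance(sample_scans, reference_scans)
print([round(v, 3) for v in t_ratio[1:9]])
```

In this toy case the mock sample halves the field amplitude at every frequency, so the recovered ratio is 0.5 in each bin where the reference carries signal, and the drift offsets cancel in the five-scan average.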

[0064] The third stage involves data processing of the detected terahertz spectra to ultimately determine the drug type, as detailed below:

[0065] First, the terahertz spectral data of the detected samples are preprocessed. Preprocessing methods include one or more of the following: standard normal variate (SNV) transformation, multiplicative scatter correction (MSC), Savitzky-Golay convolution smoothing, and wavelet transform. The frequency-domain spectral data (absorbance or refractive index as a function of frequency), acquired by the terahertz time-domain spectroscopy system and obtained through Fourier transform, are imported into data processing software (such as Origin, MATLAB, or Python). In this embodiment, Savitzky-Golay convolution smoothing is used for preprocessing. Specifically, for each data point in the spectral data sequence (e.g., the i-th point), the m points before and after it are taken to form a local window (window width W = 2m + 1; in this example W = 9, so m = 4). Within this window, a cubic polynomial of the form y = a0 + a1*x + a2*x^2 + a3*x^3 is fitted to the original data points by least squares. The fitted polynomial is then evaluated at the center point of the window (the i-th point), and this fitted value replaces the original, potentially noisy, value of the i-th data point. The window is then moved forward by one point (from the i-th point to the (i+1)-th point), and the process is repeated until the entire spectral data sequence has been traversed. The results are shown in Figures 2 to 6, where (a) and (b) in each figure show the terahertz transmittance before and after smoothing, respectively: Figure 2 for metomidate, Figure 3 for isopropoxate, Figure 4 for etomidate, Figure 5 for metonitazene, and Figure 6 for methyl fluoroacetate. High-frequency random spikes in the spectral curves are effectively suppressed and the curves become smoother. The characteristic absorption peaks located at 0.3 THz, 0.8 THz, 1.2 THz, etc., are clearly preserved, with no significant distortion of peak positions or shapes, laying a reliable data foundation for subsequent feature extraction and similarity calculation.
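The least-squares step in the smoothing procedure can be made concrete by solving for the weights that turn the nine window values into the fitted value at the window center. The Python sketch below does this exactly with rational arithmetic, using window offsets x = -4..4 and a cubic polynomial as in the text. The helper names `solve` and `savgol_center_weights` are illustrative, and this is one way of deriving the weights, not necessarily how the authors' software computes them.

```python
from fractions import Fraction


def solve(matrix, rhs):
    """Solve a small linear system exactly by Gauss-Jordan elimination."""
    n = len(rhs)
    aug = [row[:] + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pivot_value = aug[col][col]
        aug[col] = [v / pivot_value for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[-1] for row in aug]


def savgol_center_weights(half_width, order):
    """Weights w_j such that sum(w_j * y_j) is the fitted value at x = 0."""
    xs = list(range(-half_width, half_width + 1))
    n = order + 1
    # Normal-equation matrix A^T A, where A[j][p] = x_j ** p.
    ata = [[Fraction(sum(x ** (p + q) for x in xs)) for q in range(n)]
           for p in range(n)]
    # The fitted value at x = 0 is the coefficient a0, so solve
    # (A^T A) u = e0 and form the weights as w_j = sum_p u[p] * x_j ** p.
    e0 = [Fraction(1)] + [Fraction(0)] * (n - 1)
    u = solve(ata, e0)
    return [sum(u[p] * x ** p for p in range(n)) for x in xs]


weights = savgol_center_weights(half_width=4, order=3)   # W = 9, cubic
print([int(w * 231) for w in weights])
# [-21, 14, 39, 54, 59, 54, 39, 14, -21]
```

The weights sum to 1 and are symmetric about the center, so sliding the fit along the spectrum is equivalent to convolving it with this fixed 9-point kernel.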

[0066] Secondly, the optimal feature spectra are selected through screening. The screening method may be one or more of the following: the successive projections algorithm, the least angle regression algorithm, the competitive adaptive reweighted sampling algorithm, and the support vector machine regression (SVR) algorithm. This embodiment of the invention preferably uses the SVR algorithm. Specifically, preprocessed full-band terahertz spectral data of standard samples of various known drugs (metoimidazole, isopropamid, etc.) are used as the training set. Each spectrum is a high-dimensional vector whose dimension equals the number of sampling points. A label is assigned to each spectrum (e.g., a label value of "1" is assigned to metoimidazole, and different label values are assigned to other substances or background). A radial basis function (RBF) is selected as the kernel function of the SVR model, and the model parameters (such as the penalty factor C and the kernel width γ) are initialized. The initial SVR model is trained using data from all spectral bands. This model aims to learn the complex mapping relationship between spectral features and substance labels.

Assessing feature importance: After training, SVR models (especially those based on RBF kernels) do not directly provide feature coefficients, but the importance of each frequency point (i.e., each spectral dimension) can be assessed through the support vectors or through derived indices based on the model weights in the feature space (such as an approximation using a linear kernel, or the contribution of each feature to the gradient of the decision function).

Ranking and removal: Based on the assessed importance scores, sort all spectral features (frequency points) in ascending order and remove the feature or group of features with the lowest importance score (e.g., remove the lowest 10% of features each time).

Reconstructing the dataset and retraining: Reconstruct the training dataset using the remaining feature spectral subset and retrain a new SVR model on it.

Recursive loop: Repeat the above "evaluation-sorting-removal-retraining" process, so that the feature spectral subset shrinks with each iteration. Throughout the recursive elimination process, continuously monitor the performance of the SVR model on an independent validation set (e.g., coefficient of determination R², root mean square error RMSE). Stop the recursion when the model performance on the validation set reaches its peak or begins to decline. The feature spectral subset remaining at this point is the optimal combination of feature spectra that preserves the model's best generalization ability.

For example, after screening, several narrow peaks at positions such as 0.32 THz, 0.85 THz, and 1.21 THz are finally identified as the most critical feature spectra for identifying "metoimidazole". The selected key feature frequency points are mapped back to the original spectrum; these points should correspond precisely to the significant absorption peaks marked in Figures 2-6. In subsequent detection of unknown samples, there is no need to compare the full spectrum. It is only necessary to extract the absorbance values at these key feature spectral points to form a low-dimensional feature vector, which is then input into a lightweight recognition model pre-trained on this feature subset for rapid comparison and judgment.
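The "evaluation-sorting-removal-retraining" loop described above can be sketched as follows. This is only a skeleton of the control flow under stated assumptions: no SVR is trained here, and model fitting and importance scoring are passed in as the hypothetical callbacks `fit_and_score` and `importance`, which a real implementation would back with an SVR library. The function name, the callback signatures, and the tie-handling rule are all illustrative.

```python
def recursive_elimination(n_features, fit_and_score, importance, drop_fraction=0.10):
    """Recursive feature elimination with validation-based stopping.

    fit_and_score(subset) -> validation score for a model retrained on the
                             feature indices in `subset` (higher is better).
    importance(subset)    -> one importance score per index in `subset`.
    Both callbacks are hypothetical placeholders for the SVR-based steps.
    """
    subset = list(range(n_features))
    best_subset, best_score = list(subset), fit_and_score(subset)
    while len(subset) > 1:
        scores = importance(subset)
        # Sort positions by importance in ascending order.
        order = sorted(range(len(subset)), key=lambda j: scores[j])
        # Remove the lowest-scoring group (at least one feature per round).
        n_drop = max(1, int(len(subset) * drop_fraction))
        dropped = set(order[:n_drop])
        subset = [f for j, f in enumerate(subset) if j not in dropped]
        # Retrain on the reduced subset and check the validation score.
        score = fit_and_score(subset)
        if score < best_score:
            break  # validation performance began to decline: stop
        best_subset, best_score = list(subset), score
    return best_subset, best_score
```

Ties in the validation score are treated as "no decline", so the loop prefers the smaller subset when two subsets perform equally; this is a design choice of the sketch, not something stated in the text.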

[0067] Finally, after obtaining the optimal feature reference spectrum and the feature spectrum to be tested, this invention calculates a similarity metric between the two. Assume that after feature screening, k key feature frequency points are determined (for example, for metoimidazole, k=5, corresponding to frequencies f1, f2, ..., f5). The absorbance values of the standard drug sample (reference sample) at these k key feature frequency points are assembled into a reference feature vector R=[A_R(f1),A_R(f2),...,A_R(fk)]; similarly, the absorbance values of the sample to be tested at the same k frequency points are assembled into a sample feature vector S=[A_S(f1),A_S(f2),...,A_S(fk)]. Vectors R and S represent the "fingerprint" encoding of the two samples to be compared on the most critical spectral features. The Pearson correlation coefficient r is calculated according to the following formula:

[0068] r=Σ[(R_i-μ_R)*(S_i-μ_S)] / sqrt{Σ[(R_i-μ_R)^2]*Σ[(S_i-μ_S)^2]}

[0069] Where R_i and S_i are the i-th elements (i from 1 to k) of vectors R and S, respectively. μ_R and μ_S are the arithmetic mean of vectors R and S, respectively. Σ represents the summation over all k feature points.

[0070] The calculated r value is the similarity metric and ranges from -1 to +1. r = +1 indicates that the two spectral feature vectors have completely identical trends (perfect positive correlation), r = 0 indicates no linear correlation, and r = -1 indicates completely opposite trends. In drug identification, this invention focuses on the degree of positive correlation. An empirical threshold (e.g., r ≥ 0.80) is set based on test results from a large number of known samples, which effectively reduces false alarms while ensuring a high recognition rate. Judgment logic: if the calculated similarity metric r ≥ 0.80, the feature spectrum of the sample to be tested is determined to be highly matched with that of the current standard drug sample, i.e., the result is "yes" for this drug; if r < 0.80, it is determined to be a mismatch, i.e., "no".
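The formula and judgment logic above can be sketched in a few lines. The function names `pearson_similarity` and `is_match` are illustrative, and the handling of degenerate input (vectors of unequal length or with zero variance) is an assumption of this sketch rather than part of the described method.

```python
import math


def pearson_similarity(reference, sample):
    """Pearson correlation coefficient r between feature vectors R and S."""
    if len(reference) != len(sample) or len(reference) < 2:
        raise ValueError("R and S must have the same length k >= 2")
    k = len(reference)
    mu_r = sum(reference) / k
    mu_s = sum(sample) / k
    numerator = sum((r - mu_r) * (s - mu_s) for r, s in zip(reference, sample))
    var_r = sum((r - mu_r) ** 2 for r in reference)
    var_s = sum((s - mu_s) ** 2 for s in sample)
    denominator = math.sqrt(var_r * var_s)
    if denominator == 0.0:
        # Assumption: a constant vector has no defined correlation.
        raise ValueError("r is undefined for a constant feature vector")
    return numerator / denominator


def is_match(reference, sample, threshold=0.80):
    """Apply the empirical threshold: True means 'yes', False means 'no'."""
    return pearson_similarity(reference, sample) >= threshold
```

Because r is invariant to scaling and offset of either vector, two samples whose absorbance differs only by an overall factor (for example, from a different film thickness) still yield r close to +1, which fits the focus on trend agreement rather than absolute amplitude.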

[0071] Worked example: Taking "Metoimidazole - Sample 1 to be tested" in Table 1 as an example:

[0072] 1. From the metoimidazole standard spectrum in Figure 2, read the absorbance values at the selected key characteristic frequency points (e.g., 0.32, 0.58, 0.85, 1.21, 1.60 THz) and construct the reference vector R.

[0073] 2. From the spectrum of sample 1 to be tested, read the absorbance values at the same five frequency points and construct the test vector S.

[0074] 3. Substitute R and S into the Pearson formula above to calculate the similarity metric r.

[0075] 4. The similarity metric value r is calculated to be 0.856.

[0076] 5. Since 0.856 ≥ 0.80, sample 1 to be tested is determined to be metoimidazole (result: "yes").

[0077] In this invention, the MATLAB platform is used, and the raw data is imported using xlsread. To eliminate the influence of sample order, the dataset is first randomly shuffled. Then, the dataset is divided into training and test sets in a 7:3 ratio. Input features and output variables are extracted separately and normalized. Range normalization is used to scale all data to the [0,1] interval to improve the stability and convergence speed of model training. Output variables are also normalized to facilitate regression model training and subsequent index calculation.
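The data preparation steps in this paragraph (random shuffling, a 7:3 split, and scaling to [0,1]) can be sketched as below. The embodiment uses MATLAB; Python is used here only for illustration, and the function names and the fixed random seed are assumptions. One further assumption is that the scaling limits are fitted on the training set and then reused for the test set, so test values can fall slightly outside [0,1]; the text itself only states that the data are scaled to that interval.

```python
import random


def shuffle_and_split(rows, train_fraction=0.7, seed=0):
    """Randomly shuffle the samples, then split them 7:3 into train and test."""
    rng = random.Random(seed)  # fixed seed only to make the sketch repeatable
    shuffled = list(rows)
    rng.shuffle(shuffled)
    n_train = round(len(shuffled) * train_fraction)
    return shuffled[:n_train], shuffled[n_train:]


def fit_min_max(rows):
    """Return per-column minima and maxima of a list of equal-length rows."""
    columns = list(zip(*rows))
    return [min(c) for c in columns], [max(c) for c in columns]


def apply_min_max(rows, lows, highs):
    """Scale each column with (v - min) / (max - min); constant columns map to 0."""
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0
         for v, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]
```

A typical call sequence would be `train, test = shuffle_and_split(data)`, then `lows, highs = fit_min_max(train)`, then `apply_min_max` on both subsets with the same limits.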

[0078] This invention employs a support vector machine regression (SVR) model with a radial basis function (RBF) kernel. The main objective is to extract, from the full-band spectrum, the feature frequency points that are most effective for identifying a specific drug, thereby reducing data dimensionality and improving identification efficiency and anti-interference capability. Input data: a full-band terahertz spectral training set containing samples of various known categories (target drug, other drugs, solvents, and interfering substances). Output: a subset of optimal feature frequency points (e.g., the key frequency points for identifying metoimidazole are 0.32 THz, 0.58 THz, 0.85 THz, 1.21 THz, and 1.60 THz). The key parameters are the penalty factor (C = 4.0) and the kernel width parameter γ (g = 0.8); other parameters are left at the LIBSVM toolbox defaults. The regression type is ε-SVR (LIBSVM option -s 3), and the ε parameter of the loss function is set to 0.01 (option -p 0.01). The model is trained on the normalized training set, and predictions are made on both the training and test sets. The training set is used to train the SVR model and perform feature elimination. It contains spectral data of known categories: metoimidazole standard solution samples (target value = 1), standard samples of other novel drugs such as isopropamid and etomidate (target value = 0), pure methanol solvent samples (target value = 0), and samples of common adulterants or interfering substances that may be present (target value = 0). The test/validation set is used to evaluate the effectiveness and generalization ability of the selected feature subset. Its sample composition is similar to that of the training set, but it consists of independently collected data that were not used in training. It is used to monitor changes in model performance during feature elimination in order to determine the optimal feature subset.

[0079] In this invention, when the similarity metric value is ≥80%, the drug sample to be tested is determined to be consistent with the standard drug sample.

[0080] The rapid identification method for novel drugs based on terahertz spectroscopy provided by the present invention is described in detail below with reference to specific embodiments, which should not be construed as limiting the scope of protection of the present invention.

[0081] Example 1

[0082] A terahertz spectroscopic detection method for novel drugs includes the following steps:

[0083] Preparation of drug samples:

[0084] Step 1: Select a 1cm×1cm×0.5cm high-purity quartz glass slide as the substrate material. Clean it sequentially with acetone and anhydrous ethanol for 10 minutes each.

[0085] Step 2: Dry the surface of the glass slide with a constant temperature electric heating plate, and finally place it in a 120℃ oven for 30 minutes to ensure that the surface of the substrate is clean and free of contamination.

[0086] Step 3: Under constant temperature and humidity (23±1℃, RH 40±5%), accurately transfer 15 μL of the drug sample solution using a calibrated 20 μL micropipette, and slowly drop the solution onto the center of the pretreated quartz substrate using the pendant drop method.

[0087] Step 4: Immediately place the sample-coated substrate on the vacuum chuck of the spin coater, and set the gradient speed parameter to 0, 500, 1000, 1500, 2000, or 2500 rpm, with a duration of 30 s.

[0088] Terahertz spectroscopy was performed on the prepared drug samples, and the acquired spectra were preprocessed. The spectral data were then imported into Origin software, and the software's plotting function was used to obtain the preprocessed spectra (as shown in Figures 2-6).

[0089] The similarity metric between the characteristic reference spectrum and the characteristic spectrum to be tested was calculated using the correlation coefficient method, and the results are shown in Table 1. If the similarity metric is ≥ 0.80 (80%), the sample is determined to be the corresponding novel drug; otherwise, it is determined not to be that drug.

[0090] The programming platform used in this embodiment is MATLAB R2019a.

[0091] Methanol solution samples of metoimidazole, isopropamid, etomidate, Meitoni Qin, or methyl fluoroacetate were also analyzed by liquid chromatography-mass spectrometry, and the results are shown in Table 1.

[0092] Table 1. Judgment results of samples of novel drugs to be tested

[0093] Each cell lists: similarity metric / rapid determination method result / liquid chromatography-mass spectrometry determination result.

Standard material    | Sample 1 to be tested | Sample 2 to be tested | Sample 3 to be tested | Sample 4 to be tested | Sample 5 to be tested
Metoimidazole        | 0.856 / Yes / Yes     | 0.138 / No / No       | 0.112 / No / No       | 0.132 / No / No       | 0.134 / No / No
Isopropamid          | 0.156 / No / No       | 0.914 / Yes / Yes     | 0.105 / No / No       | 0.186 / No / No       | 0.113 / No / No
etomidate            | 0.103 / No / No       | 0.213 / No / No       | 0.921 / Yes / Yes     | 0.172 / No / No       | 0.104 / No / No
Meitoni Qin          | 0.023 / No / No       | 0.236 / No / No       | 0.025 / No / No       | 0.831 / Yes / Yes     | 0.053 / No / No
methyl fluoroacetate | 0.055 / No / No       | 0.32 / No / No        | 0.251 / No / No       | 0.025 / No / No       | 0.935 / Yes / Yes

[0094] As shown in Table 1, the terahertz spectroscopy detection method for novel drugs provided by this invention can quickly and accurately determine whether a sample is a novel drug, and the results are consistent with those obtained by liquid chromatography-mass spectrometry (LC-MS). Compared with LC-MS, the method proposed in this invention has the advantages of being green, rapid, low-cost, and non-destructive to the sample.
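The agreement stated above can be checked directly from the Table 1 values. The short sketch below transcribes the 25 similarity metrics from Table 1, applies the 0.80 threshold, and compares the outcome with the LC-MS results, which in Table 1 are "Yes" only where sample n is tested against standard n. The variable and function names are illustrative.

```python
THRESHOLD = 0.80

# Row labels as they appear in Table 1.
STANDARDS = ["Metoimidazole", "Isopropamid", "etomidate",
             "Meitoni Qin", "methyl fluoroacetate"]

# Rows: standard materials; columns: samples 1-5 (values from Table 1).
SIMILARITY = [
    [0.856, 0.138, 0.112, 0.132, 0.134],
    [0.156, 0.914, 0.105, 0.186, 0.113],
    [0.103, 0.213, 0.921, 0.172, 0.104],
    [0.023, 0.236, 0.025, 0.831, 0.053],
    [0.055, 0.32, 0.251, 0.025, 0.935],
]

# LC-MS results in Table 1: "Yes" only on the diagonal.
LCMS = [[i == j for j in range(5)] for i in range(5)]


def rapid_decisions(similarity, threshold=THRESHOLD):
    """Apply the threshold rule to every standard/sample pair."""
    return [[value >= threshold for value in row] for row in similarity]


def identified_as(sample_index, similarity=SIMILARITY):
    """Return the standards that sample `sample_index` (0-based) matches."""
    decisions = rapid_decisions(similarity)
    return [STANDARDS[i] for i in range(len(STANDARDS)) if decisions[i][sample_index]]
```

With these values, `rapid_decisions(SIMILARITY)` equals `LCMS`: the smallest matching value (0.831) and the largest non-matching value (0.32) lie well on either side of the 0.80 threshold.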

[0095] Therefore, the terahertz spectral detection method for novel drugs provided by this invention can serve as an effective laboratory screening and auxiliary identification tool, providing key spectral evidence and qualitative basis for the judicial identification and source tracing of novel drugs.

[0096] The detailed descriptions listed above are merely specific descriptions of feasible embodiments of the present invention, and are not intended to limit the scope of protection of the present invention. All equivalent methods or modifications that do not depart from the technology of the present invention should be included within the scope of protection of the present invention.

Claims

1. A terahertz spectroscopic detection method for novel drugs, characterized in that it includes: The first stage, the preparation of terahertz samples of the drug, includes the following steps: Step 1: Select a high-purity quartz glass slide as the substrate material and clean it sequentially with acetone and anhydrous ethanol; Step 2: Dry the surface of the glass slide with a constant temperature electric heating plate, and then place it in a 120℃ oven for 30 minutes to ensure that the surface of the substrate is clean and free of contamination; Step 3: Under constant temperature and humidity, accurately transfer 15 μL of drug sample solution using a calibrated 20 μL micropipette, and slowly add the solution to the center of the pretreated quartz substrate using the hanging drop method; Step 4: Immediately place the quartz substrate coated with the sample onto the vacuum chuck of the spin coater, and set the gradient rotation speed parameter to 500 rpm, so that the sample solution forms a uniform thin film on the substrate surface; In the second stage, the prepared drug samples are subjected to terahertz time-domain spectroscopy detection using the constructed terahertz time-domain spectroscopy detection device; The third stage involves data processing of the detected terahertz time-domain spectra: Step 1: Perform smoothing preprocessing on the detected terahertz time-domain spectral data to make the spectral curve smoother and free of spikes; Step 2: Screen the preprocessed terahertz spectral data to obtain the optimal characteristic spectrum; Step 3: Calculate the similarity metric between the characteristic reference spectrum and the characteristic spectrum to be tested, and determine the drug type based on the similarity metric.

2. The terahertz spectroscopic detection method for novel drugs according to claim 1, characterized in that, The drug sample in the first stage is a methanol solution of metoimidazole, isopropamid, etomidate, Meitoni Qin, or methyl fluoroacetate, with the solution concentration controlled in the range of 1-10 mg/mL to ensure the signal response intensity when the terahertz pulse interacts with the sample.

3. The terahertz spectral detection method for novel drugs according to claim 3, characterized in that, In step 3 of the first stage, the drug sample solution needs to be filtered through a 0.22μm organic filter membrane before being transferred to remove particulate impurities in the solution and avoid interference from impurities with the terahertz spectral signal.

4. The terahertz spectral detection method for novel drugs according to claim 3, characterized in that, In step 4 of the first stage, the spin-coating speed of the spin coater is set to 0, 500, 1000, 1500, 2000 and 2500 rpm, and the duration of the speed gradient is 30 s. The 0 rpm group serves as a control group and does not undergo spin coating.

5. The terahertz spectral detection method for novel drugs according to claim 1, characterized in that, The terahertz time-domain spectroscopy detection device comprises, in sequence: an ultrafast pulsed laser (1), a beam splitter (2), a chopper (3), a time delay device (4), a lens (5), a terahertz radiation source (6), an off-axis parabolic mirror (7), a sample (8), a polarizer (9), a transmission detection device (10), a reflection detection device (11), a terahertz signal detection unit (12), a λ/4 waveplate (13), a Wollaston prism (14), a photodiode (15), a lock-in amplifier (16), and a chopper driver (17). The specific detection process using this device includes the following steps: Step 1: Start the ultrafast pulsed laser (1) to output a laser beam with a pulse width ≤ 100 fs, which is then split into pump light and probe light by the beam splitter (2); Step 2: Initialize system parameters, set the laser repetition frequency to 1-100 kHz, and adjust the relative position of the terahertz radiation source (6) and the transmission detection device (10) to ensure that the optical path coaxiality error is ≤ 0.1 mm; Step 3: Guide the pump light onto the terahertz radiation source (6) to generate terahertz pulses through the photoconductive effect or the optical rectification effect, and adjust the pump light power to 50-500 mW so that the terahertz pulse bandwidth covers 0.1-3 THz; Step 4: Use the time delay device (4) to adjust the optical path of the probe light so that the probe light and the terahertz pulse are synchronized in time at the terahertz signal detection unit (12), with the synchronization accuracy controlled within ±50 fs; Step 5: Acquire the terahertz reference pulse signal in the sample-free state and record its time-domain waveform, including amplitude, phase and pulse width information, as the reference for subsequent data processing; Step 6: Fix the sample (8) to be tested in the terahertz pulse transmission optical path, and use the transmission mode or the reflection mode to realize the interaction between the terahertz pulse and the sample (8), where the transmission signal is obtained by the transmission detection device (10) and the reflection signal is obtained by the reflection detection device (11); Step 7: Drive the time delay device (4) to perform linear scanning, with a scanning range of 0-10 ps and a step size of ≤ 10 fs, and simultaneously acquire the terahertz pulse time-domain spectral signal modulated by the sample; Step 8: Perform multiple measurements on the same sample and take the average value of the time-domain spectral signal to reduce random noise interference, with the time for each measurement controlled within the range of 1-10 seconds.

6. The terahertz spectral detection method for novel drugs according to claim 5, characterized in that, In step 1, the ultrafast pulsed laser (1) is a Ti:sapphire femtosecond laser with a center wavelength of 780-820 nm and a spot diameter of 1-3 mm to ensure the energy stability of the pump light and probe light after subsequent beam splitting. In step 2, when initializing system parameters, the ambient temperature is adjusted to 23±2℃ and the relative humidity to 40%-60% at the same time. The terahertz radiation source (6) and the transmission detection device (10) are fixed on an anti-vibration platform to prevent environmental vibration from causing the optical path coaxiality to deviate from the set range. In step 3, the terahertz radiation source (6) is a gallium arsenide photoconductive antenna. The angle of incidence of the pump light on the photoconductive antenna is adjusted to 45±5°, and a neutral density filter is added in the pump light optical path to adjust the pump light power in steps and ensure that the terahertz pulse bandwidth stably covers 0.1-3 THz. The electrode gap of the gallium arsenide photoconductive antenna is 5-20 μm, and the working bias voltage is 5-30 V. By adjusting the bias voltage, the carrier acceleration efficiency is optimized, so that the peak power of the terahertz pulse is increased by 10%-20% and the signal-to-noise ratio of subsequent signal detection is enhanced. In step 6, when fixing the sample to be tested, a quartz sample cell is used to hold the drug sample. The thickness of the sample cell is 0.5-2 mm, and the surface of the sample cell is provided with an anti-reflection coating to reduce the reflection loss of the terahertz pulse at the sample cell interface and improve the signal acquisition efficiency. In step 8, when measuring the same sample multiple times, a cyclic shift measurement method is adopted. Before each measurement, the sample is shifted 0.5-1 mm along the optical path axis to avoid interference from local impurities or inhomogeneities of the sample on the measurement results and reduce the influence of random noise on the average value of the time-domain signal.

7. The terahertz spectroscopic detection method for novel drugs according to claim 1, characterized in that, In step 1 of the third stage, the Savitzky-Golay convolution smoothing method is used for preprocessing. Specifically, for each data point i in the spectral data sequence, m points before and after it are taken to form a local window with a width W = 2m + 1. Within this window, a cubic polynomial is fitted to the original data points by least squares. The value of the fitted polynomial at the center point i of the window is calculated, and this fitted value replaces the original value of the i-th data point, which may contain noise. The window is then moved forward by one point, from the i-th point to the (i+1)-th point. This process is repeated until the entire spectral data sequence is traversed. After smoothing, high-frequency random spikes in the spectral curve are effectively suppressed, and the curve becomes smoother.

8. The terahertz spectral detection method for novel drugs according to claim 7, characterized in that, In step 2 of the third stage, the method for screening characteristic spectra is as follows: The training set is a full-band terahertz spectral data containing various known drug standard samples after preprocessing. Each spectrum is a high-dimensional vector with a dimension equal to the number of sampling points. Each spectrum is assigned a label, and the radial basis function (RBF) is selected as the kernel function of the SVR model. The model parameters are initialized, and the initial SVR model is trained using data from all spectral bands. This model aims to learn the complex mapping relationship between spectral features and substance labels. Assessing feature importance: After training, the importance score of each frequency point is evaluated using the support vectors of the SVR model or derived metrics based on model weights in the feature space. Sorting and Removal: Based on the assessed importance scores, all spectral frequency points are sorted in ascending order, and the feature or group of features with the lowest importance score is removed; Reconstructing the dataset and retraining: Reconstruct the training dataset using the remaining feature spectrum subset and retrain a new SVR model based on this dataset; Recursive loop: Repeat the above process of "evaluation-sorting-removal-retraining". The size of the feature spectrum subset gradually shrinks with each iteration. Throughout the recursive process, continuously monitor the performance of the SVR model on the independent validation set. When the performance of the SVR model on the validation set reaches its peak or begins to decline, stop the recursion. The feature spectrum subset that is retained at this time is the optimal feature spectrum combination under the premise of ensuring the best generalization ability of the model.

9. The terahertz spectroscopic detection method for novel drugs according to claim 8, characterized in that, The implementation of step 3 in the third stage includes the following: Suppose that after feature selection, k key optimal feature frequency points are determined. The absorbance values of the reference sample at these k key feature frequency points are constructed into a reference feature vector R=[A_R(f1),A_R(f2),...,A_R(fk)]. Similarly, the absorbance values of the test sample at the same k frequency points are constructed into a test feature vector S=[A_S(f1),A_S(f2),...,A_S(fk)]. Vectors R and S represent the "fingerprint" encoding of the two samples to be compared on the most critical spectral features, and are used to calculate the similarity metric r, which ranges from -1 to +1. r = +1 indicates that the two spectral feature vectors have completely consistent trends, indicating a perfect positive correlation. r = 0 indicates no linear correlation, and r = -1 indicates completely opposite trends. A judgment threshold is set. If the similarity metric r ≥ the threshold, the sample is judged to be highly matched with the current standard drug sample, i.e., it is "the drug". If r < the threshold, it is judged to be mismatched, i.e., it is "not the drug".

10. The terahertz spectroscopic detection method for novel drugs according to claim 9, characterized in that, The formula for calculating the similarity metric r is as follows: r=Σ[(R_i-μ_R)*(S_i-μ_S)] / sqrt{Σ[(R_i-μ_R)^2]*Σ[(S_i-μ_S)^2]} Where R_i and S_i are the i-th elements in vectors R and S, respectively, i ranging from 1 to k; μ_R and μ_S are the arithmetic mean of the elements in vectors R and S, respectively; and Σ represents the summation over all k feature points.