A rapid nondestructive detection system for content of dendrobium polysaccharide and a method thereof
By acquiring spectral reflectance data of Dendrobium samples using an FLA6800 spectrometer and combining it with preprocessing and a polysaccharide content prediction model, the problem of time-consuming and labor-intensive detection of Dendrobium polysaccharide content was solved, achieving rapid, accurate, and non-destructive detection results.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- WENZHOU VOCATIONAL COLLEGE OF SCI & TECH
- Filing Date
- 2026-03-25
- Publication Date
- 2026-06-23
AI Technical Summary
Existing methods for detecting the polysaccharide content of Dendrobium officinale are time-consuming, labor-intensive, complex to operate, and difficult to achieve rapid, non-destructive, and batch processing, thus failing to meet the needs of breeding evaluation, product grading, and industrial processing.
The spectral reflectance data of Dendrobium samples were acquired in the near-infrared band using an FLA6800 spectrometer. Combined with dark current calibration and standard white plate calibration, rapid and non-destructive testing was achieved through preprocessing, characteristic band selection, and polysaccharide content prediction model.
It enables rapid, accurate, and non-destructive detection of Dendrobium polysaccharide content, improving detection efficiency and practical value, and providing visualized reports to support production, processing, and quality control.
Figure CN122259502A_ABST
Abstract
Description
Technical Field
[0001] This invention relates to the field of Dendrobium polysaccharide content detection, and more specifically to a rapid and non-destructive detection system and method for Dendrobium polysaccharide content.
Background Technology
[0002] Dendrobium is an important medicinal plant, and polysaccharides are one of its main active components, with core value in pharmacological research, product development, and quality evaluation. Currently, the detection of Dendrobium polysaccharide content relies largely on physicochemical analysis methods such as the phenol-sulfuric acid method and high-performance liquid chromatography (HPLC). These methods require destructive sample processing, are time-consuming and labor-intensive, and suffer from operational complexity, accumulated experimental error, and low detection efficiency. Furthermore, polysaccharide content varies significantly among different Dendrobium strains and plant parts, making it difficult for traditional methods to rapidly detect and grade large batches of samples, which limits their application in breeding evaluation, product grading, and industrial processing. Therefore, achieving rapid, non-destructive, batch-capable detection of Dendrobium polysaccharide content while ensuring detection accuracy has become a pressing technical problem in this field.
Summary of the Invention
[0003] In view of this, the present invention proposes a rapid and non-destructive testing system and method for Dendrobium polysaccharide content, in order to solve the problem that the existing technology cannot meet the requirements of rapid, non-destructive and high-precision testing.
[0004] This invention proposes a rapid and non-destructive method for detecting the polysaccharide content of Dendrobium officinale, comprising:
Acquiring spectral reflectance data of Dendrobium samples in the near-infrared band, the data being acquired with an FLA6800 spectrometer in the range of 600nm to 1700nm, with dark current calibration and standard white plate calibration performed before acquisition;
Preprocessing the spectral reflectance data, including baseline subtraction, noise filtering, and scattering effect elimination, and smoothing the data using the first and second derivatives to generate standardized spectral data;
Extracting feature bands and feature variables related to Dendrobium polysaccharide content from the standardized spectral data, the feature bands being selected by partial least squares regression and competitive adaptive reweighted sampling;
Obtaining, based on the feature variables, the predicted value of Dendrobium polysaccharide content from a trained polysaccharide content prediction model, the model being trained on historical physicochemical measurement data and spectral data using regression analysis, a support vector machine, or a deep neural network, and outputting the predicted value of Dendrobium polysaccharide content and its prediction confidence;
Grading the predicted values of Dendrobium polysaccharide content and determining the category of each Dendrobium sample based on its predicted value;
Outputting the detection results and generating a visualization report, which includes the sample number, detection site, spectral characteristic bands, predicted polysaccharide content, and confidence level of the Dendrobium sample.
The prediction confidence is used to evaluate the reliability of the predicted value. When the confidence is lower than a set threshold, re-acquisition of the spectral data and an update of the prediction model are triggered.
[0005] Furthermore, the preprocessing includes: The collected spectral reflectance data are read point by point and a spectral vector is constructed; the mean of all data points in the spectral vector is calculated; the standard deviation of all data points in the spectral vector is calculated; the mean is subtracted from each data point in the spectral vector to obtain a centered sequence; each data point in the centered sequence is divided by the standard deviation to obtain a standardized spectral matrix with zero mean and unit variance; the standardized spectral matrix is rearranged according to the sample order and output.
[0006] Furthermore, noise filtering includes: Wavelet decomposition is performed on the standardized spectral matrix to obtain multi-scale wavelet coefficients; a threshold level and a threshold function are set; thresholding is performed on the wavelet coefficients of high-frequency scales; the wavelet coefficients of low-frequency scales are kept unchanged; inverse wavelet transform is performed on the thresholded wavelet coefficients to obtain a denoised spectral matrix; band consistency is checked on the denoised spectral matrix and then output.
[0007] Furthermore, in eliminating scattering effects, the following are included: The reference spectrum is calculated using the denoised spectral matrix; a univariate linear regression is performed on each sample spectrum and the reference spectrum to obtain the slope coefficient and intercept; linear correction is performed on each sample spectrum based on the slope coefficient and intercept; the corrected sample spectra are combined into a correction spectral matrix; vector normalization is performed on the correction spectral matrix to obtain the scattering correction spectrum.
[0008] Furthermore, the selection of characteristic bands includes: A candidate band set is constructed based on the scattering correction spectrum; the correlation coefficient between the candidate bands and the Dendrobium polysaccharide content label is calculated and sorted by absolute value; candidate bands with correlation coefficients lower than a preset threshold are removed; a stepwise projection algorithm is executed on the remaining candidate bands to generate projection vectors; collinear bands are discarded according to the independence and redundancy of the projection vectors; feature bands are determined and corresponding spectral variables are extracted to form a feature variable matrix.
[0009] Furthermore, when training the polysaccharide content prediction model, the following steps are included: Align the feature variable matrix with historical physicochemical measurement data according to sample number; divide the aligned dataset into training set and validation set; select support vector machine or deep neural network on the training set and set hyperparameters; perform iterative training with feature variables as input and polysaccharide content as target; calculate and record the prediction error index on the validation set; adjust the hyperparameters according to the error index until the convergence condition is met; solidify the polysaccharide content prediction model obtained from the training and output it.
[0010] Furthermore, in the process of classification, the following are included: Read the predicted values output by the polysaccharide content prediction model; calculate the high content threshold, medium content range, and low content threshold based on the quantiles of historical physicochemical measurement data; compare each predicted value with the threshold or range; mark samples with contents higher than the high content threshold as high content grade; mark samples with contents in the medium content range as medium content grade; mark samples with contents lower than the low content threshold as low content grade; generate and save the grading result set.
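The quantile-based grading described above can be illustrated with a minimal Python sketch. The 25% and 75% quantiles, the function name `grade_predictions`, and the synthetic historical values are assumptions made for illustration only; the text does not state which quantiles define the thresholds.

```python
import numpy as np

def grade_predictions(predicted, historical, low_q=0.25, high_q=0.75):
    """Grade predicted contents as 'high', 'medium', or 'low'.

    Thresholds are quantiles of historical physicochemical measurements.
    The default quantiles (0.25, 0.75) are assumed, not taken from the source.
    """
    low_thr, high_thr = np.quantile(np.asarray(historical, dtype=float),
                                    [low_q, high_q])
    predicted = np.asarray(predicted, dtype=float)
    grades = np.where(predicted > high_thr, "high",
                      np.where(predicted < low_thr, "low", "medium"))
    return grades, (float(low_thr), float(high_thr))

# Synthetic historical contents (placeholder values, not measured data)
historical = np.arange(10.0, 51.0)
grades, thresholds = grade_predictions([15.0, 30.0, 45.0], historical)
```

Values that fall exactly on a threshold are assigned to the medium grade here; a real grading standard would need to state how boundary values are handled.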
[0011] Furthermore, outputting the prediction confidence includes: calculating a residual sequence by comparing the actual physicochemical measurements with the predicted values of the corresponding samples in the validation set; plotting a residual distribution plot and a residual trend plot; calculating a confidence factor based on the residual statistics; comparing the confidence factor with a set threshold to obtain a reliable or unreliable identifier; generating an instruction to reacquire spectral data for unreliable identifiers; and outputting the predicted values, confidence factors, and residual plots together.
[0012] Furthermore, when generating visualization reports, the following are included: Enter the sample number, detection site, and sampling time; enter the characteristic bands and their numbers; enter the predicted polysaccharide content and corresponding confidence level; embed the residual plot and accuracy evaluation; annotate abnormal results and grade the results; generate a visual report file and export it.
[0013] Compared with the prior art, the beneficial effects of the present invention are as follows: Near-infrared spectral reflectance data of Dendrobium samples are acquired in the range of 600nm to 1700nm using an FLA6800 spectrometer, and dark current calibration is combined with standard white plate calibration to ensure the accuracy and repeatability of the original spectral signals. Preprocessing, including standard normal variate (SNV) transformation, wavelet threshold denoising, multiplicative scatter correction (MSC), and first- and second-order derivative smoothing, suppresses the influence of inter-sample differences, random noise, and scattering interference on the spectral signals, resulting in more stable spectral data. A combination of partial least squares regression, competitive adaptive reweighted sampling, and a stepwise projection algorithm extracts characteristic bands and variables highly correlated with Dendrobium polysaccharide content, reducing redundant information and improving the model's discriminative ability. Combining these characteristic variables with historical physicochemical measurement data further enhances the model's performance. The invention employs support vector machine or deep neural network models for training and uses cross-validation to optimize model parameters, supporting the generalization ability and accuracy of the prediction model and achieving efficient quantitative prediction of Dendrobium polysaccharide content. By constructing grading standards and classifying the prediction results into high, medium, and low grades, it provides not only a numerical content value but also a clear basis for sample grading and quality control. The confidence of the prediction results is evaluated by outputting residual plots and confidence factors, and spectral data re-acquisition and model updates are triggered automatically when confidence is low, supporting the reliability and repeatability of the detection results. The final visualization report includes sample number, detection site, characteristic band, predicted value, confidence level, and anomaly alerts, providing data support for production processing, quality control, and scientific research analysis. The invention achieves rapid, accurate, and non-destructive detection of Dendrobium polysaccharide content, addressing the complex operation, destructiveness, and low efficiency of existing detection methods and improving detection efficiency and practical value.
[0014] In another aspect, the present invention proposes a rapid and non-destructive detection system for Dendrobium polysaccharide content, comprising: a data acquisition module configured to acquire spectral reflectance data of Dendrobium samples in the range of 600nm to 1700nm using an FLA6800 spectrometer and to perform dark current calibration and standard white plate calibration; a data processing module configured to perform standard normal variate transformation, wavelet threshold denoising, scattering effect correction, and first and second derivative smoothing on the acquired spectral reflectance data to generate standardized spectral data; a feature extraction module configured to construct candidate bands, perform correlation screening and stepwise projection algorithm calculations based on the standardized spectral data, and output a feature variable matrix; a prediction module configured to call a trained polysaccharide content prediction model based on the feature variables and output the predicted value of Dendrobium polysaccharide content and its prediction confidence; a grading module configured to classify the predicted values into high, medium, and low levels based on quantile thresholds or intervals and generate a grading result set; and a results output module configured to generate a visual report that includes sample number, detection site, sampling time, characteristic band, predicted polysaccharide content, confidence level, residual plot, accuracy evaluation, and anomaly alerts, and to issue re-acquisition and model update commands when the confidence level is below the threshold.
Attached Figure Description
[0015] Various other advantages and benefits will become apparent to those skilled in the art upon reading the following detailed description of preferred embodiments. The accompanying drawings are for illustrative purposes only and are not intended to limit the invention. Furthermore, the same reference numerals denote the same parts throughout the drawings. In the drawings: Figure 1 is a flowchart of a rapid and non-destructive method for detecting Dendrobium polysaccharide content, provided as an embodiment of the present invention.
[0016] Figure 2 is a functional block diagram of a rapid and non-destructive detection system for Dendrobium polysaccharide content, provided as an embodiment of the present invention.
Detailed Implementation
[0017] Exemplary embodiments of the present application will now be described in more detail with reference to the accompanying drawings. While exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be implemented in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided to enable a more thorough understanding of the present disclosure and to fully convey the scope of the disclosure to those skilled in the art. It should be noted that, unless otherwise specified, the embodiments and features described herein can be combined with each other. The present invention will now be described in detail with reference to the accompanying drawings and embodiments.
[0018] As shown in Figure 1, this embodiment of the invention provides a rapid and non-destructive method for detecting the polysaccharide content of Dendrobium officinale, including:
S100: Acquire spectral reflectance data of Dendrobium samples in the near-infrared band. The spectral reflectance data is acquired by an FLA6800 spectrometer in the range of 600nm to 1700nm. Dark current calibration and standard white plate calibration are performed before acquisition.
S200: Preprocess the spectral reflectance data, including baseline subtraction, noise filtering, and scattering effect elimination, and smooth the data using the first and second derivatives to generate standardized spectral data.
S300: Extract characteristic bands and characteristic variables related to Dendrobium polysaccharide content from the standardized spectral data. The characteristic bands are selected by partial least squares regression and competitive adaptive reweighted sampling.
S400: Based on the feature variables, obtain the predicted value of Dendrobium polysaccharide content from a trained polysaccharide content prediction model. The model is trained on historical physicochemical measurement data and spectral data using regression analysis, a support vector machine, or a deep neural network, and outputs the predicted value of Dendrobium polysaccharide content and its prediction confidence.
S500: Grade the predicted value of Dendrobium polysaccharide content into multiple levels, including high, medium, and low, and determine the category of the Dendrobium sample based on the predicted value.
S600: Output the test results and generate a visualization report. The visualization report includes the sample number of the Dendrobium sample, the test site, the spectral characteristic bands, the predicted polysaccharide content, and its confidence level.
The prediction confidence is used to evaluate the reliability of the predicted value. When the confidence is lower than the set threshold, re-acquisition of the spectral data and an update of the prediction model are triggered.
[0019] It can be understood that this embodiment ensures the authenticity and comparability of the spectral signals by collecting spectral reflectance data of Dendrobium samples in the range of 600nm to 1700nm and combining dark current calibration with standard white plate calibration. The preprocessing stage incorporates zero-mean, unit-variance standardization, wavelet threshold denoising, and scattering correction, reducing the interference of environmental noise and scattering effects and improving the stability and accuracy of the spectral data. Screening feature bands with correlation ranking, competitive adaptive reweighted sampling, and a stepwise projection algorithm ensures the independence and representativeness of the input variables and avoids model distortion caused by information redundancy. In the prediction stage, support vector machine or deep neural network models are used to analyze the feature variables, and iterative training with hyperparameter optimization driven by error index feedback supports high accuracy and generalization in the prediction results. During grading, quantile thresholds based on historical physicochemical measurement data are used to stratify Dendrobium samples into high, medium, and low content categories, making polysaccharide content evaluation more intuitive and practical. At the same time, residual statistics are used to generate prediction confidence levels, and a re-acquisition and model update mechanism is triggered when the confidence level is insufficient, forming a dynamic correction and adaptive loop that supports the reliability and robustness of the prediction results. Finally, a visual report is output, including sample number, detection site, sampling time, characteristic band, predicted polysaccharide content, confidence level, and residual plot, providing an intuitive reference for researchers and production processes.
Overall, this method achieves rapid, non-destructive, and accurate detection of Dendrobium polysaccharide content, solving the technical problems of slow detection speed, susceptibility to interference, and lack of confidence feedback in existing technologies.
[0020] In some embodiments of this application, preprocessing includes: The collected spectral reflectance data are read point by point and a spectral vector is constructed; the mean of all data points in the spectral vector is calculated; the standard deviation of all data points in the spectral vector is calculated; the mean is subtracted from each data point in the spectral vector to obtain a centered sequence; each data point in the centered sequence is divided by the standard deviation to obtain a standardized spectral matrix with zero mean and unit variance; the standardized spectral matrix is rearranged according to the sample order and output.
[0021] Specifically, fresh Dendrobium samples were selected, cut into transverse sections approximately 2 mm thick, and kept at room temperature and natural humidity. Spectral reflectance data were acquired using an FLA6800 near-infrared spectrometer, with the acquisition wavelength range set to 600 nm to 1700 nm, a sampling interval of 2 nm, and an integration time of 50 ms; each sample was scanned three times and the average was taken. Before acquisition, dark current calibration was performed using a blackbody material, followed by white plate calibration using a standard reflectance white plate, to ensure the accuracy of the baseline signal and the comparability of spectral intensities. All spectral data were saved as CSV files, with each spectrum stored as the reflectance values at consecutive wavelengths.
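The calibration and scan averaging described above can be sketched in Python as follows. The text does not give a calibration formula, so this sketch assumes the conventional relation, reflectance = (sample - dark) / (white - dark); the function name `calibrate_reflectance` and the intensity values are hypothetical placeholders rather than readings from an actual instrument.

```python
import numpy as np

def calibrate_reflectance(raw_scans, dark, white, eps=1e-12):
    """Convert raw intensity scans to relative reflectance.

    raw_scans: (n_scans, n_wavelengths) raw sample intensities
    dark:      (n_wavelengths,) dark-current reading
    white:     (n_wavelengths,) standard white plate reading
    Assumes the conventional formula (sample - dark) / (white - dark).
    """
    raw_mean = np.asarray(raw_scans, dtype=float).mean(axis=0)  # average repeated scans
    dark = np.asarray(dark, dtype=float)
    white = np.asarray(white, dtype=float)
    denom = np.clip(white - dark, eps, None)  # guard against division by zero
    return (raw_mean - dark) / denom

# Wavelength axis from the embodiment: 600 nm to 1700 nm at a 2 nm interval
wavelengths = np.arange(600, 1701, 2)

# Synthetic stand-in readings (not real measurements)
dark = np.full(wavelengths.size, 100.0)
white = np.full(wavelengths.size, 4100.0)
raw_scans = np.vstack([np.full(wavelengths.size, v)
                       for v in (2090.0, 2100.0, 2110.0)])  # three scans

reflectance = calibrate_reflectance(raw_scans, dark, white)
```

With a 2 nm interval over 600 nm to 1700 nm, each spectrum contains 551 wavelength points.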
[0022] During the data import phase, a Python 3.9 environment was used together with data processing libraries such as NumPy and Pandas to read the spectral files. The spectral data of each sample was read line by line to form a spectral vector for that sample, and the vectors were stored as a two-dimensional matrix. For each spectral vector, all data points were first traversed and their overall mean was calculated. This mean was then subtracted from each data point in the spectral vector to obtain a centered sequence.
[0023] After centering, each centered sequence is traversed again to calculate its standard deviation, which characterizes the overall fluctuation range of the spectrum. Each data point is then divided by this standard deviation so that the spectra of all samples share the same numerical scale. The scaling is a point-by-point operation: each data point is divided by the standard deviation of its own spectrum to obtain the scaled result. At this point, each spectrum has zero mean and unit variance.
[0024] To ensure consistency across different samples, all standardized spectral vectors were recombined into a standardized spectral matrix. Each row of this matrix corresponds to one sample, and each column corresponds to one wavelength point. The matrix structure is complete and easy for subsequent algorithms to use. After generating the matrix, the samples were rearranged according to the order of sample acquisition to ensure a one-to-one correspondence between the data files and the actual sample numbers. The final matrix was then exported as a new CSV file. This file serves as the standard input data source for subsequent noise filtering, scattering correction, and feature extraction.
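The centering and scaling steps above amount to a row-wise standardization of the spectral matrix. A minimal NumPy sketch is given below; the function name `standardize_spectra`, the use of the population standard deviation, and the random input are assumptions for illustration, and file reading and CSV export are omitted.

```python
import numpy as np

def standardize_spectra(spectra):
    """Standardize each spectrum (row) to zero mean and unit variance."""
    X = np.asarray(spectra, dtype=float)
    mean = X.mean(axis=1, keepdims=True)   # per-spectrum mean
    std = X.std(axis=1, keepdims=True)     # per-spectrum std (ddof=0, assumed)
    std[std == 0] = 1.0                    # a flat spectrum stays at zero
    return (X - mean) / std

# Synthetic reflectance matrix: 4 samples x 551 wavelength points
rng = np.random.default_rng(0)
spectra = rng.uniform(0.2, 0.8, size=(4, 551))
standardized = standardize_spectra(spectra)
```

Rows stay in their input order, so the correspondence between matrix rows and sample numbers is preserved.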
[0025] In some embodiments of this application, noise filtering includes: Wavelet decomposition is performed on the standardized spectral matrix to obtain multi-scale wavelet coefficients; a threshold level and a threshold function are set; thresholding is performed on the wavelet coefficients of high-frequency scales; the wavelet coefficients of low-frequency scales are kept unchanged; inverse wavelet transform is performed on the thresholded wavelet coefficients to obtain a denoised spectral matrix; band consistency is checked on the denoised spectral matrix and then output.
[0026] Specifically, the standardized spectral matrix is imported into a data analysis platform, such as using the Wavelet Toolbox in MATLAB R2022a or Python 3.9 with the PyWavelets library. Each spectral curve is stored as an independent input sequence within a row of the matrix, corresponding to a wavelength range of 600 nm to 1700 nm.
[0027] First, the wavelet basis function used for wavelet decomposition is defined. In this embodiment, the db8 wavelet from the Daubechies family is chosen because it has good smoothness and compact support, and can preserve both the main trends and the local features of the spectrum. The number of decomposition levels is set to 5 so that high-frequency noise is captured without excessively weakening the original signal.
[0028] Next, wavelet decomposition is performed on each spectral curve. The decomposition results in a set of approximate coefficients (low-frequency part) and several sets of detail coefficients (high-frequency part), where the detail coefficients reflect fast fluctuations or spikes in the spectrum.
[0029] In the threshold setting stage, the standard deviation of the high-frequency coefficients at each decomposition level is calculated individually to characterize the noise level. The threshold is then determined based on empirical rules or an adaptive method. In this embodiment, a soft thresholding function is used: when the absolute value of a high-frequency coefficient exceeds the threshold, its magnitude is shrunk toward zero by the threshold amount, while coefficients whose absolute value is below the threshold are set to zero. This avoids the curve discontinuities caused by a hard threshold while still reducing noise effectively.
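To make the soft thresholding rule concrete, the sketch below applies it to a small coefficient vector. The function name `soft_threshold` and the example values are illustrative only.

```python
import numpy as np

def soft_threshold(coeffs, threshold):
    """Soft thresholding: zero small coefficients, shrink the rest.

    Coefficients with |c| <= threshold become 0; larger ones keep their
    sign and have their magnitude reduced by the threshold.
    """
    c = np.asarray(coeffs, dtype=float)
    return np.sign(c) * np.maximum(np.abs(c) - threshold, 0.0)

detail = np.array([-3.0, -0.5, 0.0, 0.4, 2.0])
shrunk = soft_threshold(detail, threshold=1.0)
```

Because the output moves continuously through zero at the threshold, soft thresholding does not introduce the jump that a hard threshold produces there.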
[0030] During the thresholding process, coefficients at the high-frequency level are processed point by point, while low-frequency approximation coefficients remain unchanged to ensure that the main trend information of the spectrum is fully preserved. After thresholding operations at all levels are completed for each spectral curve, the results are recombined into a coefficient set.
[0031] Next, an inverse wavelet transform is performed on the thresholded multi-scale wavelet coefficients to reconstruct the spectral curves row by row, resulting in a denoised spectral matrix. In this matrix, random spikes and high-frequency noise are substantially reduced, and the spectral curves are smoother and more continuous.
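The decomposition, thresholding, and reconstruction steps can be sketched with the PyWavelets library named earlier. The db8 wavelet, five levels, and soft thresholding follow the embodiment. The threshold rule used here (level-wise standard deviation multiplied by sqrt(2 ln n), the so-called universal threshold) is only one possible empirical rule and is an assumption, as are the function names and the synthetic test curve.

```python
import numpy as np
import pywt

def denoise_spectrum(spectrum, wavelet="db8", level=5):
    """Wavelet-threshold denoising of a single spectrum."""
    x = np.asarray(spectrum, dtype=float)
    coeffs = pywt.wavedec(x, wavelet, level=level)
    approx, details = coeffs[0], coeffs[1:]   # low-frequency part is kept as-is
    shrunk = []
    for d in details:
        sigma = np.std(d)                              # level-wise noise estimate
        thr = sigma * np.sqrt(2.0 * np.log(x.size))    # assumed threshold rule
        shrunk.append(pywt.threshold(d, thr, mode="soft"))
    rec = pywt.waverec([approx] + shrunk, wavelet)
    return rec[: x.size]   # reconstruction can be one sample longer than the input

def denoise_matrix(X, **kwargs):
    """Apply denoise_spectrum to every row of a spectral matrix."""
    return np.vstack([denoise_spectrum(row, **kwargs) for row in np.asarray(X)])

# Synthetic example: a smooth band plus random noise (not measured data)
wavelengths = np.arange(600, 1701, 2)
clean = np.exp(-((wavelengths - 1200) / 150.0) ** 2)
rng = np.random.default_rng(0)
noisy = clean + rng.normal(0.0, 0.05, size=clean.size)
denoised = denoise_spectrum(noisy)

rmse_noisy = np.sqrt(np.mean((noisy - clean) ** 2))
rmse_denoised = np.sqrt(np.mean((denoised - clean) ** 2))
```

For a 551-point spectrum, five levels is the deepest decomposition that db8 supports without boundary-dominated coefficients, which is consistent with the level chosen in the embodiment.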
[0032] After the denoised spectrum is obtained, an additional band consistency check is performed. The check proceeds as follows: the difference between the denoised spectrum and the standardized spectrum is calculated point by point; the difference is then compared with the allowable deviation range, which in this embodiment is set to ±2% of the original spectral amplitude. If the difference in certain bands exceeds the allowable range, those bands are recorded as abnormal bands and marked in the output results, the threshold parameters are reset, and the denoising is repeated. If the differences in all bands are within the allowable range, the denoising result is confirmed to be valid.
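A minimal sketch of the band consistency check follows. The text does not define "original spectral amplitude"; this sketch assumes it means the peak-to-peak range of the spectrum before denoising. The function name `check_band_consistency` and the test data are hypothetical.

```python
import numpy as np

def check_band_consistency(reference, denoised, wavelengths, tolerance=0.02):
    """Flag wavelengths where denoising changed the spectrum too much.

    The allowed deviation is `tolerance` times the peak-to-peak amplitude
    of the reference spectrum (an assumed reading of the +/-2% rule).
    Returns (passed, abnormal_wavelengths).
    """
    reference = np.asarray(reference, dtype=float)
    denoised = np.asarray(denoised, dtype=float)
    wavelengths = np.asarray(wavelengths)
    amplitude = reference.max() - reference.min()
    deviation = np.abs(denoised - reference)     # point-by-point difference
    abnormal = deviation > tolerance * amplitude
    return bool(not abnormal.any()), wavelengths[abnormal]

wavelengths = np.arange(600, 1701, 2)
reference = np.linspace(0.0, 1.0, wavelengths.size)
denoised = reference.copy()
denoised[100] += 0.05   # 5% of the amplitude, at 800 nm

passed, abnormal_bands = check_band_consistency(reference, denoised, wavelengths)
passed_identical, _ = check_band_consistency(reference, reference, wavelengths)
```

When the check fails, the returned wavelengths can be written to the output as abnormal bands before the threshold parameters are adjusted and denoising is rerun.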
[0033] Finally, the denoised spectral matrix that has passed the inspection is exported as a CSV file, which contains the wavelength sequence and corresponding reflectance value of each spectrum for subsequent scattering effect elimination and characteristic band selection.
[0034] In some embodiments of this application, the process of eliminating scattering effects includes: The reference spectrum is calculated using the denoised spectral matrix; a univariate linear regression is performed on each sample spectrum and the reference spectrum to obtain the slope coefficient and intercept; linear correction is performed on each sample spectrum based on the slope coefficient and intercept; the corrected sample spectra are combined into a correction spectral matrix; vector normalization is performed on the correction spectral matrix to obtain the scattering correction spectrum.
[0035] Specifically, the denoised spectral matrix, which has already undergone noise filtering, is imported into the data processing platform. Each row corresponds to a sample spectrum, and each column corresponds to a specific wavelength point. To ensure consistency between batches of data, a representative spectrum is first selected from the sample set as a reference spectrum. This reference spectrum can be obtained by calculating the average value of all sample spectra at each wavelength point, or a standard sample spectrum with known stable components can be selected as a benchmark.
[0036] After the reference spectrum is determined, each sample spectrum is processed individually. The specific steps are as follows: First, the sample spectrum and the reference spectrum are compared point-to-point within the wavelength range, and a fitted sequence is constructed; then, the correspondence between the sample spectrum and the reference spectrum is fitted using a univariate linear regression method to obtain the slope coefficient and intercept of the sample spectrum relative to the reference spectrum.
[0037] Based on the obtained slope coefficient and intercept, linear correction is performed on the sample spectrum: when the slope coefficient is too large, the overall amplitude of the spectral curve is compressed; when the slope coefficient is too small, the overall amplitude of the spectral curve is stretched; simultaneously, the overall translation error is eliminated by adjusting the intercept. After this correction step, the sample spectrum maintains the same overall shape as the reference spectrum in the wavelength dimension, and scattering interference is effectively reduced.
[0038] After correcting individual spectra, all corrected sample spectra are recombined to form a corrected spectral matrix. To further eliminate differences in light intensity range among different samples, vector normalization is performed on the corrected spectral matrix. Specifically, amplitude normalization is applied to all data points of each sample spectrum to ensure that the spectral values are within the same range, thereby eliminating scattering effects caused by differences in sample structure, particle size distribution, or measurement angle.
[0039] After normalization, the final scattering-corrected spectral matrix is obtained. This matrix more accurately reflects the spectral characteristics of the Dendrobium samples, reducing background interference and inconsistencies between samples. Finally, the scattering-corrected spectra are stored as data files, along with a processing log, including the reference spectrum generation method, linear regression fitting parameters, and normalization intervals, to ensure the traceability and repeatability of the results.
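The scattering correction described above (mean reference spectrum, per-sample linear regression, correction by slope and intercept, then vector normalization) can be sketched as follows. The function name `scatter_correct` is hypothetical, and unit Euclidean norm is assumed for the vector normalization step, since the text does not fix the normalization interval.

```python
import numpy as np

def scatter_correct(X, reference=None):
    """Linear scatter correction followed by vector normalization.

    Each row is fitted as row ~= slope * reference + intercept and
    corrected to (row - intercept) / slope. Rows are then scaled to
    unit Euclidean norm (assumed normalization).
    Returns (corrected_matrix, params), params[i] = (slope, intercept).
    """
    X = np.asarray(X, dtype=float)
    ref = X.mean(axis=0) if reference is None else np.asarray(reference, dtype=float)
    corrected = np.empty_like(X)
    params = []
    for i, row in enumerate(X):
        slope, intercept = np.polyfit(ref, row, deg=1)  # univariate linear fit
        corrected[i] = (row - intercept) / slope
        params.append((slope, intercept))
    norms = np.linalg.norm(corrected, axis=1, keepdims=True)
    norms[norms == 0] = 1.0
    return corrected / norms, np.array(params)

# Synthetic spectra: one underlying shape with different gain and offset
wavelengths = np.arange(600, 1701, 2)
base = 0.5 + 0.3 * np.sin(wavelengths / 150.0)
X = np.vstack([a * base + b for a, b in [(0.8, 0.05), (1.0, 0.0), (1.3, -0.02)]])
corrected, params = scatter_correct(X)
```

In this synthetic case the rows differ only by gain and offset, so after correction they coincide; the fitted `params` are the kind of values the processing log would record.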
[0040] In some embodiments of this application, the selection of characteristic bands includes: A candidate band set is constructed based on the scattering correction spectrum; the correlation coefficient between the candidate bands and the Dendrobium polysaccharide content label is calculated and sorted by absolute value; candidate bands with correlation coefficients lower than a preset threshold are removed; a stepwise projection algorithm is executed on the remaining candidate bands to generate projection vectors; collinear bands are discarded according to the independence and redundancy of the projection vectors; feature bands are determined and corresponding spectral variables are extracted to form a feature variable matrix.
[0041] Specifically, the spectral matrix corrected for scattering effects is first imported into the data processing platform. Each column corresponds to a wavelength point, and each row corresponds to the spectrum of a Dendrobium sample. According to the needs of the detection task, a preliminary set of candidate wavelength bands is defined over the entire spectral range. For example, the range from 600nm to 1700nm is divided into multiple equal-width wavelength segments, and each segment is regarded as a candidate wavelength band.
[0042] Subsequently, correlation analysis is performed between the data from each candidate band and the physicochemical measurements of Dendrobium polysaccharide content. Specifically, the average absorbance or principal component score of the sample set for each candidate band is extracted, and its correlation with the corresponding polysaccharide content label is calculated. The obtained correlation values are sorted in descending order of absolute value, so that the ranking shows how closely each candidate band is related to the polysaccharide content.
[0043] In the ranking results, candidate bands with correlation values below a preset screening threshold are eliminated, and the remaining bands constitute a simplified candidate set. To further avoid redundancy caused by strong collinearity between different bands, a stepwise projection algorithm is used to process the simplified set. This process introduces candidate bands one by one, calculating their projection relationship with the selected bands at each introduction. If a newly introduced band significantly overlaps with or duplicates information with the selected bands, it is automatically discarded.
[0044] After the stepwise projection screening process, the remaining band set is identified as the characteristic bands. These characteristic bands maintain a high correlation with polysaccharide content while having reduced redundancy and collinearity. Finally, the data columns corresponding to the characteristic bands are extracted from the scattering correction spectral matrix and combined to form a characteristic variable matrix. This matrix serves as the core input data for subsequent prediction model training. Its dimensionality is significantly reduced compared to the original spectrum, while its information content is more concentrated, thus providing a stable and efficient data foundation for polysaccharide content prediction.
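The two screening stages described above (correlation ranking with a threshold, then one-by-one introduction of bands with a projection check) can be sketched as follows. This is only an illustration: the application does not give the exact projection rule, so the sketch assumes a Gram-Schmidt style test in which a band is discarded when too little of it remains after projecting out the bands already selected. The names `pearson` and `select_bands` and the thresholds `corr_min` and `residual_min` are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient (assumes neither input is constant)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def select_bands(columns, labels, corr_min=0.5, residual_min=0.2):
    """columns maps a band name to its per-sample values."""
    # Stage 1: rank candidates by |r| and drop those below the threshold.
    ranked = sorted(
        ((abs(pearson(values, labels)), name) for name, values in columns.items()),
        reverse=True,
    )
    kept = [name for r, name in ranked if r >= corr_min]

    # Stage 2: introduce bands one by one and project out selected bands.
    selected, basis = [], []
    for name in kept:
        values = columns[name]
        mean = sum(values) / len(values)
        vec = [v - mean for v in values]
        original_norm = sqrt(sum(v * v for v in vec))
        for unit in basis:
            dot = sum(a * b for a, b in zip(vec, unit))
            vec = [a - dot * b for a, b in zip(vec, unit)]
        residual_norm = sqrt(sum(v * v for v in vec))
        # A small residual means the band mostly duplicates selected bands.
        if residual_norm / original_norm >= residual_min:
            selected.append(name)
            basis.append([v / residual_norm for v in vec])
    return selected
```

The columns of the characteristic variable matrix would then be the entries of `columns` whose names appear in the returned list.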
[0045] In some embodiments of this application, training the polysaccharide content prediction model includes: Align the feature variable matrix with historical physicochemical measurement data according to sample number; divide the aligned dataset into training set and validation set; select support vector machine or deep neural network on the training set and set hyperparameters; perform iterative training with feature variables as input and polysaccharide content as target; calculate and record the prediction error index on the validation set; adjust the hyperparameters according to the error index until the convergence condition is met; solidify the polysaccharide content prediction model obtained from the training and output it.
[0046] Specifically, the feature variable matrix obtained through feature band selection is first organized with historical physicochemical measurement data to establish a correspondence for each Dendrobium sample, ensuring that each row of data in the matrix matches the physicochemical measurement value one by one. Then, the organized dataset is split according to sample number order, with one part serving as the training set and the other as the validation set. The training set is used for parameter learning, and the validation set is used for performance evaluation.
[0047] On the training set, users can choose between a Support Vector Machine (SVM) model or a Deep Neural Network (DNN) model based on task complexity. When an SVM is chosen, a radial basis function (RBF) kernel or a linear kernel needs to be selected, and the penalty and kernel parameters need to be set. When a DNN is chosen, multiple hidden layers need to be constructed, and hyperparameters such as the number of neurons, activation function, learning rate, and optimization algorithm need to be set. After these parameters are set, the feature variables are input into the selected model, and the polysaccharide content (measured by physicochemical methods) is used as the target value to execute an iterative training process.
[0048] During training, each iteration outputs a prediction result, which is compared with the actual physicochemical measurements to calculate prediction error metrics, including mean absolute error and mean squared error. All error metrics are recorded and compared with those of the previous iteration to determine whether training has converged. If the error metrics do not meet the expected convergence criteria, the process returns to the hyperparameter setting step, for example to modify the learning rate, adjust the regularization strength, or change the number of network layers, and training is then performed again.
[0049] Training is considered complete when the error index remains stable or decreases by less than a preset threshold across multiple iterations. At this point, the model parameters are solidified and saved as a callable polysaccharide content prediction model. In subsequent detection tasks, new feature variable matrices can be input directly into the final prediction model, which outputs the corresponding predicted Dendrobium polysaccharide content and its prediction confidence, thus achieving fast, stable, and repeatable non-destructive detection.
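The error metrics and the stopping rule described above can be sketched as follows. The model itself (SVM or DNN) is deliberately left out; the sketch only shows how mean absolute error, mean squared error, and a convergence check over recent iterations might be computed. The names `has_converged`, `min_decrease`, and `patience` are hypothetical, as are their default values.

```python
def mean_absolute_error(actual, predicted):
    """Average absolute difference between measured and predicted values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mean_squared_error(actual, predicted):
    """Average squared difference between measured and predicted values."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def has_converged(error_history, min_decrease=1e-3, patience=3):
    """Return True when the error has stopped improving.

    Convergence is assumed when, for each of the last `patience`
    iterations, the error fell by less than `min_decrease` (this also
    covers the case where the error stayed flat or rose slightly).
    """
    if len(error_history) <= patience:
        return False
    recent = error_history[-(patience + 1):]
    return all(prev - cur < min_decrease for prev, cur in zip(recent, recent[1:]))
```

A training loop would append the validation error to `error_history` after each iteration, stop and save the model once `has_converged` returns True, and otherwise adjust the hyperparameters and continue.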
[0050] In some embodiments of this application, the classification process includes: Read the predicted values output by the polysaccharide content prediction model; calculate the high content threshold, medium content range, and low content threshold based on the quantiles of historical physicochemical measurement data; compare each predicted value with the threshold or range; mark samples with contents higher than the high content threshold as high content grade; mark samples with contents in the medium content range as medium content grade; mark samples with contents lower than the low content threshold as low content grade; generate and save the grading result set.
[0051] Specifically, all predicted values output by the polysaccharide content prediction model are first read, sorted, and stored according to sample number. Then, historical physicochemical measurement data are retrieved to statistically analyze the polysaccharide content distribution of similar samples, and quantile points are determined based on cumulative probability. For example, the upper quantile is set as the high content threshold, the middle quantile interval as the medium content range, and the lower quantile as the low content threshold. The quantile setting process can be completed through statistical analysis and saved in the system as a grading benchmark.
[0052] Next, each predicted value is read one by one and compared with the aforementioned threshold or range. When the predicted value is greater than the high content threshold, the corresponding sample is automatically assigned a high content level; when the predicted value is within the medium content range, it is assigned a medium content level; and when the predicted value is lower than the low content threshold, it is assigned a low content level.
[0053] After all samples have been graded, the system combines each sample's ID, predicted value, and grade into a record, and these records together form the grading result set. This set is saved as a table or database record and can be retrieved or exported later. The final grading results can be used not only for rapid sample quality classification but also as foundational data for further statistical analysis and quality traceability.
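The quantile-based grading described above can be sketched as follows. The application does not state which quantiles are used, so the sketch assumes the lower and upper quartiles of the historical measurements as the low and high thresholds, and assigns a value that equals a threshold to the medium grade. The function names are hypothetical.

```python
from statistics import quantiles


def grading_thresholds(historical):
    """Return (low, high) thresholds from historical measurements.

    Assumption: the lower and upper quartiles serve as the thresholds.
    """
    low, _median, high = quantiles(historical, n=4, method="inclusive")
    return low, high


def grade(value, low, high):
    """Map one predicted value to a content grade."""
    if value > high:
        return "high"
    if value < low:
        return "low"
    return "medium"


def grade_samples(predictions, historical):
    """predictions maps a sample ID to its predicted content."""
    low, high = grading_thresholds(historical)
    return [
        {"sample_id": sample_id, "predicted": value, "grade": grade(value, low, high)}
        for sample_id, value in sorted(predictions.items())
    ]
```

The returned list of records corresponds to the grading result set and could be written to a table or database as it stands.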
[0054] In some embodiments of this application, the output prediction confidence level includes: The residual sequence is calculated by comparing the actual physicochemical measurements and predicted values of the corresponding samples in the validation set; a residual distribution plot and a residual trend plot are plotted; a confidence factor is calculated based on the residual statistics; the confidence factor is compared with a set threshold to obtain a reliable or unreliable identifier; an instruction to reacquire spectral data is generated for unreliable identifiers; and the predicted values, confidence factors, and residual plots are output together.
[0055] Specifically, the system first retrieves the prediction results from the validation set samples and compares them with the corresponding actual physicochemical measurements, calculating the prediction deviation for each sample to form a continuous residual sequence. Then, the system arranges the residual sequence in chronological or sample number order, generating a residual distribution map to display the overall deviation range; simultaneously, it generates a residual trend map to visually reflect the stability of the residuals as the sample order changes. Next, based on residual statistical analysis, the system extracts indicators such as the residual mean, dispersion, and frequency of extreme deviations, and calculates a confidence factor by combining these indicators. This confidence factor is compared with a preset threshold; a confidence factor higher than the threshold is marked as reliable, and a confidence factor lower than the threshold is marked as unreliable. For unreliable results, the system automatically generates instructions to reacquire spectral data and triggers a model update prompt to prevent inaccurate data from being directly adopted. Finally, the system packages and outputs the predicted value, confidence factor, residual distribution map, and trend map for each sample, generating a complete confidence result set, which is then presented to the user through a report or interface.
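The residual statistics and threshold comparison described above can be sketched as follows. The application does not give a formula for the confidence factor, so the scoring rule below is purely an assumption made for illustration: it penalizes residual bias, dispersion, and the rate of extreme deviations, and maps the total penalty into the range (0, 1]. The names `confidence_report`, `tolerance`, and `threshold` are hypothetical, and plotting is omitted.

```python
from statistics import mean, pstdev


def confidence_report(actual, predicted, tolerance=1.0, threshold=0.7):
    """Summarize validation residuals and derive a reliability flag."""
    residuals = [a - p for a, p in zip(actual, predicted)]
    bias = mean(residuals)        # residual mean
    spread = pstdev(residuals)    # residual dispersion
    # Share of residuals larger than twice the tolerance (assumed cutoff).
    extreme_rate = sum(abs(r) > 2 * tolerance for r in residuals) / len(residuals)

    # Hypothetical scoring rule: each statistic adds to a penalty.
    penalty = abs(bias) / tolerance + spread / tolerance + extreme_rate
    factor = 1.0 / (1.0 + penalty)

    reliable = factor >= threshold
    return {
        "residuals": residuals,
        "confidence_factor": factor,
        "reliable": reliable,
        "action": None if reliable else "reacquire spectral data",
    }
```

The `residuals` list, kept in sample order, is the input that a residual distribution plot and a residual trend plot would be drawn from.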
[0056] In some embodiments of this application, generating a visualization report includes: Enter the sample number, detection site, and sampling time; enter the characteristic bands and their numbers; enter the predicted polysaccharide content and corresponding confidence level; embed the residual plot and accuracy evaluation; annotate abnormal results and grade the results; generate a visual report file and export it.
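The report contents listed above can be sketched as a simple record that is serialized to a file. JSON is chosen here only for illustration; the application does not specify the report format. All field names, the `confidence_threshold` parameter, and the rule that a low confidence marks a result as abnormal are assumptions.

```python
import json


def build_report(sample_id, site, sampled_at, bands_nm, predicted,
                 confidence, grade, residual_plot=None,
                 confidence_threshold=0.7):
    """Assemble one sample's report and return it as JSON text."""
    report = {
        "sample_id": sample_id,
        "detection_site": site,
        "sampling_time": sampled_at,
        # Characteristic bands together with their running numbers.
        "characteristic_bands": [
            {"index": i, "wavelength_nm": nm}
            for i, nm in enumerate(bands_nm, start=1)
        ],
        "predicted_polysaccharide_content": predicted,
        "confidence": confidence,
        "grade": grade,
        # Path or identifier of an embedded residual plot, if any.
        "residual_plot": residual_plot,
        # Assumed rule: low confidence marks the result as abnormal.
        "anomaly": confidence < confidence_threshold,
    }
    return json.dumps(report, indent=2, ensure_ascii=False)
```

The returned text can be written to a file with an ordinary `open(path, "w")` call to produce the exported report.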
[0057] Finally, it should be noted that the above embodiments are only used to illustrate the technical solutions of the present invention and not to limit it. Although the present invention has been described in detail with reference to the above embodiments, those skilled in the art should understand that modifications or equivalent substitutions can still be made to the specific implementation of the present invention. Any modifications or equivalent substitutions that do not depart from the spirit and scope of the present invention should be covered within the scope of protection of the claims of the present invention.
Claims
1. A rapid and non-destructive method for detecting the polysaccharide content of Dendrobium officinale, characterized in that it includes: acquiring spectral reflectance data of Dendrobium samples in the near-infrared band, in the range of 600nm–1700nm, using an FLA6800 spectrometer, with dark current calibration and standard white plate calibration performed before acquisition; preprocessing the spectral reflectance data, including baseline subtraction, noise filtering, and scattering effect elimination, and smoothing the data using the first and second derivatives to generate standardized spectral data; extracting feature bands and feature variables related to the content of Dendrobium polysaccharides from the standardized spectral data, the feature bands being selected by partial least squares regression and competitive adaptive reweighted sampling; obtaining, based on the feature variables, the predicted value of Dendrobium polysaccharide content using a polysaccharide content prediction model, the polysaccharide content prediction model being trained on historical physicochemical measurement data and spectral data using regression analysis, a support vector machine, or a deep neural network model, and outputting the predicted value of Dendrobium polysaccharide content and its prediction confidence; grading the predicted values of Dendrobium polysaccharide content and determining the category of the Dendrobium samples based on the predicted values; and outputting the detection results and generating a visualization report, which includes the sample number, detection site, spectral characteristic band, predicted polysaccharide content, and confidence level of the Dendrobium sample; wherein the prediction confidence is used to evaluate the reliability of the predicted value, and when the confidence level is lower than a set threshold, re-acquisition of spectral data and an update of the prediction model are triggered.
2. The rapid and non-destructive method for detecting the content of Dendrobium polysaccharides according to claim 1, characterized in that, Preprocessing includes: The collected spectral reflectance data are read point by point and a spectral vector is constructed; the mean of all data points in the spectral vector is calculated; the standard deviation of all data points in the spectral vector is calculated; the mean is subtracted from each data point in the spectral vector to obtain a centered sequence; each data point in the centered sequence is divided by the standard deviation to obtain a standardized spectral matrix with zero mean and unit variance; the standardized spectral matrix is rearranged according to the sample order and output.
3. The rapid and non-destructive method for detecting the content of Dendrobium polysaccharides according to claim 2, characterized in that, Noise filtering includes: Wavelet decomposition is performed on the standardized spectral matrix to obtain multi-scale wavelet coefficients; a threshold level and a threshold function are set; thresholding is performed on the wavelet coefficients of high-frequency scales; the wavelet coefficients of low-frequency scales are kept unchanged; inverse wavelet transform is performed on the thresholded wavelet coefficients to obtain a denoised spectral matrix; band consistency is checked on the denoised spectral matrix and then output.
4. The rapid and non-destructive method for detecting the content of Dendrobium polysaccharides according to claim 3, characterized in that, When eliminating scattering effects, the following are included: The reference spectrum is calculated using the denoised spectral matrix; a univariate linear regression is performed on each sample spectrum and the reference spectrum to obtain the slope coefficient and intercept; linear correction is performed on each sample spectrum based on the slope coefficient and intercept; the corrected sample spectra are combined into a correction spectral matrix; vector normalization is performed on the correction spectral matrix to obtain the scattering correction spectrum.
5. The rapid and non-destructive method for detecting the content of Dendrobium polysaccharides according to claim 4, characterized in that, When selecting characteristic bands, the following are included: A candidate band set is constructed based on the scattering correction spectrum; the correlation coefficient between the candidate bands and the Dendrobium polysaccharide content label is calculated and sorted by absolute value; candidate bands with correlation coefficients lower than a preset threshold are removed; a stepwise projection algorithm is executed on the remaining candidate bands to generate projection vectors; collinear bands are discarded according to the independence and redundancy of the projection vectors; feature bands are determined and corresponding spectral variables are extracted to form a feature variable matrix.
6. The rapid and non-destructive method for detecting the content of Dendrobium polysaccharides according to claim 5, characterized in that, Training a polysaccharide content prediction model includes: Align the feature variable matrix with historical physicochemical measurement data according to sample number; divide the aligned dataset into training set and validation set; select support vector machine or deep neural network on the training set and set hyperparameters; perform iterative training with feature variables as input and polysaccharide content as target; calculate and record the prediction error index on the validation set; adjust the hyperparameters according to the error index until the convergence condition is met; solidify the polysaccharide content prediction model obtained from the training and output it.
7. The rapid and non-destructive method for detecting the content of Dendrobium polysaccharides according to claim 6, characterized in that, When performing grading, the following are included: Read the predicted values output by the polysaccharide content prediction model; calculate the high content threshold, medium content range, and low content threshold based on the quantiles of historical physicochemical measurement data; compare each predicted value with the threshold or range; mark samples with contents higher than the high content threshold as high content grade; mark samples with contents in the medium content range as medium content grade; mark samples with contents lower than the low content threshold as low content grade; generate and save the grading result set.
8. The rapid and non-destructive method for detecting the content of Dendrobium polysaccharides according to claim 7, characterized in that, When outputting prediction confidence, the following should be included: The residual sequence is calculated by comparing the actual physicochemical measurements and predicted values of the corresponding samples in the validation set; a residual distribution map and a residual trend map are plotted; a confidence factor is calculated based on the residual statistics; the confidence factor is compared with a set threshold to obtain a reliable or unreliable identifier; an instruction to reacquire spectral data is generated for the unreliable identifier; and the predicted value, confidence factor, and residual map are output together.
9. The rapid and non-destructive method for detecting the content of Dendrobium polysaccharides according to claim 8, characterized in that, When generating visualization reports, the following are included: Enter the sample number, detection site, and sampling time; enter the characteristic bands and their numbers; enter the predicted polysaccharide content and corresponding confidence level; embed the residual plot and accuracy evaluation; annotate abnormal results and grade the results; generate a visual report file and export it.
10. A rapid and non-destructive testing system for Dendrobium polysaccharide content, used to implement the rapid and non-destructive testing method for Dendrobium polysaccharide content according to any one of claims 1 to 9, characterized in that, include: The data acquisition module is configured to acquire spectral reflectance data of Dendrobium samples in the range of 600nm to 1700nm using an FLA6800 spectrometer and perform dark current calibration and standard white plate calibration. The data processing module is configured to perform standard normal variable transformation, wavelet threshold denoising, scattering effect correction, and first and second derivative smoothing on the acquired spectral reflectance data to generate standardized spectral data. The feature extraction module is configured to construct candidate bands, perform correlation screening and stepwise projection algorithm calculations based on standardized spectral data, and output a feature variable matrix. The prediction module is configured to call a trained polysaccharide content prediction model based on feature variables and output the predicted value of Dendrobium polysaccharide content and its prediction confidence. The grading module is configured to classify the predicted values into high, medium, and low levels based on quantile thresholds or intervals and generate a grading result set. The results output module is configured to generate a visual report that includes sample number, detection site, sampling time, characteristic band, predicted polysaccharide content, confidence level, residual plot, accuracy evaluation and anomaly alerts, and to issue re-acquisition and model update commands when the confidence level is below the threshold.