Intelligent fitting method and system for molecular weight distribution of polysaccharides in Panax ginseng

By combining signal processing and encoder networks, real-time and accurate fitting of the molecular weight distribution of *Gynostemma pentaphyllum* polysaccharides and identification of ultra-high molecular weight components were achieved, solving the problems of large detection errors and insufficient real-time performance in existing technologies and meeting the needs of online quality control.

CN122245462A · Pending · Publication Date: 2026-06-19 · WUHAN HUMANWELL PHARM CO LTD

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Applications(China)
Current Assignee / Owner
WUHAN HUMANWELL PHARM CO LTD
Filing Date
2026-03-24
Publication Date
2026-06-19

AI Technical Summary

Technical Problem

The molecular weight distribution detection of ginseng polysaccharides cannot be accurately fitted in real time, the identification of ultra-high molecular weight components is insufficient, and the prediction accuracy is low when the data is incomplete, which cannot meet the requirements of online quality control.

Method used

The system employs a signal receiving and buffering module, an ultra-high molecular weight component identification module, an encoder selection module, a feature extraction module, a time-series smoothing module, and a dual-branch decoding module. Through signal-to-noise ratio analysis, Berry equation fitting, and a molecular weight range sensing encoder network, it achieves real-time prediction and correction of molecular weight distribution.

Benefits of technology

It improves the accuracy of ultra-high molecular weight components determination, meets the real-time requirements of online process monitoring, ensures the continuity and stability of prediction results, and enhances the accuracy of molecular weight distribution prediction.

✦ Generated by Eureka AI based on patent content.


Abstract

This invention relates to the field of polysaccharide molecular weight analysis technology, and discloses an intelligent fitting method and system for the molecular weight distribution of *Panax notoginseng* polysaccharides. The method includes: receiving streaming detection signals and appending them to a dual-channel buffer; calculating the signal-to-noise ratio of multi-angle scattering signals and identifying candidate segments of ultra-high molecular weight components; selecting a molecular weight range sensing encoder network based on the data integrity level and the ultra-high molecular weight component identifier; fusing multi-source signals to extract latent feature vectors; performing temporal smoothing on current and historical latent feature vectors; generating molecular weight distribution prediction curves through a dual-branch decoder network; applying the Berry equation to fit low-angle scattering data to correct the ultra-high molecular weight branch prediction curve; and splicing the conventional branch and corrected ultra-high molecular weight branch prediction curves to output the molecular weight distribution analysis results. This invention solves the fitting error problem caused by high-angle signal attenuation in ultra-high molecular weight component detection, and achieves real-time prediction output of molecular weight distribution.

Description

Technical Field

[0001] This invention relates to the field of polysaccharide molecular weight analysis technology, and more specifically, to an intelligent fitting method and system for the molecular weight distribution of Panax notoginseng polysaccharides.

Background Technology

[0002] In the quality control and process monitoring of *Gynostemma pentaphyllum* polysaccharides, molecular weight distribution is a key indicator characterizing their physicochemical properties. Gel permeation chromatography coupled with a multi-angle laser light scattering detection system (GPC-MALLS) is a commonly used technique for determining the absolute molecular weight distribution of polysaccharides. Its principle involves linearly fitting the multi-angle scattered light intensity data to zero angle using the Zimm equation, and then calculating the molecular weight corresponding to each elution time point using differential refractive index signals, thereby obtaining a complete molecular weight distribution curve.

[0003] However, *Gynostemma pentaphyllum* polysaccharides contain ultra-high molecular weight components with molecular weights exceeding 10^6 Da. These ultra-high molecular weight components have a large radius of gyration. According to Rayleigh scattering theory, the intensity of scattered light decreases sharply with increasing scattering angle, resulting in extremely low signal-to-noise ratios for signals acquired by high-angle detectors. Traditional Zimm equation fitting relies on linear regression of multiple angle data points. When high-angle data is severely affected by noise, the slope error of the fitted line is amplified, leading to significant deviations in the extrapolated result to zero angle, thus affecting the accuracy of ultra-high molecular weight component determination.

[0004] Meanwhile, existing methods for calculating molecular weight distribution require offline batch processing after the complete elution curve is acquired. The entire elution process typically lasts for tens of minutes, during which time it is impossible to obtain predicted information on the molecular weight distribution. In the production of ginseng polysaccharides, fluctuations in process parameters may cause the molecular weight distribution to deviate from the target range. If the predicted molecular weight distribution results cannot be obtained in real time, it is difficult to detect anomalies and adjust process parameters in a timely manner, which is detrimental to online quality control of the production process.

Summary of the Invention

[0005] This invention provides an intelligent fitting method and system for the molecular weight distribution of Panax notoginseng polysaccharides, solving the technical problems in related technologies such as the inability to accurately fit the molecular weight distribution of Panax notoginseng polysaccharides in real time, insufficient identification of ultra-high molecular weight components, and low prediction accuracy under incomplete data.

[0006] This invention discloses an intelligent fitting method for the molecular weight distribution of ginseng polysaccharides, comprising the following steps:
Receiving the streaming detection signal output by a gel permeation chromatography coupled with multi-angle laser light scattering detection system, appending the differential refractive index signal and the multi-angle laser light scattering signal to a dual-channel data buffer respectively, and obtaining the current buffer data set;
For each sampling time point, calculating the average signal-to-noise ratio of the low-angle group scattering signals and of the high-angle group scattering signals respectively, then identifying each continuous sequence of time points whose high-angle group average signal-to-noise ratio is lower than the preset signal-to-noise ratio threshold and marking it as a candidate segment of ultra-high molecular weight components;
Calculating the ratio of the number of valid data points to the number of theoretical data points in the current buffer, determining the data integrity level, and selecting the corresponding molecular weight range sensing encoder network from the pre-trained molecular weight range sensing encoder group based on the data integrity level and the candidate segment identifier of ultra-high molecular weight components;
Fusing the differential refractive index signal and the low-angle group scattering signal according to the candidate segment identifier of ultra-high molecular weight components, and inputting the fused signal into the selected molecular weight range sensing encoder network to extract the latent feature vector;
Fusing the current latent feature vector with the historical temporally smoothed latent feature vector by weighting, with the fusion weight dynamically adjusted according to the data integrity level, to generate the current temporally smoothed latent feature vector;
Inputting the temporally smoothed latent feature vector into the dual-branch decoder network, which outputs the conventional branch prediction curve and the ultra-high molecular weight branch prediction curve respectively;
Applying the Berry equation to the low-angle group scattering data in the candidate segment of ultra-high molecular weight components for weighted least squares fitting, and calculating the correction factor to correct the ultra-high molecular weight branch prediction curve;
Smoothly splicing the conventional branch prediction curve and the corrected ultra-high molecular weight branch prediction curve to output the molecular weight distribution analysis results.

[0007] Furthermore, the calculation of the average signal-to-noise ratio of the low-angle group scattered signals and the high-angle group scattered signals includes: The low-angle group includes detector channels that detect angles less than a preset angle boundary value, and the high-angle group includes detector channels that detect angles greater than or equal to a preset angle boundary value. For each sampling time point, calculate the mean and standard deviation of the scattered light intensity at each detection angle within the local time window centered on that sampling time point. Use the ratio of the mean to the standard deviation as the signal-to-noise ratio for that angle. Average the signal-to-noise ratios of each angle in the high-angle group to obtain the average signal-to-noise ratio of the high-angle group. The width of the local time window is 5 to 15 consecutive sampling points, and the preset signal-to-noise ratio threshold ranges from 3 to 8.

[0008] Furthermore, the method for determining the data integrity level includes: The product of the number of valid data points and the fixed time interval is divided by the preset total elution time, then multiplied by the number of integrity levels and rounded down to obtain the data integrity level. The number of integrity level classifications ranges from 5 to 10; the molecular weight range sensing encoder group is organized using a two-dimensional index structure, with the first dimension being the data integrity level index and the second dimension being the existence identifier of candidate segments of ultra-high molecular weight components.

[0009] Furthermore, the molecular weight range sensing encoder network adopts a one-dimensional convolutional neural network structure, which includes multiple convolutional layers and pooling layers, and outputs a fixed-dimensional latent feature vector through a fully connected layer; The training dataset for each molecular weight range sensing encoder network consists of truncated elution curves corresponding to the integrity interval, and the training label is the complete molecular weight distribution curve of the corresponding sample. Supervised learning is used for training.

[0010] Furthermore, the fusion processing of the differential refractive index signal and the low-angle group scattering signal based on the candidate segment identifier of the ultra-high molecular weight component includes: If the candidate region of ultra-high molecular weight component is included, the differential refractive index signal and the mean value of the scattered light intensity of the low angle group are normalized and preprocessed before being weighted and fused. The value of the fusion weight coefficient ranges from 0.6 to 0.8. If the candidate region for ultra-high molecular weight components is not included, then only the differential refractive signal is input to the molecular weight range sensing encoder network.

[0011] Furthermore, the dynamic adjustment of the fusion weight based on the data integrity level includes: The fusion weights are defined using a linear function; the higher the data completeness level, the greater the weight of the current latent feature vector. The lower bound of the weights ranges from 0.1 to 0.3, and the upper bound ranges from 0.7 to 0.9. At the first sampling moment of the elution process, the current latent feature vector is directly used as the temporal smoothing latent feature vector.
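The weighting rule in this paragraph can be sketched as follows. This is a minimal illustration, not the patented implementation: the function names, the default bounds 0.2 and 0.8 (picked from inside the stated ranges) and the choice to map the integrity level linearly onto the interval between the bounds are all assumptions.

```python
def fusion_weight(level, n_levels=8, w_min=0.2, w_max=0.8):
    """Weight of the current latent vector, rising linearly with the level.

    Assumption: level 0 maps to w_min and level n_levels - 1 maps to w_max.
    """
    if n_levels <= 1:
        return w_max
    return w_min + (w_max - w_min) * level / (n_levels - 1)


def smooth_latent(z_current, z_prev_smoothed, level, n_levels=8):
    """Blend the current latent vector with the previous smoothed one."""
    if z_prev_smoothed is None:
        # First sampling moment: the current vector is used directly.
        return list(z_current)
    w = fusion_weight(level, n_levels)
    return [w * c + (1.0 - w) * p for c, p in zip(z_current, z_prev_smoothed)]
```

With few data points the previous smoothed vector dominates, which damps jumps in the early predictions; as the buffer fills, the current vector takes over.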

[0012] Furthermore, the dual-branch decoder network includes: A shared feature processing layer receives temporally smoothed latent feature vectors and outputs intermediate feature representations through a fully connected layer; The regular branch consists of multiple transposed convolutional layers, and the output is a regular branch prediction curve covering the low to medium molecular weight range. The ultra-high molecular weight branch, composed of multiple transposed convolutional layers, outputs an ultra-high molecular weight branch prediction curve covering the high molecular weight range; the dual-branch decoder network is trained using piecewise weighted mean square error loss.
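One possible reading of the piecewise weighted mean square error loss is sketched below. The source does not give the segment weights or the normalisation, so the values 1.0 and 3.0, the division by the weight sum and the function name are assumptions chosen only to make the idea concrete.

```python
def piecewise_weighted_mse(pred, target, mw_axis, boundary,
                           w_regular=1.0, w_uhmw=3.0):
    """Squared error weighted by molecular weight segment.

    Points at or above `boundary` (the ultra-high molecular weight segment)
    receive weight w_uhmw; all other points receive w_regular.
    Both weights are illustrative assumptions.
    """
    total, weight_sum = 0.0, 0.0
    for p, t, m in zip(pred, target, mw_axis):
        w = w_uhmw if m >= boundary else w_regular
        total += w * (p - t) ** 2
        weight_sum += w
    return total / weight_sum
```

Weighting the sparse high molecular weight segment more heavily keeps it from being swamped by the much larger number of points in the regular range.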

[0013] Furthermore, the application of the Berry equation for weighted least squares fitting includes: The weight of each data point at each angle is proportional to the signal-to-noise ratio of that angle; the higher the signal-to-noise ratio, the greater the weight. The intercept and slope of the Berry equation are obtained by linear regression. The weight-average molecular weight is the reciprocal of the square of the intercept. The correction factor is the ratio of the weight-average molecular weight obtained by fitting the Berry equation to the weight-average molecular weight corresponding to the ultra-high molecular weight branch prediction curve. The corrected ultra-high molecular weight branch prediction curve is obtained by scaling the molecular weight axis by the correction factor.
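The fitting and correction described here can be sketched in plain Python. In the Berry representation the square root of Kc/R(θ) is regressed against sin²(θ/2), so the intercept equals the inverse square root of the weight-average molecular weight. The sketch assumes that the second virial term is negligible at chromatographic concentrations and that the Kc/R(θ) values are already available. All function names are hypothetical.

```python
import math


def weighted_linear_fit(x, y, w):
    """Weighted least squares for y = intercept + slope * x."""
    sw = sum(w)
    xm = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ym = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - xm) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - xm) * (yi - ym) for wi, xi, yi in zip(w, x, y))
    slope = sxy / sxx
    return ym - slope * xm, slope


def berry_mw(angles_deg, kc_over_r, snr):
    """Weight-average molecular weight from a Berry fit of low-angle data.

    Each angle is weighted by its signal-to-noise ratio, as in the text.
    """
    x = [math.sin(math.radians(a) / 2.0) ** 2 for a in angles_deg]
    y = [math.sqrt(v) for v in kc_over_r]
    intercept, _slope = weighted_linear_fit(x, y, snr)
    return 1.0 / intercept ** 2


def correction_factor(mw_berry, mw_predicted):
    """Ratio used to rescale the molecular weight axis of the UHMW branch."""
    return mw_berry / mw_predicted
```

A factor above 1 means the decoder underestimated the molecular weight of the ultra-high molecular weight segment, so its molecular weight axis is stretched accordingly.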

[0014] Furthermore, the smooth splicing of the conventional branch prediction curve and the corrected ultra-high molecular weight branch prediction curve includes: A splicing transition region is set around the boundary value between the conventional molecular weight components and the ultra-high molecular weight components, and the half width of the splicing transition region is 0.1 to 0.3 times the boundary value; A linear interpolation algorithm is used in the splicing transition region to achieve a smooth transition between the two curves; The molecular weight distribution analysis results include the number-average molecular weight, weight-average molecular weight, polydispersity index, and the proportion of ultra-high molecular weight components. When the data integrity reaches a preset integrity threshold, the output result is labeled as the measured value. The preset integrity threshold ranges from 0.85 to 0.95.
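A minimal sketch of the splicing and of the summary statistics follows. It assumes both branch curves are sampled on the same molecular weight axis and that the spliced curve is a weight-fraction distribution. The function names, the half-width ratio of 0.2 and the dictionary keys are illustrative choices, not taken from the source.

```python
def splice_curves(mw_axis, regular, uhmw, boundary, half_width_ratio=0.2):
    """Linear cross-fade between two curves around the boundary value."""
    half = half_width_ratio * boundary
    lo, hi = boundary - half, boundary + half
    out = []
    for m, r, u in zip(mw_axis, regular, uhmw):
        if m <= lo:
            out.append(r)                      # regular branch only
        elif m >= hi:
            out.append(u)                      # UHMW branch only
        else:
            t = (m - lo) / (hi - lo)           # 0 at lo, 1 at hi
            out.append((1.0 - t) * r + t * u)
    return out


def distribution_stats(mw_axis, weights, boundary):
    """Mn, Mw, polydispersity and UHMW share of a weight-fraction curve."""
    total = sum(weights)
    mw = sum(w * m for w, m in zip(weights, mw_axis)) / total
    mn = total / sum(w / m for w, m in zip(weights, mw_axis))
    share = sum(w for w, m in zip(weights, mw_axis) if m >= boundary) / total
    return {"Mn": mn, "Mw": mw, "PDI": mw / mn, "uhmw_fraction": share}
```

For a boundary of 10^6 Da and a ratio of 0.2, the cross-fade runs from 8×10^5 Da to 1.2×10^6 Da, and the two curves contribute equally at the boundary itself.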

[0015] This invention provides an intelligent fitting system for the molecular weight distribution of *Gynostemma pentaphyllum* polysaccharides, comprising:
The signal receiving and buffering module, used to receive the streaming detection signal output by the gel permeation chromatography coupled with multi-angle laser light scattering detection system, and to append the differential refractive index signal and the multi-angle laser light scattering signal to the dual-channel data buffer respectively;
The ultra-high molecular weight component identification module, used to calculate the average signal-to-noise ratio of the low-angle group and the high-angle group at each sampling time point, and to identify and mark the candidate segments of ultra-high molecular weight components;
The encoder selection module, used to calculate the data integrity level and select the corresponding molecular weight range sensing encoder network from the molecular weight range sensing encoder group based on the data integrity level and the candidate segment identifier of ultra-high molecular weight components;
The feature extraction module, used to perform signal fusion processing based on the candidate segment identifier of ultra-high molecular weight components and to extract latent feature vectors through the selected molecular weight range sensing encoder network;
The temporal smoothing module, used to fuse the current latent feature vector with the historical temporally smoothed latent feature vector by weighting to generate the current temporally smoothed latent feature vector;
The dual-branch decoding module, used to input the temporally smoothed latent feature vector into the dual-branch decoder network and output the conventional branch prediction curve and the ultra-high molecular weight branch prediction curve respectively;
The Berry equation correction module, used to apply the Berry equation to the low-angle group scattering data in the candidate segment of ultra-high molecular weight components, perform weighted least squares fitting, calculate the correction factor, and correct the ultra-high molecular weight branch prediction curve;
The results output module, used to smoothly splice the conventional branch prediction curve and the corrected ultra-high molecular weight branch prediction curve to output the molecular weight distribution analysis results.

[0016] This invention automatically identifies candidate segments of ultra-high molecular weight components by performing signal-to-noise ratio analysis on multi-angle laser light scattering signals. It employs the Berry equation to perform weighted least-squares fitting on the low-angle scattering data, reducing fitting errors caused by high-angle signal attenuation and improving the accuracy of ultra-high molecular weight component determination. By establishing a molecular weight range-sensing encoder group and adaptively selecting the encoder network based on the data integrity level, it achieves real-time molecular weight distribution prediction output during the elution data streaming process, meeting the real-time requirements of online process monitoring. Through a temporal smoothing mechanism and a dual-branch decoder network, it ensures the continuity and stability of the prediction results, improves the prediction accuracy for different molecular weight ranges, and solves the technical problems of existing methods being unable to output molecular weight distribution prediction results in real time and having large errors in ultra-high molecular weight component determination.

Attached Figure Description

[0017] Figure 1 is a flowchart of the intelligent fitting of the molecular weight distribution of *Gynostemma pentaphyllum* polysaccharides provided in an embodiment of the present invention;
Figure 2 is a distribution diagram of dual-channel detection signal data provided in an embodiment of the present invention;
Figure 3 is a signal-to-noise ratio analysis diagram of multi-angle scattering signals provided in an embodiment of the present invention;
Figure 4 is a data integrity and encoder network selection diagram provided in an embodiment of the present invention;
Figure 5 is a diagram of the signal fusion processing procedure provided in an embodiment of the present invention;
Figure 6 is a graph of the Berry equation fitting input data provided in an embodiment of the present invention;
Figure 7 is a prediction distribution curve of the dual-branch decoder provided in an embodiment of the present invention.

Detailed Implementation

[0018] In the quality control and process monitoring of *Gynostemma pentaphyllum* polysaccharides, molecular weight distribution is a key indicator characterizing their physicochemical properties. Gel permeation chromatography coupled with a multi-angle laser light scattering detection system (GPC-MALLS) is a commonly used technique for determining the absolute molecular weight distribution of polysaccharides. Its principle involves linearly fitting the multi-angle scattered light intensity data to zero angle using the Zimm equation, and then calculating the molecular weight corresponding to each elution time point using differential refractive index signals, thereby obtaining a complete molecular weight distribution curve.

[0019] However, *Panax ginseng* polysaccharides contain ultra-high molecular weight components with molecular weights exceeding 10^6 Da. These ultra-high molecular weight components have a large radius of gyration. According to Rayleigh scattering theory, the intensity of scattered light decreases sharply with increasing scattering angle, resulting in extremely low signal-to-noise ratios for signals acquired by high-angle detectors. Traditional Zimm equation fitting relies on linear regression of multiple angle data points. When high-angle data is severely affected by noise, the slope error of the fitted line is amplified, leading to significant deviations in the extrapolated result to zero angle, thus affecting the accuracy of ultra-high molecular weight component determination.

[0020] Meanwhile, existing methods for calculating molecular weight distribution require offline batch processing after the complete elution curve is acquired. The entire elution process typically lasts for tens of minutes, during which time it is impossible to obtain predicted information on the molecular weight distribution. In the production of ginseng polysaccharides, fluctuations in process parameters may cause the molecular weight distribution to deviate from the target range. If the predicted molecular weight distribution results cannot be obtained in real time, it is difficult to detect anomalies and adjust process parameters in a timely manner, which is detrimental to online quality control of the production process.

[0021] This embodiment provides an intelligent fitting method for the molecular weight distribution of *Gynostemma pentaphyllum* polysaccharides, aiming to solve the fitting error problem of the Zimm equation caused by high-angle signal attenuation in the detection of ultra-high molecular weight components, as well as the problem that existing methods cannot output the molecular weight distribution prediction results in real time during data streaming acquisition.

[0022] At least one embodiment of the present invention discloses an intelligent fitting method for the molecular weight distribution of *Gynostemma pentaphyllum* polysaccharides which, as shown in Figure 1, includes the following steps: Step 1: Receive the streaming detection signal, append it to the dual-channel buffer, and obtain the current buffered data set; Streaming detection signals from the gel permeation chromatography coupled with multi-angle laser light scattering detection system are received continuously at fixed time intervals. The newly acquired differential refractive index signal and multi-angle laser light scattering signal are appended to their respective channels of the dual-channel data buffer to obtain the buffer data set at the current time.

[0023] The differential refractive index signal buffer stores the sequence of refractive index changes at each sampling time point, $\{\Delta n_i\}_{i=1}^{N}$, where $N$ indicates the number of valid data points in the current buffer. The multi-angle laser light scattering signal buffer stores the scattered light intensity matrix $I_{i,j}$ at each sampling time point and each detection angle, where $i = 1, \dots, N$ indicates the index of the sampling time point, $j = 1, \dots, J$ indicates the detection angle index, and $J$ indicates the number of detection angles.

[0024] Furthermore, the fixed time interval $\Delta t$ ranges from 0.5 seconds to 2 seconds, and is set according to the sampling frequency and real-time requirements of the gel permeation chromatography coupled with multi-angle laser light scattering detection system.
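Step 1 can be pictured with a small container that appends one differential refractive index value and one row of per-angle scattering intensities at every sampling instant. This is only a sketch; the class and attribute names are hypothetical and do not come from the source.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DualChannelBuffer:
    """Hypothetical dual-channel buffer for the two detector streams."""
    dt: float                                               # sampling interval in seconds
    dri: List[float] = field(default_factory=list)          # one value per time point
    mals: List[List[float]] = field(default_factory=list)   # one row of angles per time point

    def append(self, dri_value: float, mals_row: List[float]) -> None:
        # Reject rows whose angle count differs from what is already stored.
        if self.mals and len(mals_row) != len(self.mals[0]):
            raise ValueError("inconsistent number of detection angles")
        self.dri.append(dri_value)
        self.mals.append(list(mals_row))

    @property
    def n_valid(self) -> int:
        """Number of valid data points currently buffered."""
        return len(self.dri)

    @property
    def elapsed(self) -> float:
        """Elution time covered so far, in seconds."""
        return self.n_valid * self.dt
```

Later steps only need `n_valid`, `dt` and the two stored sequences, so the buffer can be read at any moment during elution.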

[0025] Step 2: Calculate the signal-to-noise ratio of multi-angle scattering signals and identify candidate segments of ultra-high molecular weight components; For each sampling time point in the multi-angle laser light scattering signal buffer, the average signal-to-noise ratio of the low-angle group scattering signal and the high-angle group scattering signal are calculated respectively. Sampling time points where the average signal-to-noise ratio of the high-angle group is lower than the preset signal-to-noise ratio threshold are identified. Time point sequences that continuously meet this condition are marked as candidate segments of ultra-high molecular weight components, and ultra-high molecular weight component candidate segment identifiers are generated.

[0026] The low-angle group includes detector channels whose detection angles are less than a preset angle threshold, while the high-angle group includes detector channels whose detection angles are greater than or equal to the preset angle threshold. For the $i$-th sampling time point, the average signal-to-noise ratio of the high-angle group, $\overline{\mathrm{SNR}}_{H}(i)$, is calculated as follows:

$$\overline{\mathrm{SNR}}_{H}(i) = \frac{1}{|\Theta_H|} \sum_{j \in \Theta_H} \frac{\mu_{i,j}}{\sigma_{i,j}}$$

where $\Theta_H$ represents the set of detection angle indices for the high-angle group, $|\Theta_H|$ indicates the number of detectors in the high-angle group, $\mu_{i,j}$ indicates the average intensity of scattered light at the $j$-th angle within a local time window centered on the $i$-th sampling time point, and $\sigma_{i,j}$ represents the corresponding standard deviation.

[0027] Furthermore, the width of the local time window is set to 5 to 15 consecutive sampling points to smooth out instantaneous noise fluctuations and improve the stability of signal-to-noise ratio calculation.
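The windowed signal-to-noise calculation can be sketched as follows. The source does not say whether the population or the sample standard deviation is used, nor how windows are handled at the ends of the buffer. This sketch assumes the population form and simply clips the window at the edges. Function names are hypothetical.

```python
from statistics import mean, pstdev


def local_snr(series, i, window=9):
    """Mean divided by standard deviation over a window centred on index i."""
    half = window // 2
    lo, hi = max(0, i - half), min(len(series), i + half + 1)
    segment = series[lo:hi]
    sd = pstdev(segment)
    if sd == 0.0:
        return float("inf")   # perfectly flat window, treated as noise-free
    return mean(segment) / sd


def group_avg_snr(mals, i, angle_idx, window=9):
    """Average the per-angle SNR over the detector channels in angle_idx."""
    snrs = [local_snr([row[j] for row in mals], i, window) for j in angle_idx]
    return sum(snrs) / len(snrs)
```

Here `mals` is a list of rows, one per sampling time point, each holding the scattered light intensity at every detection angle.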

[0028] Furthermore, the preset angle cutoff value is set based on the angle configuration of the gel permeation chromatography coupled with multi-angle laser light scattering detection system and the typical molecular weight range of *Panax notoginseng* polysaccharide. For a gel permeation chromatography coupled with multi-angle laser light scattering detection system configured with 18 detection angles (from 15° to 165°), the preset angle cutoff value is set to 90°, that is, the first 9 angles are divided into the low angle group and the last 9 angles are divided into the high angle group.
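The grouping in this example can be checked with a few lines of Python. The source gives only the number of angles and their range, so the even spacing used below is an assumption; real instruments may place their detectors differently.

```python
def split_angle_groups(angles_deg, boundary_deg=90.0):
    """Return (low_idx, high_idx): channel indices below / at-or-above the boundary."""
    low = [j for j, a in enumerate(angles_deg) if a < boundary_deg]
    high = [j for j, a in enumerate(angles_deg) if a >= boundary_deg]
    return low, high


# Assumption: 18 angles evenly spaced from 15 to 165 degrees.
angles = [15.0 + k * (165.0 - 15.0) / 17 for k in range(18)]
low_idx, high_idx = split_angle_groups(angles)
```

Under that spacing the ninth angle is about 85.6° and the tenth about 94.4°, so the 90° boundary yields nine channels in each group, matching the text.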

[0029] Furthermore, the preset signal-to-noise ratio threshold is determined by statistical analysis based on historical detection data of Gynostemma pentaphyllum polysaccharide standard samples. The critical signal-to-noise ratio value that can distinguish between conventional molecular weight components and ultra-high molecular weight components is selected as the preset signal-to-noise ratio threshold, and the preset signal-to-noise ratio threshold ranges from 3 to 8.

[0030] Furthermore, the candidate segment identifier for ultra-high molecular weight components is a binary label. When an ultra-high molecular weight component candidate segment is detected, the identifier value is 1; otherwise, it is 0. The ultra-high molecular weight component candidate segment identifier is used for the selection of the molecular weight range sensing encoder network and the determination of the signal fusion strategy.
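Marking candidate segments amounts to finding runs of consecutive time points whose high-angle average signal-to-noise ratio is below the threshold. In the sketch below, the minimum run length `min_len` is an added assumption to suppress isolated dips; the source only requires the points to be consecutive. Function names are hypothetical.

```python
def find_candidate_segments(high_snr, threshold=5.0, min_len=2):
    """Return inclusive (start, end) index pairs of runs below the threshold."""
    segments, start = [], None
    for i, snr in enumerate(high_snr):
        if snr < threshold:
            if start is None:
                start = i                      # a new run begins
        elif start is not None:
            if i - start >= min_len:
                segments.append((start, i - 1))
            start = None                       # the run has ended
    if start is not None and len(high_snr) - start >= min_len:
        segments.append((start, len(high_snr) - 1))
    return segments


def uhmw_flag(segments):
    """Binary identifier: 1 if any candidate segment exists, otherwise 0."""
    return 1 if segments else 0
```

The flag is what the encoder selection and signal fusion steps consume; the segment boundaries are needed later to pick the data for the Berry fit.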

[0031] Step 3: Calculate the data integrity level and select the corresponding molecular weight range sensing encoder network; Calculate the ratio of the number of valid data points in the current buffer to the number of theoretical data points corresponding to the preset total elution time to determine the data integrity level; based on the combination of the data integrity level and the candidate segment identifier of the ultra-high molecular weight component, select the corresponding molecular weight range sensing encoder network from the pre-trained molecular weight range sensing encoder group to obtain the currently applicable molecular weight range sensing encoder network parameters.

[0032] The data integrity level $L$ is calculated as follows:

$$L = \left\lfloor \frac{N \cdot \Delta t}{T_{\mathrm{total}}} \cdot K \right\rfloor$$

where $N$ indicates the number of valid data points in the current buffer, $\Delta t$ indicates the fixed time interval, $T_{\mathrm{total}}$ indicates the preset total elution time, $K$ indicates the number of completeness level classifications, and $\lfloor \cdot \rfloor$ indicates the floor function.

[0033] Furthermore, the number of completeness level classifications $K$ ranges from 5 to 10, with a typical value of 8. This means that the data integrity of the elution process is divided into 8 levels from 0% to 100%, with each level corresponding to an increase of 12.5% in integrity.
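A short worked example of the integrity level calculation is given below. The total elution time of 1800 s is an assumed figure, consistent with "tens of minutes". The clamp to the top level is also an assumption, since the floor rule by itself would return 8 rather than 7 once the run is complete.

```python
import math


def integrity_level(n_valid, dt, total_time, n_levels=8):
    """floor(n_valid * dt / total_time * n_levels), clamped to n_levels - 1."""
    level = math.floor(n_valid * dt / total_time * n_levels)
    # Assumption: a fully acquired run maps to the highest level index.
    return min(level, n_levels - 1)


# 450 points at 1 s spacing out of an assumed 1800 s run: 25 % complete.
level_quarter = integrity_level(450, 1.0, 1800.0)
# 1000 points: about 55.6 % complete.
level_mid = integrity_level(1000, 1.0, 1800.0)
```

With 8 levels, 25 % completeness falls in level 2 and 55.6 % in level 4.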

[0034] Furthermore, the molecular weight range-aware encoder group comprises multiple pre-trained molecular weight range-aware encoder networks, each optimized for different data integrity levels and signal feature combinations. The molecular weight range-aware encoder group is organized using a two-dimensional index structure. The first dimension is the data integrity level index, and the second dimension is the existence identifier of candidate regions for ultra-high molecular weight components. This two-dimensional index allows for the location and selection of the molecular weight range-aware encoder network best suited to the current data state.
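The two-dimensional index can be pictured as a lookup table keyed by the pair (integrity level, candidate segment flag). In this sketch, placeholder strings stand in for the pre-trained networks, and the function names are hypothetical.

```python
def build_encoder_bank(n_levels=8):
    """Map (integrity_level, uhmw_flag) to an encoder placeholder."""
    return {
        (level, flag): f"encoder_L{level}_U{flag}"
        for level in range(n_levels)
        for flag in (0, 1)
    }


def select_encoder(bank, level, flag):
    """Look up the encoder for the current data state."""
    return bank[(level, flag)]
```

With 8 integrity levels and a binary flag the group holds 16 encoders, so selection is a constant-time lookup at every sampling instant.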

[0035] Furthermore, each molecular weight range sensing encoder network adopts a one-dimensional convolutional neural network structure, with an input layer receiving a sequence of length $N$, where $N$ indicates the number of valid data points in the current buffer. Local features are extracted and the receptive field is expanded progressively through multiple convolutional and pooling layers. Finally, a fixed-dimensional latent feature vector $z \in \mathbb{R}^{d}$ is output through a fully connected layer, where $d$ is the dimension of the latent feature vector.

[0036] Furthermore, the one-dimensional convolutional neural network structure contains 3 to 5 convolutional layers, with kernel sizes of 3 to 7 and strides of 1 to 2. The ReLU activation function is used. The pooling layers employ max pooling with a pooling window size of 2 to 3. The fully connected layer flattens the feature maps output from the convolutional layers and maps them to a latent feature vector of fixed dimension. The dimension of the latent feature vector ranges from 64 to 256.
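To see how these hyperparameters shape the encoder, the sketch below tracks the feature-map length through a stack of convolution and pooling stages using the standard output-length formulas. It assumes no padding, one max-pooling stage after every convolution and non-overlapping pooling windows; none of these details is stated in the source, and the layer list is only an example within the given ranges.

```python
def conv_out_len(n, kernel, stride=1, padding=0):
    """Output length of a 1D convolution."""
    return (n + 2 * padding - kernel) // stride + 1


def pool_out_len(n, window):
    """Output length of non-overlapping max pooling (stride equals window)."""
    return n // window


def encoder_feature_len(n_in, layers):
    """layers is a list of (kernel, stride, pool_window) tuples."""
    n = n_in
    for kernel, stride, pool in layers:
        n = conv_out_len(n, kernel, stride)
        n = pool_out_len(n, pool)
    return n


# Example: 450 buffered points through three conv + pool stages.
feature_len = encoder_feature_len(450, [(7, 1, 2), (5, 1, 2), (3, 1, 2)])
```

In this example the 450 input points shrink to 53 positions per channel before flattening. Because that length depends on the input length, each encoder presumably works with a fixed input length for its integrity level, for example by padding or cropping, though the text does not say so.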

[0037] Furthermore, the training dataset for each molecular weight range sensing encoder network is composed of truncated elution curves corresponding to its data integrity interval. For the molecular weight range sensing encoder network with data integrity level $L$, the training samples are partial elution curves truncated from the start of elution to times within the interval $\left[\frac{L}{K} T_{\mathrm{total}}, \frac{L+1}{K} T_{\mathrm{total}}\right)$, where $L$ indicates the data integrity level, $K$ indicates the number of completeness level classifications, and $T_{\mathrm{total}}$ indicates the preset total elution time. The training label is the complete molecular weight distribution curve of the corresponding sample.

[0038] Furthermore, the training dataset is constructed as follows: complete elution curve data of different batches of *Gynostemma pentaphyllum* polysaccharide samples are collected, and each complete elution curve is truncated to generate training samples corresponding to each data integrity level. The training set for each data integrity level contains no less than 200 samples, covering the typical variation range of the molecular weight distribution of *Gynostemma pentaphyllum* polysaccharide.
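The construction of per-level training sets can be sketched as follows. The source does not specify where inside each integrity interval a curve is cut, so this sketch assumes a single cut at the midpoint of the interval. The function names and the label placeholder are hypothetical.

```python
def truncate_for_level(curve, level, n_levels=8):
    """Keep the leading part of a complete elution curve for one level.

    Assumption: the cut sits at the midpoint of the level's interval,
    i.e. at the fraction (level + 0.5) / n_levels of the full curve.
    """
    n_keep = int(len(curve) * (level + 0.5) / n_levels)
    return curve[:n_keep]


def build_training_pairs(curves, labels, n_levels=8):
    """Return {level: [(truncated_curve, full_distribution_label), ...]}."""
    pairs = {level: [] for level in range(n_levels)}
    for curve, label in zip(curves, labels):
        for level in range(n_levels):
            pairs[level].append((truncate_for_level(curve, level, n_levels), label))
    return pairs
```

Every complete curve therefore contributes one sample to each level, always paired with the same full-distribution label. Several cuts per interval could be drawn instead to enlarge each training set.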

[0039] Furthermore, the training of each molecular weight range sensing encoder network adopts a supervised learning approach, and the loss function adopts the mean squared error loss. That is, the mean squared error between the molecular weight distribution curve reconstructed by the latent feature vector output by the molecular weight range sensing encoder network and the true molecular weight distribution curve is calculated, and the parameters of the molecular weight range sensing encoder network are optimized by the backpropagation algorithm.

[0040] Step 4: Fuse multi-source signals and extract latent feature vectors; The system determines whether the current data contains candidate segments of ultra-high molecular weight components based on the identifiers of these segments. If it does, the differential refractive index signal and the low-angle group scattering signal are normalized and preprocessed separately, then weighted and fused. The fused signal is then input into the selected molecular weight range sensing encoder network. If it does not contain these segments, only the differential refractive index signal is input into the molecular weight range sensing encoder network. The latent feature vector at the current moment is extracted through forward propagation calculations using the molecular weight range sensing encoder network.

[0041] When candidate regions of ultra-high molecular weight components are present, the differential refractive index signal and the mean low-angle scattered light intensity are first preprocessed by a range-based normalization, which brings the two signals into the same numerical range and eliminates the dimensional difference between the two physical quantities. The fused signal is then calculated: the fused signal value at the $i$-th sampling time point is a weighted combination of the normalized differential refractive index signal value and the normalized mean low-angle scattered light intensity at that time point, with the weighting set by the fusion weighting coefficient.

[0042] Furthermore, the mean scattered light intensity of the low-angle group at the $i$-th sampling time point is calculated as the average of the scattered light intensities of all low-angle detectors at that time point.

[0043] Furthermore, the fusion weighting coefficient takes a value between 0 and 1 and is set based on prior knowledge of the relative contributions of the differential refractive index signal and the low-angle group scattering signal to molecular weight prediction. Its typical value is 0.6 to 0.8.
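
As an illustration of the normalization and fusion described above, the following Python sketch normalizes both signals and combines them. It assumes that the range-based normalization is a min-max scaling to [0, 1] and that the fusion weighting coefficient (here `alpha`) multiplies the differential refractive index signal while `1 - alpha` multiplies the low-angle mean. The function names and the default `alpha = 0.7` are illustrative choices within the stated 0.6 to 0.8 range, not values fixed by this embodiment.

```python
from typing import List, Sequence


def range_normalize(values: Sequence[float]) -> List[float]:
    """Scale a signal to [0, 1] by its range (assumed min-max form)."""
    lo, hi = min(values), max(values)
    span = hi - lo
    if span == 0:
        return [0.0] * len(values)  # a flat signal carries no contrast
    return [(v - lo) / span for v in values]


def low_angle_mean(intensities: Sequence[Sequence[float]]) -> List[float]:
    """Average the low-angle detector intensities at each sampling time point."""
    return [sum(row) / len(row) for row in intensities]


def fuse_signals(
    ri: Sequence[float],
    low_angle: Sequence[Sequence[float]],
    alpha: float = 0.7,  # assumed weight on the refractive index signal
) -> List[float]:
    """Weighted fusion of the normalized RI signal and low-angle mean."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    ri_n = range_normalize(ri)
    ls_n = range_normalize(low_angle_mean(low_angle))
    return [alpha * r + (1.0 - alpha) * s for r, s in zip(ri_n, ls_n)]
```

Each row of `low_angle` holds the intensities of all low-angle detectors at one sampling time point, so the fused sequence has one value per time point.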

[0044] Furthermore, multi-source signal fusion can comprehensively utilize the concentration information of differential refractive index signals and the molecular weight information of low-angle group scattering signals in the detection of ultra-high molecular weight components. The differential refractive index signal reflects the mass concentration distribution of the eluted components, while the low-angle group scattering signal maintains a high signal-to-noise ratio in the ultra-high molecular weight region and is sensitive to changes in molecular weight. The fusion of these two types of signals can provide a more complete feature representation and improve the ability of the molecular weight range sensing encoder network to identify ultra-high molecular weight components.

[0045] Step 5: Perform weighted fusion of the current latent feature vector and the historical latent feature vector to generate a time-smoothed latent feature vector; The latent feature vector extracted at the current moment is fused by weighting with the temporally smoothed latent feature vector from the previous time step, with the fusion weights dynamically adjusted according to the current data completeness level, to generate the temporally smoothed latent feature vector for the current moment.

[0046] The temporally smoothed latent feature vector is calculated as follows:

$$\bar{z}_t = w(L)\, z_t + \bigl(1 - w(L)\bigr)\, \bar{z}_{t-1}$$

where $\bar{z}_t$ is the temporally smoothed latent feature vector at the current time, $z_t$ is the latent feature vector extracted at the current time, $\bar{z}_{t-1}$ is the temporally smoothed latent feature vector from the previous time step, and $w(L)$ is the fusion weight function of the data integrity level $L$. The function $w(L)$ is monotonically increasing: the higher the data completeness, the greater the weight of the current latent feature vector.

[0047] Furthermore, the fusion weight function is defined as a linear function of the data integrity level, parameterized by the number of completeness level classifications and by the lower and upper bounds of the weight. The lower bound ranges from 0.1 to 0.3, and the upper bound ranges from 0.7 to 0.9.

[0048] Furthermore, temporal smoothing damps fluctuations in the prediction results in the early stages of data acquisition, avoiding prediction jumps caused by incomplete data. As data completeness increases, the weight of the latent feature vector at the current moment gradually increases, so that the prediction results reflect the information in newly acquired data in a timely manner, balancing prediction stability against response sensitivity.

[0049] Furthermore, at the first sampling moment of the elution process there is no historical latent feature vector, so the current latent feature vector is used directly as the temporally smoothed latent feature vector, that is, $\bar{z}_1 = z_1$, where $\bar{z}_1$ is the temporally smoothed latent feature vector and $z_1$ is the latent feature vector extracted at the first sampling time.
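
The temporal smoothing of Step 5 can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: the linear weight is taken to run from the lower bound at level 1 to the upper bound at the highest level, and the bounds `w_min = 0.15` and `w_max = 0.70` are hypothetical values picked inside the stated ranges. The function names are illustrative.

```python
from typing import List, Optional, Sequence


def fusion_weight(level: int, n_levels: int,
                  w_min: float = 0.15, w_max: float = 0.70) -> float:
    """Linear weight of the current latent vector (assumed form and bounds)."""
    if n_levels < 2 or not 1 <= level <= n_levels:
        raise ValueError("level must satisfy 1 <= level <= n_levels, n_levels >= 2")
    return w_min + (w_max - w_min) * (level - 1) / (n_levels - 1)


def smooth_latent(current: Sequence[float],
                  previous: Optional[Sequence[float]],
                  level: int, n_levels: int) -> List[float]:
    """Blend the current latent vector with the previous smoothed vector."""
    if previous is None:
        # First sampling moment: no history, so use the current vector as is.
        return list(current)
    w = fusion_weight(level, n_levels)
    return [w * c + (1.0 - w) * p for c, p in zip(current, previous)]
```

Because the weight grows with the completeness level, early predictions lean on history and later predictions follow the newest data, which is the stability and sensitivity trade-off described above.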

[0050] Step 6: Input the temporally smoothed latent feature vector into the dual-branch decoder network to generate the molecular weight distribution prediction curve; The temporally smoothed latent feature vector is input into the dual-branch decoder network. The regular branch outputs the predicted molecular weight distribution curve for the low to medium molecular weight range, and the ultra-high molecular weight branch outputs the predicted molecular weight distribution curve for the high molecular weight range, each expressed as a function of molecular weight.

[0051] Furthermore, the three molecular weight range boundary values are determined based on the typical molecular weight distribution characteristics of ginseng polysaccharides. The first is the lower limit of molecular weight determination of the gel permeation chromatography-multi-angle laser light scattering detection system, with a typical value of [value missing] Da; the second is the upper limit of molecular weight for ultra-high molecular weight components, with a typical value of [value missing] Da; the third is the boundary between the conventional molecular weight components and the ultra-high molecular weight components, taken as [value] Da.

[0052] Furthermore, the dual-branch decoder network comprises a shared feature processing layer and two parallel dedicated decoding branches. The shared feature processing layer receives the temporally smoothed latent feature vector, performs a feature transformation through a fully connected layer, and outputs an intermediate feature representation. The regular branch and the ultra-high molecular weight branch each consist of multiple transposed convolutional layers that upsample the intermediate features into the distribution curve for the corresponding molecular weight range: the regular branch outputs the distribution curve covering the conventional molecular weight range, and the ultra-high molecular weight branch outputs the distribution curve covering the ultra-high molecular weight range.

[0053] Furthermore, the shared feature processing layer contains 1 to 2 fully connected layers, each with 128 to 512 neurons and the ReLU activation function. Each decoding branch contains 3 to 5 transposed convolutional layers, each with a kernel size of 3 to 5, a stride of 2 to 3, and the ReLU activation function. The last layer uses the Sigmoid activation function to map the output values to the range of 0 to 1.

[0054] Furthermore, the dual-branch decoder network is trained using supervised learning. The input is the latent feature vector, the output is the predicted molecular weight distribution curve, and the training labels are the actual molecular weight distribution curves. The loss function is a piecewise weighted mean squared error loss: the mean squared error between the regular branch prediction curve and the actual molecular weight distribution curve is calculated within the conventional molecular weight range, and the mean squared error between the ultra-high molecular weight branch prediction curve and the actual molecular weight distribution curve is calculated within the ultra-high molecular weight range. The total loss function is

$$\mathcal{L}_{\text{total}} = \lambda_1 \mathcal{L}_{\text{reg}} + \lambda_2 \mathcal{L}_{\text{uh}}$$

where $\mathcal{L}_{\text{total}}$ is the total loss, $\mathcal{L}_{\text{reg}}$ is the mean squared error loss of the regular branch, $\mathcal{L}_{\text{uh}}$ is the mean squared error loss of the ultra-high molecular weight branch, and $\lambda_1$ and $\lambda_2$ are the loss weights of the two branches. The parameters of the dual-branch decoder network are optimized through backpropagation.

[0055] Furthermore, the loss weighting coefficients of the two branches are set to balance the proportions of sample sizes within the two molecular weight ranges, with typical values of [value missing].
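
The piecewise weighted loss can be illustrated with a small Python sketch. It assumes that both branch outputs and the label curve are sampled on one shared molecular weight grid, that the regular branch is scored strictly below the boundary and the ultra-high branch at or above it, and that the loss weights are passed in by the caller. The function names and the grid convention are hypothetical.

```python
from typing import Sequence


def mse(pred: Sequence[float], true: Sequence[float]) -> float:
    """Mean squared error between two equally long sequences."""
    return sum((p - t) ** 2 for p, t in zip(pred, true)) / len(pred)


def piecewise_weighted_mse(mw_axis: Sequence[float],
                           pred_regular: Sequence[float],
                           pred_ultra: Sequence[float],
                           true_curve: Sequence[float],
                           boundary: float,
                           w_regular: float,
                           w_ultra: float) -> float:
    """Total loss = w_regular * MSE(regular range) + w_ultra * MSE(ultra range)."""
    reg = [i for i, m in enumerate(mw_axis) if m < boundary]
    uh = [i for i, m in enumerate(mw_axis) if m >= boundary]
    if not reg or not uh:
        raise ValueError("each molecular weight range needs at least one grid point")
    loss_reg = mse([pred_regular[i] for i in reg], [true_curve[i] for i in reg])
    loss_uh = mse([pred_ultra[i] for i in uh], [true_curve[i] for i in uh])
    return w_regular * loss_reg + w_ultra * loss_uh
```

Scoring each branch only inside its own range means an error made by the regular branch above the boundary does not contribute to the loss, which matches the intent of training two specialised branches.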

[0056] Furthermore, the dual-branch decoder network enables the conventional branch and the ultra-high molecular weight branch to learn the mapping relationship between elution signals and molecular weight distributions within different molecular weight ranges, respectively. Since the elution behavior and detection signal characteristics of ultra-high molecular weight components differ significantly from those of conventional molecular weight components, using independent decoding branches can avoid feature interference from samples within different molecular weight ranges, thereby improving the prediction accuracy for each molecular weight interval.

[0057] Step 7: Apply the Berry equation to fit the low-angle scattering data and correct the ultra-high molecular weight branch prediction curve; The Berry equation is applied to perform a weighted least squares fit on the low-angle group scattering data within the candidate region of ultra-high molecular weight components, and the absolute molecular weight value at each sampling time point is calculated. Based on the deviation between the absolute molecular weight values and the values predicted by the ultra-high molecular weight branch, a correction factor is calculated and used to correct the prediction curve output by the ultra-high molecular weight branch, yielding the corrected ultra-high molecular weight branch prediction curve.

[0058] The fitted form of the Berry equation is as follows:

$$\sqrt{\frac{K c}{R_\theta}} = \frac{1}{\sqrt{M_w}}\left(1 + \frac{8\pi^2 n_0^2 \langle r_g^2 \rangle}{3\lambda_0^2}\sin^2\frac{\theta}{2}\right)$$

where $K$ is the optical constant, $c$ is the solute concentration, $R_\theta$ is the Rayleigh ratio at scattering angle $\theta$, $\theta$ is the scattering angle, $M_w$ is the weight-average molecular weight, $n_0$ is the solvent refractive index, $\langle r_g^2 \rangle$ is the mean square radius of gyration, $\lambda_0$ is the wavelength of the incident light, and $\pi$ is the circle constant.

[0059] Furthermore, the optical constant $K$ is calculated as

$$K = \frac{4\pi^2 n_0^2 \left(\mathrm{d}n/\mathrm{d}c\right)^2}{N_A \lambda_0^4}$$

where $\pi$ is the circle constant, $n_0$ is the solvent refractive index, $\mathrm{d}n/\mathrm{d}c$ is the refractive index increment of the solute, $N_A$ is Avogadro's constant, and $\lambda_0$ is the wavelength of the incident light. The refractive index increment of *Gynostemma pentaphyllum* polysaccharide in aqueous solution is obtained by offline determination of standard samples, with a typical value of 0.140 to 0.150 mL/g.

[0060] Furthermore, the solute concentration $c$ is calculated by dividing the differential refractive index signal value by the refractive index increment, i.e., $c = S_{\mathrm{RI}} / (\mathrm{d}n/\mathrm{d}c)$, where $S_{\mathrm{RI}}$ is the differential refractive index signal value and $\mathrm{d}n/\mathrm{d}c$ is the refractive index increment.

[0061] Furthermore, the Rayleigh ratio is calculated from the ratio of scattered light intensity to incident light intensity combined with the instrument calibration factor. This calculation is completed automatically by the data processing software of the gel permeation chromatography-multi-angle laser light scattering detection system.

[0062] In the weighted least squares fitting, the weight of the data point at each angle is proportional to the signal-to-noise ratio at that angle:

$$w_j = \frac{\mathrm{SNR}_j}{\sum_{k \in \Theta_{\text{low}}} \mathrm{SNR}_k}$$

where $w_j$ is the weight of the $j$-th angle, $\mathrm{SNR}_j$ is the signal-to-noise ratio at the $j$-th angle, $\Theta_{\text{low}}$ is the set of detection angle indices of the low-angle group, $k$ is the angle index used for summation over the low-angle group, and $\mathrm{SNR}_k$ is the signal-to-noise ratio at the $k$-th angle.

[0063] Furthermore, the Berry equation fit is solved by the weighted least squares method, with the objective of minimizing the weighted sum of squared residuals

$$\min \sum_{j \in \Theta_{\text{low}}} w_j \left[\sqrt{\frac{K c}{R_{\theta_j}}} - g(\theta_j)\right]^2$$

where $j$ is the angle index in the low-angle group, $w_j$ is the weight of the $j$-th angle, $K$ is the optical constant, $c$ is the solute concentration, $R_{\theta_j}$ is the Rayleigh ratio at scattering angle $\theta_j$, and $g(\theta_j)$ is the right-hand side of the Berry equation expressed as a function of the angle. The intercept and slope are obtained through linear regression; the reciprocal of the squared intercept is the weight-average molecular weight $M_w$.
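
The weighted fit can be sketched in Python as a weighted linear regression of the square root of $Kc/R_\theta$ against $\sin^2(\theta/2)$, with weights proportional to the per-angle signal-to-noise ratio. This is an illustrative sketch rather than the instrument software's routine: the function names are hypothetical, the inputs are assumed to be precomputed $Kc/R_\theta$ values for the low-angle group at one sampling time point, and the regression is written out in closed form.

```python
import math
from typing import List, Sequence, Tuple


def snr_weights(snr: Sequence[float]) -> List[float]:
    """Weights proportional to SNR, normalized to sum to 1."""
    total = sum(snr)
    return [s / total for s in snr]


def berry_fit(angles_deg: Sequence[float],
              kc_over_r: Sequence[float],
              snr: Sequence[float]) -> Tuple[float, float, float]:
    """Weighted least squares Berry fit; returns (intercept, slope, Mw)."""
    w = snr_weights(snr)
    x = [math.sin(math.radians(a) / 2.0) ** 2 for a in angles_deg]
    y = [math.sqrt(v) for v in kc_over_r]  # Berry: fit sqrt(Kc/R)
    x_mean = sum(wi * xi for wi, xi in zip(w, x))
    y_mean = sum(wi * yi for wi, yi in zip(w, y))
    sxx = sum(wi * (xi - x_mean) ** 2 for wi, xi in zip(w, x))
    if sxx == 0:
        raise ValueError("at least two distinct angles are required")
    sxy = sum(wi * (xi - x_mean) * (yi - y_mean) for wi, xi, yi in zip(w, x, y))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    mw = 1.0 / intercept ** 2  # reciprocal of the squared intercept
    return intercept, slope, mw
```

The intercept is the extrapolation to zero angle, so noise at the least reliable angles affects it less once those angles carry smaller weights.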

[0064] The correction factor is calculated as follows:

$$k = \frac{M_{w,\text{Berry}}}{M_{w,\text{pred}}}$$

where $k$ is the correction factor, $M_{w,\text{Berry}}$ is the weight-average molecular weight obtained by fitting the Berry equation, and $M_{w,\text{pred}}$ is the weight-average molecular weight corresponding to the ultra-high molecular weight branch prediction curve.

[0065] Furthermore, the corrected ultra-high molecular weight branch prediction curve is obtained through a scaling transformation of the molecular weight axis. Each molecular weight value $M$ of the ultra-high molecular weight branch prediction curve is mapped to the corrected molecular weight value $M' = k \cdot M$, where $M'$ is the corrected molecular weight value, $k$ is the correction factor, and $M$ is the original molecular weight value. The corresponding probability density is kept unchanged, which yields the corrected ultra-high molecular weight branch prediction curve.
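
A short Python sketch of the correction step follows. It takes the correction factor to be the ratio of the Berry-fitted weight-average molecular weight to the branch-predicted one, which is an assumption consistent with a pure scaling of the molecular weight axis. The function names and the sample numbers in the usage note are hypothetical.

```python
from typing import List, Sequence, Tuple


def correction_factor(mw_berry: float, mw_pred: float) -> float:
    """Assumed form: ratio of fitted to predicted weight-average molecular weight."""
    if mw_pred <= 0:
        raise ValueError("predicted molecular weight must be positive")
    return mw_berry / mw_pred


def rescale_axis(mw_axis: Sequence[float],
                 density: Sequence[float],
                 k: float) -> Tuple[List[float], List[float]]:
    """Scale every molecular weight by k; density values are left unchanged."""
    return [k * m for m in mw_axis], list(density)
```

For example, `rescale_axis([1.0e6, 2.0e6], [0.4, 0.1], correction_factor(2.2e6, 2.0e6))` shifts both grid points upward by 10% and returns the same two density values.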

[0066] Furthermore, the Berry equation is used instead of the Zimm equation because it has better linearity for molecules with large radii of gyration in the high-angle region, which reduces the extrapolation error caused by signal attenuation at high angles. The Zimm equation fits the reciprocal of the scattered light intensity linearly against the angle function, whereas the Berry equation fits the square root of that reciprocal. The Berry equation therefore has a wider linear range and higher fitting stability for large molecules.

[0067] Furthermore, weighted least squares fitting assigns greater weight to low-angle scattering data with high signal-to-noise ratio, which can effectively suppress the influence of noise on the fitting results and improve the accuracy of the determination of the absolute molecular weight of ultra-high molecular weight components.

[0068] Furthermore, if the current data does not contain candidate segments of ultra-high molecular weight components, the correction process in this step is skipped, and the predicted curve output by the ultra-high molecular weight branch is directly used as the final ultra-high molecular weight branch prediction curve.

[0069] Step 8: Combine the conventional branch prediction curve with the corrected ultra-high molecular weight branch prediction curve, and output the molecular weight distribution analysis results; The prediction curve output by the regular branch and the corrected ultra-high molecular weight branch prediction curve are smoothly spliced along the molecular weight axis to generate a complete molecular weight distribution curve covering the entire molecular weight range. The number-average molecular weight, the weight-average molecular weight, and the proportion of ultra-high molecular weight components are calculated from the complete molecular weight distribution curve, and the molecular weight distribution analysis results of the ginseng polysaccharides are output.

[0070] The splicing area is set as $[M_b - \delta,\ M_b + \delta]$. Within the splicing area, linear interpolation is used to achieve a smooth transition between the two curves:

$$f(M) = \frac{(M_b + \delta) - M}{2\delta}\, f_{\text{reg}}(M) + \frac{M - (M_b - \delta)}{2\delta}\, f'_{\text{uh}}(M)$$

where $f(M)$ is the value of the complete molecular weight distribution curve at molecular weight $M$, $M_b$ is the boundary value between conventional molecular weight components and ultra-high molecular weight components, $\delta$ is the half-width of the splicing transition area, $f_{\text{reg}}(M)$ is the value of the conventional branch prediction curve at molecular weight $M$, and $f'_{\text{uh}}(M)$ is the value of the corrected ultra-high molecular weight branch prediction curve at molecular weight $M$.

[0071] Furthermore, the half-width of the splicing transition area ranges from [value missing] to [value missing], with a typical value of [value missing].

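[0072] Furthermore, outside the splicing area, the complete molecular weight distribution curve $f(M)$ directly adopts the prediction curve of the corresponding branch: when $M < M_b - \delta$, $f(M) = f_{\text{reg}}(M)$; when $M > M_b + \delta$, $f(M) = f'_{\text{uh}}(M)$. Here $M_b$ is the boundary value between conventional and ultra-high molecular weight components, $\delta$ is the half-width of the splicing transition area, $f_{\text{reg}}$ is the conventional branch prediction curve, and $f'_{\text{uh}}$ is the corrected ultra-high molecular weight branch prediction curve.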
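
The splicing rule can be sketched as a single Python function. The sketch assumes that the interpolation weight varies linearly in molecular weight (not in its logarithm) across the transition area and that each branch curve is available as a callable; the function and parameter names are illustrative.

```python
from typing import Callable


def splice(m: float,
           f_regular: Callable[[float], float],
           f_ultra: Callable[[float], float],
           boundary: float,
           half_width: float) -> float:
    """Value of the complete curve at molecular weight m."""
    if half_width <= 0:
        raise ValueError("half_width must be positive")
    lo, hi = boundary - half_width, boundary + half_width
    if m < lo:
        return f_regular(m)   # below the splicing area: regular branch only
    if m > hi:
        return f_ultra(m)     # above the splicing area: ultra-high branch only
    t = (m - lo) / (hi - lo)  # 0 at the lower edge, 1 at the upper edge
    return (1.0 - t) * f_regular(m) + t * f_ultra(m)
```

At the two edges of the splicing area the blended value equals the respective branch value, so the complete curve has no jump where the blend begins or ends.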

[0073] The number-average molecular weight, the weight-average molecular weight, and the percentage of ultra-high molecular weight components are each calculated by integrating over the complete molecular weight distribution curve. The quantities involved are: the number-average molecular weight; the weight-average molecular weight; the percentage of ultra-high molecular weight components; the value of the complete molecular weight distribution curve at a given molecular weight; the lower limit for molecular weight determination and the upper limit of the molecular weight of ultra-high molecular weight components, which bound the full integration range; and the boundary value between conventional molecular weight components and ultra-high molecular weight components, above which the ultra-high molecular weight fraction is integrated.

[0074] Furthermore, the integral calculation is achieved using numerical integration methods such as the trapezoidal rule or Simpson's rule, which divide the molecular weight axis into several discrete intervals and calculate the approximate integral value.

[0075] Furthermore, the molecular weight distribution analysis results are output in each data update cycle, forming a real-time prediction sequence that is continuously updated as data acquisition progresses. In the early stages of the elution process, the output results are labeled as predicted values; as the data completeness increases, the predicted values gradually converge to a stable state; when the data completeness reaches a preset completeness threshold, the output results are labeled as measured values.

[0076] Furthermore, the preset completeness threshold ranges from 0.85 to 0.95, with a typical value of 0.90. That is, when the data completeness reaches 90%, the output result is converted from predicted value labeling to measured value labeling.

[0077] Furthermore, the output molecular weight distribution analysis results include complete molecular weight distribution curves, number-average molecular weight, weight-average molecular weight, polydispersity index, percentage of ultra-high molecular weight components, and data completeness level, presented to users in the form of data tables and graphical interfaces.

[0078] Furthermore, the polydispersity index is calculated as the ratio of the weight-average molecular weight to the number-average molecular weight, that is, $\mathrm{PDI} = M_w / M_n$, where $\mathrm{PDI}$ is the polydispersity index, $M_w$ is the weight-average molecular weight, and $M_n$ is the number-average molecular weight. It characterizes the width of the molecular weight distribution.
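
The characteristic parameters can be computed numerically with the trapezoidal rule, as in the following Python sketch. It rests on a loudly stated assumption: the complete curve is treated as a weight (mass) density over a linear molecular weight axis, so that $M_n = \int f\,dM / \int (f/M)\,dM$ and $M_w = \int M f\,dM / \int f\,dM$. If the curve were a number density, the moment formulas would differ. The function names are illustrative, and the boundary is assumed to coincide with a grid point.

```python
from typing import Dict, Sequence


def trapz(x: Sequence[float], y: Sequence[float]) -> float:
    """Trapezoidal rule on a possibly non-uniform grid."""
    return sum((x[i + 1] - x[i]) * (y[i] + y[i + 1]) / 2.0
               for i in range(len(x) - 1))


def mwd_statistics(m: Sequence[float], f: Sequence[float],
                   boundary: float) -> Dict[str, float]:
    """Mn, Mw, PDI and ultra-high fraction, assuming f is a weight density."""
    total = trapz(m, f)
    mn = total / trapz(m, [fi / mi for mi, fi in zip(m, f)])
    mw = trapz(m, [mi * fi for mi, fi in zip(m, f)]) / total
    upper = [(mi, fi) for mi, fi in zip(m, f) if mi >= boundary]
    uh_area = trapz([p[0] for p in upper], [p[1] for p in upper])
    return {"Mn": mn, "Mw": mw, "PDI": mw / mn,
            "ultra_high_fraction": uh_area / total}
```

Multiplying `ultra_high_fraction` by 100 gives the percentage of ultra-high molecular weight components reported in the output.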

[0079] The real-time fitting method for the molecular weight distribution of *Gynostemma pentaphyllum* polysaccharides provided in this embodiment can automatically identify candidate segments of ultra-high molecular weight components with significant high-angle signal attenuation by performing signal-to-noise ratio analysis on multi-angle laser light scattering signals, thereby triggering a dedicated fitting strategy for these candidate segments. Because the large radius of gyration of ultra-high molecular weight components leads to a sharp attenuation of high-angle scattered light intensity, the noise error from high-angle data propagates to the extrapolation results when the traditional Zimm equation relies on multi-angle data points for linear fitting. This embodiment uses the Berry equation to perform weighted least-squares fitting on the low-angle scattering data. On the one hand, the Berry equation has better linearity for molecules with large radii of gyration; on the other hand, the weighting strategy gives greater fitting weight to the high signal-to-noise ratio low-angle scattering data, thus reducing the fitting error caused by high-angle signal attenuation.

[0080] This implementation establishes a molecular weight range-aware encoder group. Based on the data integrity level and the existence of candidate segments for ultra-high molecular weight components, the corresponding molecular weight range-aware encoder network is adaptively selected, enabling the system to extract effective latent feature representations at any stage of the elution data streaming acquisition. Each molecular weight range-aware encoder network is pre-trained on truncated data within a specific integrity range, learning the mapping relationship from incomplete input data to the complete molecular weight distribution. Therefore, it can output molecular weight distribution prediction results early in the data acquisition process, meeting the real-time requirements of online process monitoring.

[0081] The temporal smoothing mechanism introduced in this implementation dynamically adjusts the fusion weights of the current latent feature vector and the historical latent feature vector. This suppresses prediction fluctuations caused by incomplete data in the early stages of data acquisition and allows the prediction results to quickly respond to new information in the later stages, thus ensuring the continuity and stability of the prediction result sequence. The dual-branch decoder network makes the prediction processes for the conventional molecular weight range and the ultra-high molecular weight range independent of each other, avoiding mutual interference between sample features from different molecular weight ranges and improving the prediction accuracy for each molecular weight range.

[0082] The embodiments of the present invention have been described above. However, the embodiments are not limited to the specific implementation methods described above. The specific implementation methods described above are merely illustrative and not restrictive. Those skilled in the art can make more equivalent embodiments under the guidance of the present embodiments, and all of them are within the protection scope of the present embodiments.

[0083] A traditional Chinese medicine pharmaceutical company deployed a real-time quality monitoring system on its *Panax notoginseng* polysaccharide extraction production line, which uses an enzymatic hydrolysis-alcohol precipitation process. During a production run in March 20XX, monitoring the quality of polysaccharide product batch PZS-20XX0315-A required real-time molecular weight distribution information during GPC-MALLS testing. The raw material for this batch was *Panax notoginseng* rhizomes from Yunnan province. After enzymatic hydrolysis at 65°C for 2.5 hours, the product underwent alcohol precipitation purification. Quality control requirements specified a weight-average molecular weight within the range of [value missing] to [value missing] Da, and a content of ultra-high molecular weight components (molecular weight greater than [value missing] Da) of not less than 25%. The detection system is equipped with laser light scattering detectors at 18 angles (15° to 165°), the sampling interval is set to 1 second, and the preset total elution time is 42 minutes.

[0084] After elution begins, the system continuously receives the streaming signal output by the GPC-MALLS instrument. At the 18-minute mark of elution, the data buffer has accumulated the differential refractive index signals of 1080 sampling points and the corresponding scattered light intensity data at 18 angles. At this point, the detection data of the 1081st sampling point is received and appended to the dual-channel buffer.

[0085] Table 1: Dual-channel detection signal data for the 1081st sampling point. At this point, the buffer dataset contains 1081 valid data points at a time interval of 1 second, and the cumulative detection time is 1080 seconds (18 minutes).

[0086] The system performs statistical analysis on the scattered light intensity data within a local time window (sampling points 1074 to 1088, a total of 15 points) around the 1081st sampling point, and calculates the average signal-to-noise ratio of the low-angle group and the high-angle group.

[0087] Figure 2 shows the distribution of the differential refractive index signal and the multi-angle scattered light intensity signal at the 1081st sampling point.

[0088] Table 2. Signal-to-noise ratio analysis of the multi-angle scattering signal at the 1081st sampling point. The average signal-to-noise ratio of the high-angle group is calculated as 4.3. Since the preset signal-to-noise ratio threshold is 5.0, the average signal-to-noise ratio of the high-angle group is below the threshold, and the condition for determining a candidate region of ultra-high molecular weight components is met. The system detected that this condition held for 14 consecutive time points starting from the 1068th sampling point. The sequence from time point 1068 to 1081 was therefore marked as a candidate region of ultra-high molecular weight components, and the candidate region identifier was set to 1.
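
The run detection in this example can be sketched as a scan over the per-time-point average signal-to-noise ratio of the high-angle group. The threshold of 5.0 comes from the example; the minimum run length `min_run` is a hypothetical parameter added for illustration, and indices are 0-based positions in the input list rather than sampling point numbers.

```python
from typing import List, Sequence, Tuple


def find_candidate_segments(high_angle_snr: Sequence[float],
                            threshold: float = 5.0,
                            min_run: int = 1) -> List[Tuple[int, int]]:
    """Return (start, end) index pairs of consecutive runs with SNR below threshold."""
    segments: List[Tuple[int, int]] = []
    start = None
    for i, snr in enumerate(high_angle_snr):
        if snr < threshold:
            if start is None:
                start = i  # a new run begins here
        else:
            if start is not None and i - start >= min_run:
                segments.append((start, i - 1))
            start = None
    # Close a run that is still open at the end of the buffer.
    if start is not None and len(high_angle_snr) - start >= min_run:
        segments.append((start, len(high_angle_snr) - 1))
    return segments
```

In a streaming setting the last run is usually still open, which is why the function also reports a run that reaches the end of the buffer.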

[0089] Figure 3 compares the signal-to-noise ratio distributions of the low-angle and high-angle groups, which are used to identify ultra-high molecular weight components.

[0090] The data integrity level is calculated from the number of valid data points in the current buffer (1081), the fixed time interval (1 second), the preset total elution time (2520 seconds, i.e., 42 minutes), and the number of completeness level classifications, giving a data integrity of 42.9% and a data integrity level of 3. Combining this with the candidate segment identifier of ultra-high molecular weight components, which is 1, the system selects the encoder network with the corresponding index (completeness level 3, ultra-high molecular weight identifier 1) from the pre-trained molecular weight range sensing encoder group. This network is codenamed Encoder-L3-Ultra.
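
The level calculation and encoder lookup can be sketched as follows. The mapping from completeness ratio to level is assumed here to be a ceiling over equally wide intervals, and the number of levels (5) is a hypothetical value chosen for the usage example. The name `Encoder-L3-Ultra` follows the codename used in this example, while the `Regular` suffix for the non-ultra case is invented for illustration.

```python
import math
from typing import Tuple


def integrity_level(n_points: int, dt_s: float,
                    total_s: float, n_levels: int) -> Tuple[float, int]:
    """Return (completeness ratio, level), assuming equal-width levels."""
    ratio = min(n_points * dt_s / total_s, 1.0)
    level = max(1, math.ceil(ratio * n_levels))
    return ratio, level


def encoder_key(level: int, has_ultra: bool) -> str:
    """Build the lookup key for the encoder group (suffix naming is assumed)."""
    return f"Encoder-L{level}-" + ("Ultra" if has_ultra else "Regular")
```

With 1081 points at 1-second spacing and a 2520-second run, `integrity_level(1081, 1.0, 2520.0, 5)` gives a ratio of about 0.429 and, under the assumed five levels, level 3, so `encoder_key(3, True)` yields `Encoder-L3-Ultra`.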

[0091] Table 3. Parameters for selecting molecular weight range sensing encoder networks. The Encoder-L3-Ultra network consists of four one-dimensional convolutional layers with kernel sizes of 5, 5, 3, and 3, each with a stride of 2 and the ReLU activation function, followed by a fully connected layer that outputs a 128-dimensional latent feature vector.

[0092] Since the candidate region identifier for ultra-high molecular weight components is 1, the system fuses the differential refractive index signal and the low-angle group scattering signal. The two signals are first normalized, and the fused signal is then calculated according to the fusion weighting coefficient.

[0093] Table 4: Signal fusion processing data for the 1081st sampling point. The mean scattered light intensity of the low-angle group is calculated first, followed by the fused signal. The fused signal sequence containing 1081 sampling points is input into the Encoder-L3-Ultra network, and a 128-dimensional latent feature vector is extracted by forward propagation.

[0094] Figure 4 shows the data acquisition progress and the selection logic of the molecular weight range sensing encoder.

[0095] Figure 5 shows the normalization and fusion calculation of the differential refractive index signal and the low-angle scattering signal.

[0096] The latent feature vector extracted at the current time step is fused by weighting with the temporally smoothed latent feature vector from the previous time step. The fusion weight is calculated from the data completeness level, and the temporally smoothed latent feature vector is then calculated. Table 5 lists the temporal smoothing fusion process for selected dimensions, using the first three dimensions as an example, with a worked calculation for dimension 1. The complete 128-dimensional temporally smoothed latent feature vector is obtained by calculating every dimension in the same way.

[0097] The temporally smoothed latent feature vector is input into the dual-branch decoder network. The shared feature processing layer maps the 128-dimensional input to a 256-dimensional intermediate feature representation, which is then input into the regular branch and the ultra-high molecular weight branch, respectively.

[0098] The regular branch outputs the predicted distribution curve covering the molecular weight range of [value missing] Da, and the ultra-high molecular weight branch outputs the predicted distribution curve covering the molecular weight range of [value missing] Da.

[0099] Table 6. Prediction distribution curves (sampling points) of the dual-branch decoder network output. The output curves of the two branches overlap near a molecular weight of [value missing] Da and need to be spliced in subsequent steps.

[0100] The Berry equation was applied to fit the low-angle group scattering data within the candidate region of ultra-high molecular weight components (time points 1068 to 1081). Detailed calculations were performed on the 1075th sampling point (a representative point with strong scattered light intensity in this region).

[0101] Table 7 shows the input data for the Berry equation fitting at the 1075th sampling point. The concentration is calculated from the differential refractive index signal as [value missing] mg/mL, with a refractive index increment of [value missing] mL/g.

[0102] The optical constant is calculated first. Weighted least squares fitting is then performed on the 8 low-angle data points, giving a fitted line intercept of [value missing] mol/g. From this intercept, the weight-average molecular weight given by the Berry equation fit is calculated. With the weight-average molecular weight corresponding to the ultra-high molecular weight branch prediction curve being [value missing] Da, the correction factor is calculated. The molecular weight axis of the ultra-high molecular weight branch prediction curve is then scaled by the correction factor to obtain the corrected ultra-high molecular weight branch prediction curve.

[0103] Table 8. Comparison of ultra-high molecular weight branch prediction curves before and after correction (sampling points). The splicing area is set to [value missing] Da, with a transition area half-width of [value missing] Da. Linear interpolation is used to transition smoothly between the two curves within the splicing area.

[0104] Table 9. Results of splicing the complete molecular weight distribution curve (key sampling points). The molecular weight distribution characteristic parameters are calculated from the complete molecular weight distribution curve: the number-average molecular weight, the weight-average molecular weight, the polydispersity index, and the percentage of ultra-high molecular weight components. The system outputs these molecular weight distribution analysis results at the 18th minute of elution (data integrity 42.9%), labeled as predicted values, and displays the data integrity level as 3 on the interface.

[0105] Figure 6 shows the distribution of low-angle scattered light intensity as a function of angle, which is used for absolute molecular weight correction.

[0106] Figure 7 shows the molecular weight distribution prediction results of the conventional branch and the ultra-high molecular weight branch.

[0107] Throughout the process, the data undergoes multiple transformations from the raw detection signal to the final molecular weight distribution analysis results. First, the GPC-MALLS instrument continuously outputs differential refractive index signals and scattered light intensity signals at 1-second intervals. These streaming signals are appended to a dual-channel buffer to form a cumulative dataset. At the 18-minute mark, the buffer contains complete detection data from 1081 sampling points.

[0108] The system performs signal-to-noise ratio analysis on multi-angle scattering signals. By calculating the mean and standard deviation within a local time window, it identifies that the average signal-to-noise ratio of the high-angle group is 4.3, which is lower than the preset threshold of 5.0. Therefore, it determines that there is a candidate segment of ultra-high molecular weight component and generates a label value of 1.

[0109] Based on the current data integrity of 42.9%, a data integrity level of 3 is calculated. Combined with the ultra-high molecular weight component identifier 1, the system selects the corresponding Encoder-L3-Ultra encoder network. Due to the presence of ultra-high molecular weight components, the system normalizes the differential refractive signal (0.0428 V) and the low-angle group scattering mean (1.947 mV) and fuses them into a fused signal (0.611) with a weighted ratio of 7:3. This fused signal is then input into the encoder network to extract a 128-dimensional latent feature vector.

[0110] The latent feature vector is fused with the historical temporally smoothed feature vector, with a weight of 0.425 on the current vector, to generate the current temporally smoothed latent feature vector, which is then input into a dual-branch decoder network. The conventional branch outputs the prediction curve for the low to medium molecular weight range, while the ultra-high molecular weight branch outputs the prediction curve for the high molecular weight range.
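The temporal smoothing step amounts to an exponential-style blend whose weight grows with the integrity level. The exact linear function is not stated in this paragraph; the sketch below assumes a lower bound of 0.2, an upper bound of 0.8, and 8 integrity levels, which are within the claimed ranges and happen to give 0.425 at level 3. Function names are illustrative.

```python
def fusion_weight(level, n_levels=8, w_min=0.2, w_max=0.8):
    """Linear map from integrity level to the weight of the current vector."""
    return w_min + (w_max - w_min) * level / n_levels


def smooth(current, history, weight):
    """Blend the current latent vector with the smoothed history.

    At the first sampling moment there is no history, so the current
    vector is used directly.
    """
    if history is None:
        return list(current)
    return [weight * c + (1.0 - weight) * h for c, h in zip(current, history)]


if __name__ == "__main__":
    w = fusion_weight(3)
    print(round(w, 3))
    print(smooth([1.0, 2.0], [0.0, 0.0], w))
```

A higher integrity level trusts the newest features more, while early in the run the smoothed history dominates and damps jumps between successive predictions.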

[0111] The absolute weight-average molecular weight (in Da) is calculated by fitting the low-angle scattering data in the candidate segment of ultra-high molecular weight components with the Berry equation. Comparing this value with the weight-average molecular weight predicted by the ultra-high molecular weight branch yields a correction factor of 1.083, which is used to scale the molecular weight axis of the ultra-high molecular weight branch prediction curve. Finally, the conventional branch prediction curve and the corrected ultra-high molecular weight branch prediction curve are joined by linear interpolation over the transition region to form a molecular weight distribution curve covering the complete range. From this curve the key quality indicators are calculated: number-average molecular weight and weight-average molecular weight (in Da), a polydispersity index of 1.96, and an ultra-high molecular weight component content of 28.7%. This enables real-time prediction of the molecular weight distribution during data acquisition.
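The correction and splicing steps can be sketched as follows. The sketch assumes the common single-concentration Berry form, in which the square root of Kc/R is linear in sin²(θ/2) and the intercept equals the inverse square root of the weight-average molecular weight. It further assumes that both branch curves have been resampled onto one shared molecular weight grid, and that "linear interpolation" in the transition region means a linear cross-fade between the two curves. All names and the half-width ratio of 0.2 are illustrative.

```python
def berry_fit(sin2_half_theta, kc_over_r, snr):
    """Weighted least squares line through sqrt(Kc/R) vs sin^2(theta/2).

    Weights are proportional to the per-angle signal-to-noise ratio.
    Returns (intercept, slope, Mw) with Mw = 1 / intercept**2.
    """
    x, w = sin2_half_theta, snr
    y = [v ** 0.5 for v in kc_over_r]
    sw = sum(w)
    xm = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ym = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - xm) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - xm) * (yi - ym) for wi, xi, yi in zip(w, x, y))
    slope = sxy / sxx
    intercept = ym - slope * xm
    return intercept, slope, 1.0 / intercept ** 2


def correct_axis(masses, mw_berry, mw_predicted):
    """Scale the molecular weight axis by the ratio of fitted to predicted Mw."""
    factor = mw_berry / mw_predicted
    return factor, [m * factor for m in masses]


def splice(masses, conventional, uhmw, boundary, half_width_ratio=0.2):
    """Join two curves on a shared grid with a linear cross-fade at the boundary."""
    half = half_width_ratio * boundary
    lo, hi = boundary - half, boundary + half
    out = []
    for m, c, u in zip(masses, conventional, uhmw):
        if m <= lo:
            out.append(c)                  # conventional branch only
        elif m >= hi:
            out.append(u)                  # ultra-high branch only
        else:
            a = (m - lo) / (hi - lo)       # 0 at lo, 1 at hi
            out.append((1.0 - a) * c + a * u)
    return out
```

Because the fit is linear in the transformed variables, it has a closed-form solution and adds negligible cost per update, which suits the real-time setting described above.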

Claims

1. A method for intelligently fitting the molecular weight distribution of bead ginseng polysaccharides, characterized in that it includes the following steps: receiving the streaming detection signal output by the gel permeation chromatography system coupled with multi-angle laser light scattering detection, and appending the differential refractive index signal and the multi-angle laser light scattering signal to a dual-channel data buffer, respectively, to obtain the current buffer data set; for each sampling time point, calculating the average signal-to-noise ratio of the low-angle group scattering signals and of the high-angle group scattering signals, respectively, identifying each sequence of consecutive time points at which the average signal-to-noise ratio of the high-angle group is lower than a preset signal-to-noise ratio threshold, and marking it as a candidate segment of ultra-high molecular weight components; calculating the ratio of the number of valid data points to the number of theoretical data points in the current buffer, determining the data integrity level, and selecting the corresponding molecular weight range sensing encoder network from a pre-trained molecular weight range sensing encoder group based on the data integrity level and the candidate segment identifier of ultra-high molecular weight components; fusing the differential refractive index signal and the low-angle group scattering signal based on the candidate segment identifier of ultra-high molecular weight components, and inputting the fused signal into the selected molecular weight range sensing encoder network to extract the current latent feature vector; performing weighted fusion of the current latent feature vector with the historical temporally smoothed latent feature vector, the fusion weight being dynamically adjusted according to the data integrity level, to generate the current temporally smoothed latent feature vector; inputting the temporally smoothed latent feature vector into a dual-branch decoder network, which outputs a conventional branch prediction curve and an ultra-high molecular weight branch prediction curve, respectively; applying the Berry equation to the low-angle group scattering data in the candidate segment of ultra-high molecular weight components for weighted least squares fitting, and calculating a correction factor to correct the ultra-high molecular weight branch prediction curve; and smoothly splicing the conventional branch prediction curve with the corrected ultra-high molecular weight branch prediction curve to output the molecular weight distribution analysis results.

2. The intelligent fitting method for the molecular weight distribution of bead ginseng polysaccharides according to claim 1, characterized in that calculating the average signal-to-noise ratio of the low-angle group scattering signals and the high-angle group scattering signals includes: the low-angle group includes the detector channels whose detection angles are less than a preset angle boundary value, and the high-angle group includes the detector channels whose detection angles are greater than or equal to the preset angle boundary value; for each sampling time point, the mean and standard deviation of the scattered light intensity at each detection angle are calculated within a local time window centered on that sampling time point, and the ratio of the mean to the standard deviation is used as the signal-to-noise ratio for that angle; the signal-to-noise ratios of the angles in the high-angle group are averaged to obtain the average signal-to-noise ratio of the high-angle group. The width of the local time window is 5 to 15 consecutive sampling points, and the preset signal-to-noise ratio threshold ranges from 3 to 8.

3. The intelligent fitting method for the molecular weight distribution of bead ginseng polysaccharides according to claim 1, characterized in that determining the data integrity level includes: multiplying the number of valid data points by the fixed time interval, dividing the product by the preset total elution time, multiplying the result by the number of integrity levels, and rounding down to obtain the data integrity level. The number of integrity levels ranges from 5 to 10; the molecular weight range sensing encoder group is organized using a two-dimensional index structure, with the first dimension being the data integrity level index and the second dimension being the existence identifier of candidate segments of ultra-high molecular weight components.

4. The intelligent fitting method for the molecular weight distribution of bead ginseng polysaccharides according to claim 1, characterized in that the molecular weight range sensing encoder network adopts a one-dimensional convolutional neural network structure, which includes multiple convolutional layers and pooling layers and outputs a fixed-dimensional latent feature vector through a fully connected layer. The training dataset for each molecular weight range sensing encoder network consists of truncated elution curves corresponding to its integrity interval, the training label is the complete molecular weight distribution curve of the corresponding sample, and training uses supervised learning.

5. The intelligent fitting method for the molecular weight distribution of bead ginseng polysaccharides according to claim 1, characterized in that the step of fusing the differential refractive index signal and the low-angle group scattering signal based on the candidate segment identifier of ultra-high molecular weight components includes: if a candidate segment of ultra-high molecular weight components is present, the differential refractive index signal and the mean scattered light intensity of the low-angle group are normalized and then fused by weighting, with the fusion weight coefficient ranging from 0.6 to 0.8; if no candidate segment of ultra-high molecular weight components is present, only the differential refractive index signal is input into the molecular weight range sensing encoder network.

6. The intelligent fitting method for the molecular weight distribution of bead ginseng polysaccharides according to claim 1, characterized in that dynamically adjusting the fusion weight based on the data integrity level includes: the fusion weight is defined by a linear function, such that the higher the data integrity level, the greater the weight of the current latent feature vector. The lower bound of the weight ranges from 0.1 to 0.3, and the upper bound of the weight ranges from 0.7 to 0.9. At the first sampling moment of the elution process, the current latent feature vector is used directly as the temporally smoothed latent feature vector.

7. The intelligent fitting method for the molecular weight distribution of bead ginseng polysaccharides according to claim 1, characterized in that the dual-branch decoder network includes: a shared feature processing layer, which receives the temporally smoothed latent feature vector and outputs an intermediate feature representation through a fully connected layer; a conventional branch, composed of multiple transposed convolutional layers, which outputs the conventional branch prediction curve covering the low to medium molecular weight range; and an ultra-high molecular weight branch, composed of multiple transposed convolutional layers, which outputs the ultra-high molecular weight branch prediction curve covering the high molecular weight range. The dual-branch decoder network is trained using a piecewise weighted mean square error loss.

8. The intelligent fitting method for the molecular weight distribution of bead ginseng polysaccharides according to claim 1, characterized in that applying the Berry equation for weighted least squares fitting includes: the weight of the data point at each angle is proportional to the signal-to-noise ratio of that angle, so that the higher the signal-to-noise ratio, the greater the weight; the intercept and slope of the Berry equation are obtained by linear regression, and the reciprocal of the square of the intercept is the weight-average molecular weight. The correction factor is the ratio of the weight-average molecular weight obtained by fitting the Berry equation to the weight-average molecular weight corresponding to the ultra-high molecular weight branch prediction curve. The corrected ultra-high molecular weight branch prediction curve is obtained by scaling the molecular weight axis by the correction factor.

9. The intelligent fitting method for the molecular weight distribution of bead ginseng polysaccharides according to claim 1, characterized in that the step of smoothly splicing the conventional branch prediction curve with the corrected ultra-high molecular weight branch prediction curve includes: a splicing transition region is set around the boundary value between the conventional molecular weight components and the ultra-high molecular weight components, and the half width of the splicing transition region is 0.1 to 0.3 times the boundary value; a linear interpolation algorithm is used within the splicing transition region to achieve a smooth transition between the two curves. The molecular weight distribution analysis results include the number-average molecular weight, the weight-average molecular weight, the polydispersity index, and the proportion of ultra-high molecular weight components. When the data integrity reaches a preset integrity threshold, the output result is labeled as a measured value. The preset integrity threshold ranges from 0.85 to 0.95.

10. A smart fitting system for the molecular weight distribution of bead ginseng polysaccharides, used to execute the intelligent fitting method for the molecular weight distribution of bead ginseng polysaccharides according to any one of claims 1 to 9, characterized in that it includes: a signal receiving and buffering module, used to receive the streaming detection signal output by the gel permeation chromatography system coupled with multi-angle laser light scattering detection, and to append the differential refractive index signal and the multi-angle laser light scattering signal to the dual-channel data buffer, respectively; an ultra-high molecular weight component identification module, used to calculate the average signal-to-noise ratio of the low-angle group and of the high-angle group at each sampling time point, and to identify and mark the candidate segments of ultra-high molecular weight components; an encoder selection module, used to calculate the data integrity level and to select the corresponding molecular weight range sensing encoder network from the molecular weight range sensing encoder group based on the data integrity level and the candidate segment identifier of ultra-high molecular weight components; a feature extraction module, used to perform signal fusion based on the candidate segment identifier of ultra-high molecular weight components and to extract the latent feature vector through the selected molecular weight range sensing encoder network; a temporal smoothing module, used to perform weighted fusion of the current latent feature vector with the historical temporally smoothed latent feature vector to generate the current temporally smoothed latent feature vector; a dual-branch decoding module, used to input the temporally smoothed latent feature vector into the dual-branch decoder network and to output the conventional branch prediction curve and the ultra-high molecular weight branch prediction curve, respectively; a Berry equation correction module, used to apply the Berry equation to the low-angle group scattering data in the candidate segment of ultra-high molecular weight components, perform weighted least squares fitting, calculate the correction factor, and correct the ultra-high molecular weight branch prediction curve; and a results output module, used to smoothly splice the conventional branch prediction curve with the corrected ultra-high molecular weight branch prediction curve and to output the molecular weight distribution analysis results.