A method and system for detecting fish lysate protein content based on near-infrared spectroscopy
By partitioning and enhancing the characteristic peaks of the near-infrared spectrum of fish lysate, and combining dynamic spectral decoupling and an adaptive robust regression model, the accuracy and stability issues of fish lysate protein content detection in existing technologies have been resolved, achieving high-precision and robust detection results.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications (China)
- Current Assignee / Owner
- WUDI DINGMU BIOTECHNOLOGY CO LTD
- Filing Date
- 2026-03-31
- Publication Date
- 2026-06-30
AI Technical Summary
Existing near-infrared spectroscopy detection technology suffers from problems such as low modeling accuracy, weak signal decoupling ability, insufficient stability, and uneven detection across the entire concentration range in fish lysate protein content detection, making it difficult to meet the high precision and robustness requirements of industrial sites.
A method for detecting fish lysate protein content based on near-infrared spectroscopy is adopted. Through spectral partitioning and characteristic peak identification, the method combines data acquisition, partitioned correction, characteristic peak enhancement, dynamic spectral decoupling, and adaptive robust regression, constructing a dynamic spectral decoupling model and an adaptive robust regression model to achieve high-precision detection of fish lysate protein content.
It improves the accuracy and stability of fish lysate protein content detection, enhances signal identification, adapts to complex environmental changes in industrial sites, and achieves high robustness detection across the entire concentration range.
[Abstract drawing: Figure CN122306746A_ABST]
Description
Technical Field
[0001] This invention relates to the field of fish lysate protein content detection, and in particular to a method and system for detecting fish lysate protein content based on near-infrared spectroscopy.
Background Technology
[0002] Fish lysate is widely used in aquatic feed, pet food, bio-fermentation, and functional protein products. Protein content is a core indicator for evaluating the quality, grade, pricing, and formulation of fish lysate. In the continuous industrial production of fish lysate, rapid, accurate, stable, and interpretable online detection and process control of protein content are key technologies for ensuring uniform product quality, improving production efficiency, and reducing production costs.
[0003] Current methods for detecting fish lysate protein content are mainly divided into two categories: traditional chemical detection methods and rapid near-infrared spectroscopy detection methods. Traditional detection uses the Kjeldahl method as the standard reference method, providing accurate and reliable results, but it suffers from drawbacks such as cumbersome operation procedures, complex sample pretreatment, and consumption of large amounts of chemical reagents. Near-infrared spectroscopy offers advantages such as fast detection speed, non-destructive testing, no pollution, and online analysis capability. However, existing near-infrared spectroscopy detection techniques and modeling methods still fall short of meeting the practical application requirements of high precision, high robustness, and high generalization, specifically in the following aspects:
[0004] (1) Existing near-infrared spectral preprocessing mostly applies a uniform correction to the whole spectrum without distinguishing the characteristic peaks of fish lysate proteins. When scattering and baseline interference are removed, the weak protein characteristic peaks are easily weakened and distorted, resulting in loss of key information and reduced modeling accuracy.
[0005] (2) In the fish lysate spectrum, the absorption peaks of protein, water, fat, and other components overlap heavily. Existing modeling methods do not effectively decouple the protein signal from interference signals, which easily introduces irrelevant covariates and results in poor model interpretability and weak generalization ability.
[0006] (3) Existing deep regression networks are mostly single-path and lack adaptive compensation mechanisms. Under variations in production batch, sample loading, and environment, and under abnormal fluctuations, their prediction bias is large and their stability insufficient, making them difficult to apply at industrial sites.
[0007] (4) Training aims only to minimize the prediction error: it does not follow the additivity of spectral components and lacks physical constraints. It also does not specifically optimize the low-content range, is not robust to abnormal samples, and fits the full range unevenly, so it cannot meet the requirement of accurate detection across the full concentration range.
Summary of the Invention
[0008] The purpose of this invention is to solve the problems mentioned in the background section.
[0009] To achieve the above objectives, the present invention provides the following technical solution:
[0010] A method for detecting fish lysate protein content based on near-infrared spectroscopy, the method comprising:
[0011] Near-infrared spectral data of fish lysate samples are obtained, and a sample dataset is constructed from the near-infrared spectral data and the corresponding protein content labels.
[0012] Based on prior information about protein characteristic peaks, the near-infrared spectral data is subjected to partitioned scattering correction and characteristic peak enhancement processing to obtain an enhanced near-infrared spectral vector.
[0013] The enhanced near-infrared spectral vector is input into a pre-constructed dynamic spectral decoupling model. Based on the component basis vector and component weight generation mechanism, the enhanced near-infrared spectral vector is decomposed into components to obtain the protein component contribution spectral vector.
[0014] The protein component contribution spectral vector is input into the deep feature extraction and adaptive robust regression model to output the predicted protein content of the fish lysate sample.
[0015] Based on component consistency constraints and robust loss functions, the dynamic spectral decoupling model and the adaptive robust regression model are jointly trained and optimized to obtain a fish lysate protein content detection model.
[0016] Preferably, the near-infrared spectrum is divided into protein characteristic peak range and non-characteristic range based on the wavelength range of protein absorption peaks;
[0017] Nonlinear scattering correction is performed within the protein characteristic peak range, and local scattering correction and local baseline subtraction are constructed based on local neighborhood differences;
[0018] Robust scattering correction is performed within the non-characteristic region to remove multiplicative scattering and additive baseline drift;
[0019] An adaptive sharpening enhancement process based on local contrast and peak height is performed on the protein characteristic peak regions to obtain an enhanced near-infrared spectral vector.
[0020] Preferably, the dynamic spectral decoupling model includes: a component basis vector set construction module, used to construct protein component basis vectors and multiple interfering component basis vectors;
[0021] The component weight generation module is used to generate component weight vectors based on enhanced near-infrared spectral vectors.
[0022] The spectral reconstruction module is used to reconstruct the component combination spectrum based on the component basis vector set and component weight vector;
[0023] The protein component extraction module is used to combine protein component weights and gating vectors to modulate the protein component basis vectors and generate protein component contribution spectral vectors.
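As a minimal illustration of the spectral reconstruction module, the NumPy sketch below stacks the basis vectors as rows of a matrix (protein basis assumed in row 0) and forms the component combination spectrum as their weighted sum. The function name `reconstruct` and the toy values are hypothetical and are not taken from the patent.

```python
import numpy as np

def reconstruct(weights: np.ndarray, bases: np.ndarray) -> np.ndarray:
    """Component combination spectrum as a weighted sum of basis vectors.

    weights: shape (C,); bases: shape (C, L), row 0 assumed to be the protein basis.
    """
    if weights.shape[0] != bases.shape[0]:
        raise ValueError("one weight per basis vector is required")
    return weights @ bases  # sum over c of weights[c] * bases[c]

# Toy example: one protein basis and one interference basis over 3 wavelengths.
bases = np.array([[1.0, 2.0, 3.0],
                  [0.5, 0.5, 0.5]])
x_hat = reconstruct(np.array([0.8, 0.2]), bases)  # -> [0.9, 1.7, 2.5]
```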
[0024] Preferably, the component weight generation module includes:
[0025] The enhanced near-infrared spectral vector is feature-mapped to generate a latent feature representation;
[0026] The latent feature representation is mapped to component response values, and then normalized using the Softmax function to obtain the contribution weight of each component.
[0027] The component weights satisfy the constraint that they are non-negative and sum to 1.
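The weight generation steps can be sketched as follows, assuming (since the patent does not fix the architecture) a single tanh hidden layer for the feature mapping and a linear layer for the component responses. Layer sizes, random parameters, and function names are illustrative assumptions; the Softmax step is what enforces the non-negativity and sum-to-1 constraint.

```python
import numpy as np

def softmax(z: np.ndarray) -> np.ndarray:
    """Numerically stable softmax over the last axis."""
    z = z - z.max(axis=-1, keepdims=True)  # shift so the largest response is 0
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def component_weights(x_enh, W1, b1, W2, b2):
    """Enhanced spectrum -> latent features -> component responses -> weights."""
    h = np.tanh(x_enh @ W1 + b1)   # latent feature representation (assumed tanh layer)
    responses = h @ W2 + b2        # one response value per component
    return softmax(responses)      # non-negative weights that sum to 1

rng = np.random.default_rng(0)
L, H, C = 200, 32, 4   # wavelengths, hidden units, 1 protein + 3 interference components
x = rng.random(L)
w = component_weights(x, rng.normal(size=(L, H)) * 0.05, np.zeros(H),
                      rng.normal(size=(H, C)), np.zeros(C))
```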
[0028] Preferably, the protein component extraction module includes:
[0029] Based on the enhanced near-infrared spectral vector, a gated branch is constructed to generate a gated vector along the wavelength dimension;
[0030] The gating vector and the protein component basis vector are modulated element-wise, and combined with the protein component weights to generate the protein component contribution spectral vector.
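A hedged sketch of the gated protein extraction, assuming the gating branch is a single linear layer followed by a sigmoid so that each wavelength receives a gate value in (0, 1). The gate form and the names `Wg`, `bg`, and `protein_contribution` are hypothetical.

```python
import numpy as np

def sigmoid(z: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-z))

def protein_contribution(x_enh, b_protein, w_protein, Wg, bg):
    """Protein component contribution spectrum with wavelength-level gating.

    Assumed gate: g = sigmoid(Wg @ x_enh + bg), one value in (0, 1) per wavelength.
    """
    g = sigmoid(Wg @ x_enh + bg)         # gating vector along the wavelength axis
    return w_protein * (g * b_protein)   # element-wise modulation, scaled by protein weight

# With an all-zero gate layer every gate value is 0.5.
L = 3
out = protein_contribution(np.ones(L), np.array([2.0, 4.0, 6.0]), 0.5,
                           np.zeros((L, L)), np.zeros(L))  # -> [0.5, 1.0, 1.5]
```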
[0031] Preferably, the deep feature extraction and adaptive robust regression model includes:
[0032] A one-dimensional convolutional feature extraction network is used to extract deep features of the spectral vectors contributed by protein components;
[0033] The basic regression branch is used to output the basic predicted value of protein content;
[0034] An adaptive correction branch is used to generate the predicted correction term;
[0035] The base prediction value is added to the correction term to obtain the final protein content prediction value.
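The dual-path prediction can be illustrated with a forward pass only, assuming one convolutional layer with ReLU and global average pooling as a stand-in for the deeper network, and linear heads for the basic regression and correction branches. All names and shapes are hypothetical; in practice the parameters would be learned jointly in a deep learning framework.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def conv1d_features(x, kernels):
    """Valid 1-D cross-correlation with K kernels, ReLU, then global average pooling."""
    windows = sliding_window_view(np.asarray(x, dtype=float), kernels.shape[1])  # (L-k+1, k)
    activations = np.maximum(windows @ kernels.T, 0.0)                          # (L-k+1, K)
    return activations.mean(axis=0)                                             # (K,)

def predict(x, kernels, w_base, b_base, w_corr, b_corr):
    """Final prediction = basic regression output + adaptive correction term."""
    feats = conv1d_features(x, kernels)
    base = float(feats @ w_base + b_base)        # basic regression branch
    correction = float(feats @ w_corr + b_corr)  # adaptive correction branch
    return base + correction, base, correction

y, base, corr = predict(np.ones(10), np.ones((2, 3)),
                        np.array([1.0, 1.0]), 0.0, np.array([0.1, 0.1]), 0.0)
# y is approximately 6.6 (base 6.0 plus correction 0.6)
```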
[0036] Preferably, the joint training optimization includes:
[0037] A spectral reconstruction error loss is constructed to constrain the difference between the component reconstructed spectrum and the enhanced near-infrared spectrum;
[0038] Construct a component weight constraint loss to constrain the sparsity or concentration of the component weight distribution;
[0039] Construct a robust regression loss function to reduce the impact of outliers on model training;
[0040] A weighting mechanism is introduced for samples with low protein content to improve the model's prediction accuracy in the low content range.
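One possible form of the joint objective is sketched below. The patent names the Huber robust loss, a spectral reconstruction loss, a weight concentration constraint, and low-content sample weighting; the specific choices here (mean squared reconstruction error, mean entropy of the component weights as the concentration penalty, a 40% protein threshold with a weight of 2, and the lambda coefficients) are assumptions for illustration only.

```python
import numpy as np

def huber(residual, delta=1.0):
    """Huber loss: quadratic for |r| <= delta, linear beyond it."""
    a = np.abs(residual)
    return np.where(a <= delta, 0.5 * a**2, delta * (a - 0.5 * delta))

def joint_loss(x_enh, x_hat, weights, y_true, y_pred,
               low_threshold=40.0, low_boost=2.0,
               lam_rec=1.0, lam_w=0.1, delta=1.0):
    # Spectral reconstruction error between enhanced and reconstructed spectra
    rec = np.mean((x_enh - x_hat) ** 2)
    # Concentration penalty: mean entropy of component weights (lower = more concentrated)
    ent = -np.mean(np.sum(weights * np.log(weights + 1e-12), axis=-1))
    # Huber regression loss, up-weighting samples below the low-content threshold
    sw = np.where(y_true < low_threshold, low_boost, 1.0)
    reg = np.sum(sw * huber(y_pred - y_true, delta)) / np.sum(sw)
    return reg + lam_rec * rec + lam_w * ent

X = np.ones((2, 4))
W = np.array([[1.0, 0.0], [0.0, 1.0]])
y_true = np.array([30.0, 60.0])
loss = joint_loss(X, X, W, y_true, np.array([31.0, 60.0]))  # about 1/3
```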
[0041] A fish lysate protein content detection system based on near-infrared spectroscopy, comprising:
[0042] The data acquisition module is used to acquire near-infrared spectral data of fish lysate samples;
[0043] The spectral processing module is used to perform partitioned scattering correction and characteristic peak enhancement processing on the near-infrared spectral data;
[0044] The spectral decoupling module is used to generate protein component contribution spectral vectors based on a dynamic spectral decoupling model.
[0045] The prediction module is used to output predicted protein content values based on deep feature extraction and an adaptive robust regression model.
[0046] The model training module is used to train and optimize the model based on component consistency constraints.
[0047] An electronic device includes a processor and a memory, the memory storing a computer program that, when executed by the processor, causes the processor to perform a method for detecting fish lysate protein content.
[0048] A computer-readable storage medium storing a computer program that, when executed by a processor, implements a method for detecting fish lysate protein content.
[0049] Compared with the prior art, the beneficial effects of the present invention are as follows:
[0050] (1) To address baseline drift in the near-infrared spectrum of fish lysate and weak protein characteristic peaks that are easily masked, this invention proposes a partitioned spectral correction and enhancement strategy guided by protein characteristic peaks. Based on prior knowledge of the absorption peaks of fish lysate proteins, the spectrum is divided into characteristic and non-characteristic regions. Within the protein characteristic peak regions, nonlinear scattering correction adapted to the local morphology and peak sharpening are applied, while robust scattering correction is applied in the non-characteristic region. This effectively preserves and enhances the key spectral features of fish lysate proteins and improves signal recognition.
[0051] (2) To address the industrial problem of highly overlapping absorption peaks and strong signal coupling among components such as protein, water, and fat in fish lysate, this invention constructs a dynamic spectral decoupling module based on learnable protein and interference component basis vectors. Combined with dynamic weight generation and wavelength-level gating modulation, the mixed spectrum is precisely decoupled into a pure protein component contribution spectrum, eliminating interference from irrelevant components at the source and making the detection results more consistent with the actual composition of fish lysate.
[0052] (3) To address on-site detection pain points such as large batch-to-batch differences in fish lysate production, uneven sample loading, sampling errors, and frequent environmental fluctuations, the present invention constructs a dual-path adaptive robust regression network. In addition to the main feature extraction and basic regression path, an adaptive correction branch is added that compensates for sample distribution shift and abnormal fluctuations in real time, significantly improving the prediction stability and repeatability of the model in complex industrial fish lysate production settings.
[0053] (4) To address the practical difficulties that fish lysate samples with low protein content are hard to fit, outlier samples have a large impact, and detection is uneven across the full range, this invention establishes a component consistency training mechanism that includes spectral reconstruction and component weight concentration constraints. Combined with the Huber robust loss and a low-value sample weighting strategy, this makes model training conform to the physical law of near-infrared component additivity and comprehensively improves the robustness and accuracy of fish lysate protein detection across the entire concentration range.
Attached Figure Description
[0054] Figure 1 is a flowchart of the method of the present invention;
[0055] Figure 2 is a scatter plot comparing the sample-level prediction performance of the present invention and conventional methods;
[0056] Figure 3 shows a visualization experiment of the dynamic spectral decoupling effect of the present invention.
Detailed Implementation
[0057] The technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of the present invention, and not all embodiments. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of the present invention.
[0058] This invention provides a technical solution:
[0059] A method and system for detecting fish lysate protein content based on near-infrared spectroscopy, the main contents of which are as follows:
[0060] S1. Near-infrared spectral data acquisition and dataset construction of fish lysate samples
[0061] To achieve near-infrared spectroscopy-based detection of fish lysate protein content, a representative and accurately labeled dataset is first required. Fish lysate samples are selected from multiple production batches, with varying degrees of hydrolysis and protein content ranges, so that subsequent modeling covers the spectral variations likely to occur in practical applications. Each fish lysate sample is thoroughly mixed under isothermal conditions and then transferred to a quartz cuvette or sample cup for spectral acquisition using a Fourier transform near-infrared spectrometer or a grating-type near-infrared spectrometer. The wavelength range, resolution, and number of scans are kept identical to those used when the model is applied, as described below. Typically, the wavelength range covers the near-infrared region, the resolution is set to an appropriate value based on instrument performance, and multiple scans are averaged to reduce random noise. The near-infrared spectrum of each sample is acquired repeatedly under the same ambient temperature and humidity, and the average spectrum is used as that sample's original near-infrared spectral vector, minimizing the impact of sample loading errors and random instrument fluctuations.
[0062] After spectral acquisition, the true protein content of each fish lysate sample is determined by the Kjeldahl method, the standard chemical reference method. This value serves as the label of the sample; it is a continuous numerical variable expressed as a mass percentage, representing the actual proportion of protein in the sample. The original near-infrared spectral vector of each fish lysate sample is paired with its true protein content value to form a complete labeled sample.
[0063] All samples are divided into training set and validation set according to a preset ratio. The training set is used for parameter updates during model training, and the validation set is used for monitoring the model training status and adjusting hyperparameters.
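A simple seeded random split illustrates the preset-ratio division; the 0.8 ratio, the seed, and the function name are assumptions. A split stratified by protein content is a common alternative when the validation set must span the full concentration range.

```python
import numpy as np

def train_val_split(n_samples: int, train_ratio: float = 0.8, seed: int = 0):
    """Random split of sample indices into training and validation sets."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_samples)
    n_train = int(round(train_ratio * n_samples))
    return np.sort(order[:n_train]), np.sort(order[n_train:])

train_idx, val_idx = train_val_split(50)  # 40 training and 10 validation indices
```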
[0064] S2. Scattering correction and characteristic peak enhancement of near-infrared spectra of fish lysate samples
[0065] Near-infrared spectra of fish lysate samples are easily affected by particle distribution, moisture state, sample uniformity, and differences in suspended particle size. In protein content modeling, this often leads to significant baseline drift, local peak broadening, and protein characteristic peaks being masked by water and fat absorption peaks. Conventional standard normal variate (SNV) transformation and multiplicative scatter correction (MSC) usually process the entire spectral range with uniform rules, making it difficult both to finely preserve the characteristic peak regions and to robustly remove scattering in the non-characteristic regions.
[0066] This invention first locates the characteristic peak ranges of fish lysate proteins based on prior knowledge of their absorption peaks. It then performs nonlinear scattering correction and peak sharpening enhancement within the characteristic peak ranges, and robust scattering correction within the non-characteristic range. The result is an enhanced near-infrared spectral vector that has a stable baseline and accentuates the differences in the protein characteristic peaks. The specific steps are as follows:
[0067] S201, Nonlinear scattering correction guided by protein characteristic peaks
[0068] Traditional full-spectrum uniform correction methods tend to weaken local morphological changes of protein characteristic peaks while removing scattering. This invention employs different correction strategies for protein characteristic peak regions and non-characteristic regions, thus preserving the weak peak shape information in the protein characteristic peak regions. The specific steps are as follows:
[0069] 1) Obtain the raw near-infrared spectral vector of the fish lysate sample, which represents the absorbance sequence of a single fish lysate sample across the entire wavelength range. Each element corresponds to the absorbance at one sampling wavelength point, and the vector length equals the number of sampling wavelength points.
[0070] Bad pixels and saturated pixels are removed from the original near-infrared spectral vector, and instrument noise is smoothed. The smoothing can be any of the following: moving average, median filtering, or Savitzky-Golay smoothing.
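A minimal sketch of this preprocessing, assuming that bad pixels appear as non-finite values, that saturation is detected by an absorbance threshold (3.0 here, hypothetical), and that flagged points are filled by linear interpolation before moving-median smoothing. If SciPy is available, `scipy.signal.savgol_filter` could replace the median step for Savitzky-Golay smoothing.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def clean_and_smooth(x, saturation=3.0, window=5):
    """Replace bad/saturated points by interpolation, then apply a moving median."""
    x = np.asarray(x, dtype=float).copy()
    idx = np.arange(x.size)
    finite = np.isfinite(x)
    bad = ~finite                              # NaN/inf readings: bad pixels
    bad[finite] = x[finite] >= saturation      # saturated readings (assumed threshold)
    x[bad] = np.interp(idx[bad], idx[~bad], x[~bad])  # linear fill from good neighbors
    half = window // 2
    padded = np.pad(x, half, mode="edge")      # repeat edge values so the length is kept
    return np.median(sliding_window_view(padded, 2 * half + 1), axis=1)

y = clean_and_smooth([1.0, 1.0, float("nan"), 1.0, 1.0, 5.0, 1.0])  # all ones
```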
[0071] 2) Based on prior absorption peak information for fish lysate proteins, the original near-infrared spectral vector is divided into one or more protein characteristic peak regions and one non-characteristic region, where each protein characteristic peak region defines one protein characteristic peak range. The non-characteristic region is the set of wavelength points outside all protein characteristic peak regions.
[0072] The protein characteristic peak ranges are centered at approximately 2.05 μm and 2.18 μm and extended by 10 nm to 40 nm on each side, depending on the wavelength resolution of the spectrometer, to cover the peak apex and shoulder regions. When the spectrometer outputs discrete wavelength points, these wavelength ranges are converted into the corresponding index ranges.
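The conversion from wavelength ranges to index ranges can be sketched with boolean masks. The ±25 nm half-width is an assumed value inside the 10 nm to 40 nm range given above, and the 10 nm sampling grid in the example is hypothetical.

```python
import numpy as np

def region_masks(wavelengths_nm, centers_nm=(2050.0, 2180.0), half_width_nm=25.0):
    """Boolean masks for each protein peak region and for the non-characteristic region."""
    wl = np.asarray(wavelengths_nm, dtype=float)
    peaks = [np.abs(wl - c) <= half_width_nm for c in centers_nm]  # one mask per peak
    other = ~np.any(peaks, axis=0)   # every point outside all peak regions
    return peaks, other

wl = np.arange(2000.0, 2250.0, 10.0)                 # hypothetical 10 nm sampling grid
peaks, other = region_masks(wl)
peak_indices = [np.flatnonzero(m) for m in peaks]    # index ranges for discrete outputs
```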
[0073] 2.1) Within each protein characteristic peak interval, calculate the interval mean, interval standard deviation, interval baseline trend, and local neighborhood difference feature. The interval mean characterizes the overall absorbance level of the interval, the interval standard deviation characterizes its fluctuation, the interval baseline trend describes whether the interval shows an overall upward or downward drift, and the local neighborhood difference feature characterizes how the morphology at the current wavelength point changes relative to its preceding and following neighbors.
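The four interval features can be computed as below under two assumptions not stated in the text: the baseline trend is taken as the least-squares linear slope across the interval, and the local neighborhood difference is each point's deviation from the mean of its two immediate neighbors (left at zero at the interval ends).

```python
import numpy as np

def interval_stats(segment):
    """Mean, standard deviation, baseline trend and neighborhood difference of one interval."""
    seg = np.asarray(segment, dtype=float)
    idx = np.arange(seg.size)
    slope = np.polyfit(idx, seg, 1)[0]   # linear baseline trend (drift per point)
    neigh_diff = np.zeros_like(seg)
    neigh_diff[1:-1] = seg[1:-1] - 0.5 * (seg[:-2] + seg[2:])  # vs. mean of both neighbors
    return seg.mean(), seg.std(), slope, neigh_diff

mean, std, slope, diff = interval_stats([1.0, 2.0, 3.0, 4.0, 5.0])
# mean 3.0, std about 1.414, slope about 1.0, diff all zeros for a straight line
```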
[0074] For each wavelength point within the protein characteristic peak range, a local scattering correction and a local baseline subtraction are constructed, wherein:
[0075] The local scattering correction is obtained as a weighted difference between the current wavelength point and its adjacent wavelength points, with nearer points weighted more heavily and farther points weighted less, which highlights true changes in the local peak shape. The local baseline subtraction is obtained as a weighted average over a large neighborhood and is used to estimate the slowly varying local background. On this basis, each wavelength point is corrected according to its degree of local morphological anomaly, while the slowly varying baseline drift is removed at the same time.
[0076] In practical implementation, for each wavelength point within a protein characteristic peak range, the local scattering correction is first calculated dynamically from its morphological relationship with adjacent wavelength points. On the left and right sides of the current wavelength point, two adjacent wavelength points are selected to form a small neighborhood, which is used to evaluate the local peak shape symmetry and slope variation at that point; at the same time, a larger range on both sides, for example an interval length of [missing information] to [missing information] wavelength points, forms a large neighborhood used to estimate the locally varying background baseline. The difference in absorbance between the left and right sides of the small neighborhood determines whether the point is affected by nonlinear scattering: if the current wavelength point deviates significantly from the interval mean and the changes on its left and right sides are asymmetric, the local scattering correction intensity is increased; if the point lies in a stable part of the peak, the correction intensity is decreased. Finally, the local baseline estimated from the large neighborhood is subtracted from the original absorbance of the current wavelength point, and the magnitude of the correction relative to the local mean is scaled by the local scattering correction intensity. This yields the absorbance of that wavelength point after nonlinear scattering correction, i.e., the corrected wavelength point within the protein characteristic peak interval.
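The patent describes this correction only qualitatively, so the sketch below is one hypothetical realization: the small neighborhood uses `small` points on each side, the correction intensity is a tanh-bounded product of the deviation from the interval mean and the left/right asymmetry, the large-neighborhood baseline uses triangular weights over `large` points on each side, and the correction is applied by shrinking the baseline-subtracted value. None of these specific forms or parameter values come from the source.

```python
import numpy as np

def local_scatter_correct(x, region, small=2, large=10, alpha_max=0.5, eps=1e-12):
    """Hypothetical peak-region nonlinear scattering correction.

    x: full spectrum; region: integer indices of one protein peak interval.
    Returns baseline-subtracted values for the region, shrunk more strongly where
    a point deviates from the interval mean AND its small neighborhood is asymmetric.
    """
    x = np.asarray(x, dtype=float)
    seg = x[region]
    mu, sd = seg.mean(), seg.std() + eps
    n = x.size
    out = np.empty(len(region))
    for j, i in enumerate(region):
        left = x[max(i - small, 0):i]               # small neighborhood, left side
        right = x[i + 1:min(i + small + 1, n)]      # small neighborhood, right side
        lm = left.mean() if left.size else x[i]
        rm = right.mean() if right.size else x[i]
        asym = abs(rm - lm)                         # left/right asymmetry
        dev = abs(x[i] - mu) / sd                   # deviation from the interval mean
        alpha = alpha_max * np.tanh(dev * asym / sd)  # local correction intensity
        lo, hi = max(i - large, 0), min(i + large + 1, n)
        w = (large + 1 - np.abs(np.arange(lo, hi) - i)).astype(float)  # triangular weights
        baseline = w @ x[lo:hi] / w.sum()           # slowly varying local background
        out[j] = (1.0 - alpha) * (x[i] - baseline)  # subtract baseline, scale correction
    return out

ramp = np.arange(50, dtype=float)                   # pure linear drift, no peak
corrected = local_scatter_correct(ramp, np.arange(20, 30))
```

A useful sanity check for this form is that a pure linear drift is removed completely, because a symmetric weighted mean of a straight line equals its center value.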
[0077] 2.2) For wavelength points within the non-characteristic region, robust scattering correction is performed. Any of the following methods can be used: multiplicative scatter correction (MSC), low-order polynomial baseline correction, or iteratively weighted baseline correction. In one convenient implementation, the mean of multiple representative spectra in the training set is first taken as the reference spectrum, and MSC is then applied to the non-characteristic region to remove global multiplicative scattering and additive shift.
[0078] In practical implementation, for the non-characteristic region, a reference spectrum is first formed from multiple representative training spectra (for example, the average spectrum of all training samples). For each sample, the absorbance at every wavelength point within the non-characteristic region is then linearly regressed against the absorbance at the corresponding wavelength point of the reference spectrum to estimate a multiplicative coefficient and an additive offset. The estimated additive offset is subtracted from the absorbance at each wavelength point of the region, and the result is divided by the multiplicative coefficient to obtain the robustly scatter-corrected absorbance, i.e., the wavelength points of the non-characteristic region G_other after robust scattering correction. This process removes multiplicative scattering and additive baseline drift across the non-characteristic region as a whole.
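The MSC procedure described above can be sketched as follows; the function name and toy reference are hypothetical. In use, the reference would be computed once from the training set's non-characteristic region and reused unchanged for validation and new samples.

```python
import numpy as np

def msc(spectra, reference=None):
    """Multiplicative scatter correction of each row against a reference spectrum.

    Fits row = a + b * reference by least squares, then returns (row - a) / b.
    """
    X = np.atleast_2d(np.asarray(spectra, dtype=float))
    ref = X.mean(axis=0) if reference is None else np.asarray(reference, dtype=float)
    out = np.empty_like(X)
    for r, row in enumerate(X):
        b, a = np.polyfit(ref, row, 1)   # slope = multiplicative, intercept = additive
        out[r] = (row - a) / b
    return out, ref

ref = np.array([0.2, 0.5, 0.9, 0.4])            # e.g. mean training spectrum (toy values)
X = np.vstack([1.3 * ref + 0.05, 0.7 * ref - 0.02])
X_corr, _ = msc(X, reference=ref)               # both rows map back onto ref
```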
[0079] 3) The nonlinearly corrected wavelength points within the protein characteristic peak ranges and the robustly corrected wavelength points within the non-characteristic range are reassembled in the original wavelength order to obtain the scatter-corrected near-infrared spectral vector. This vector has the same length as the original near-infrared spectral vector, but has a more stable baseline, a more realistic local peak shape, and weaker scattering interference.
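Reassembly in the original wavelength order amounts to writing each region's corrected values back at its indices. The sketch below, with hypothetical names, also checks that every wavelength point has been filled.

```python
import numpy as np

def reassemble(n_points, parts):
    """Write (indices, values) pairs back into one spectrum in original wavelength order."""
    out = np.full(n_points, np.nan)
    for idx, vals in parts:
        out[np.asarray(idx)] = vals
    if np.isnan(out).any():
        raise ValueError("some wavelength points were not covered by any region")
    return out

spectrum = reassemble(5, [([1, 2], [10.0, 20.0]), ([0, 3, 4], [0.0, 30.0, 40.0])])
# -> [0.0, 10.0, 20.0, 30.0, 40.0]
```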
[0080] Fish lysate protein characteristic peaks are often weak and easily distorted by scattering. If a global smoothing correction is directly applied, the useful peak shape will be weakened as well. This invention does not apply a uniform scattering correction to the entire spectrum. Instead, it first uses prior knowledge of protein characteristic peaks to locate key intervals, then performs local nonlinear correction on the protein characteristic peak intervals, and performs robust correction on non-characteristic intervals. Through partitioned correction, better scattering suppression effect and higher protein peak fidelity can be obtained at the same time.
[0081] S202, peak sharpening enhancement in the protein characteristic peak region
[0082] Even after scattering correction, the differences in protein characteristic peaks between fish lysate samples may still be insufficiently apparent due to their large peak widths and blurred peak shoulders.
[0083] This invention further enhances peak sharpness within the characteristic peak range of the protein, making the differences in peak height and peak width between samples with different protein contents clearer. The specific steps are as follows:
[0084] 1) Taking the scatter-corrected near-infrared spectral vector as input, local contrast is calculated only for wavelength points within the protein characteristic peak ranges. Local contrast is the difference between the absorbance at the current wavelength point and the weighted average absorbance within a surrounding window, and characterizes whether the point lies at a local peak apex, on a shoulder, or in a flat region. Specifically,
[0085] for the current wavelength point within a protein characteristic peak region, a local window is set around it, with a length of 5 to 15 wavelength points. Each wavelength point within the window is weighted according to its distance from the current point, with closer points receiving greater weight, and the local weighted average absorbance is calculated. The difference between the absorbance at the current wavelength point and this local weighted average is the local contrast, which quantifies how much the point protrudes above or dips below the surrounding spectral morphology.
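The local contrast computation can be sketched as follows, assuming linearly decreasing (triangular) distance weights and clamping of the window at the spectrum ends; the text only requires that closer points weigh more. The window length of 9 lies inside the suggested 5 to 15 range; an even length is effectively rounded up to the next odd size.

```python
import numpy as np

def local_contrast(x, region, window=9):
    """Absorbance minus distance-weighted local mean, for points in one peak region."""
    x = np.asarray(x, dtype=float)
    half = window // 2
    offsets = np.arange(-half, half + 1)
    w = (half + 1 - np.abs(offsets)).astype(float)   # closer points weigh more
    contrast = np.empty(len(region))
    for j, i in enumerate(region):
        idx = np.clip(i + offsets, 0, x.size - 1)    # clamp at spectrum edges
        contrast[j] = x[i] - w @ x[idx] / w.sum()
    return contrast

x = np.zeros(30)
x[15] = 1.0                             # an isolated peak
c = local_contrast(x, [14, 15, 16])     # apex: 1 - 5/25 = 0.8; neighbors: -4/25 = -0.16
```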
[0086] 2) Based on the obtained local contrast, the sharpening intensity is further determined according to the position of the current wavelength point in the peak shape.
[0087] First, determine the minimum absorbance value within the characteristic peak range of the protein to assess the local peak height range. When the current wavelength point is close to the peak and significantly higher than the local background, the local contrast is large, so increase the sharpening intensity. When the current wavelength point is close to the valley or a flat background, the local contrast is small, so decrease the sharpening intensity to avoid amplifying the background noise.
[0088] 3) Combine the local contrast with the peak height-related enhancement coefficient to obtain the sharpening enhancement result of the current wavelength point; to prevent over-enhancement, an upper limit can be set for the enhancement amplitude, for example, limiting the enhancement amount to no more than 5% to 20% of the dynamic range of the characteristic peak interval of the protein.
[0089] The peak height-related enhancement coefficient is generated dynamically from the relative height of the current wavelength point within the characteristic peak interval. For example, the enhancement coefficient can be defined as a function of the ratio of the absorbance at the current wavelength point to the minimum absorbance (or local background) in the interval, so that the enhancement intensity is greatest at the peak and approaches zero near the baseline.
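A sketch of the capped, peak-height-weighted sharpening, assuming the enhancement amount is the coefficient times the local contrast. The relative height (x - x_min)/(x_max - x_min) used as the coefficient equals the ratio-based form (x/x_min - 1)/(x_max/x_min - 1) whenever x_min > 0. The gain of 1.0 and the cap of 10% of the interval's dynamic range (inside the 5% to 20% range above) are assumptions.

```python
import numpy as np

def sharpen(segment, contrast, gain=1.0, cap_frac=0.1, eps=1e-12):
    """Peak-height-weighted sharpening of one protein peak interval, with a cap."""
    seg = np.asarray(segment, dtype=float)
    lo, hi = seg.min(), seg.max()
    rel_height = (seg - lo) / (hi - lo + eps)   # 0 at the interval minimum, 1 at the apex
    beta = gain * rel_height                    # peak-height-related coefficient
    cap = cap_frac * (hi - lo)                  # max change: fraction of the dynamic range
    return seg + np.clip(beta * np.asarray(contrast, dtype=float), -cap, cap)

out = sharpen([0.0, 0.5, 1.0, 0.5, 0.0], [-0.1, 0.0, 0.5, 0.0, -0.1])
# -> [0.0, 0.5, 1.1, 0.5, 0.0]: the apex enhancement of 0.5 is capped at 10% of the range
```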
[0090] 4) After all protein characteristic peak regions have been sharpened, the new absorbance values of all wavelength points within those regions and the unchanged wavelength points within the non-characteristic region are merged in their original wavelength order to obtain the enhanced near-infrared spectral vector. This vector represents the near-infrared spectrum of the fish lysate sample after enhancement guided by protein characteristic peaks; it still has the same length, equal to the number of sampling wavelength points, but its protein characteristic peak boundaries are clearer, peak differences are more pronounced, and local separability is stronger.
[0091] The effective protein information in the near-infrared spectrum of fish lysate is concentrated mainly within a limited set of characteristic peak ranges. Uniform enhancement across the entire spectrum would also amplify the water absorption bands and noisy regions, reducing the stability of subsequent modeling. This invention therefore does not sharpen the entire spectrum uniformly, but sharpens adaptively according to local contrast and peak height only within the protein characteristic peak ranges, enhancing protein-related differences while suppressing amplification of irrelevant background.
[0092] S3. Construction of a deep neural network based on dynamic spectral decoupling and adaptive robust regression
[0093] In the near-infrared spectrum of fish lysate samples, the absorption peaks of protein, water, fat, and other soluble components overlap significantly, and different production batches and degrees of hydrolysis also cause peak position shifts and peak intensity changes. Conventional multilayer perceptrons and fixed-kernel convolutional networks usually regress directly on the mixed spectrum and can easily learn covariates unrelated to protein, reducing generalization ability.
[0094] This invention first constructs a dynamic spectral decoupling module to separate the protein contribution and interference contribution in the enhanced near-infrared spectral vector. Then, it performs deep feature extraction and adaptive robust regression on the protein component contribution spectral vector, thereby improving the accuracy and stability of protein content prediction. The specific steps are as follows:
[0095] S301, Dynamic Spectral Decoupling and Protein Characterization
[0096] Using component basis vectors and a dynamic weight generation mechanism, this invention decomposes the enhanced near-infrared spectral vector into a protein contribution and an interference contribution, thereby obtaining a purer protein component contribution spectral vector. The specific steps are as follows:
[0097] 1) Construct a component basis vector set comprising one protein component basis vector $b_0$ and $K$ interference component basis vectors $b_1, \dots, b_K$. The protein component basis vector and the interference component basis vectors are treated as learnable parameters and are iteratively updated by gradient descent, so that they learn more fundamental and representative component absorption patterns from the data instead of remaining limited to the initial prior knowledge;
[0098] where the protein component basis vector $b_0$ is the reference spectrum corresponding to the protein-dominated absorption mode of the fish lysate sample, and has the same length as the enhanced near-infrared spectral vector;
[0099] the interference component basis vector $b_k$ denotes the $k$-th interference component basis vector ($k = 1, \dots, K$), each with the same length as the enhanced near-infrared spectral vector;
[0100] $K$ denotes the number of interference components, which can be 2 to 5, corresponding to moisture, fat, ash or other major interfering components.
[0101] The protein component basis vector can be initialized with the average near-infrared spectrum of fish lysate samples with high protein content, or with the average near-infrared spectrum of pure protein or high-protein reference samples. The interference component basis vectors can be initialized with samples of high water content, samples of high fat content, or typical interference spectral shapes obtained by clustering. After initialization, all component basis vectors are aligned with the enhanced near-infrared spectral vectors in the wavelength dimension and normalized to a uniform scale.
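The averaging-based initialization in [0101] can be sketched as follows. This is a minimal illustration that assumes spectra are already wavelength-aligned lists, that "high content" means the top fraction of samples ranked by a score (protein content for the protein basis; water or fat content for an interference basis), and that the "uniform scale" is unit Euclidean norm; `init_basis_from_top`, `normalize` and `top_fraction` are hypothetical names.

```python
import math

def normalize(vector):
    """Scale a vector to unit Euclidean norm (assumed choice of 'uniform scale')."""
    norm = math.sqrt(sum(v * v for v in vector))
    return [v / norm for v in vector] if norm > 0 else list(vector)

def init_basis_from_top(spectra, scores, top_fraction=0.2):
    """Average the spectra with the highest scores, then normalize.

    For the protein basis, scores are protein contents; for an interference basis,
    scores could be water or fat contents. top_fraction is a hypothetical parameter.
    """
    n_top = max(1, int(len(spectra) * top_fraction))
    order = sorted(range(len(spectra)), key=lambda i: scores[i], reverse=True)
    chosen = [spectra[i] for i in order[:n_top]]
    # Wavelength-by-wavelength mean of the selected spectra.
    mean = [sum(column) / n_top for column in zip(*chosen)]
    return normalize(mean)

# Toy data: five 2-point spectra with increasing protein content.
spectra = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0], [9.0, 10.0]]
protein = [10.0, 12.0, 15.0, 18.0, 22.0]
b0 = init_basis_from_top(spectra, protein, top_fraction=0.4)
```

With `top_fraction=0.4` the two highest-protein spectra are averaged to `[8.0, 9.0]` before normalization, so `b0` points in that direction with unit length.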
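[0102] 2) Input the enhanced near-infrared spectral vector $x$ into the component weight generation branch, which outputs the component weight vector $w$;
[0103] the component weight vector $w = (w_0, w_1, \dots, w_K)$ represents the contribution proportion of each component basis vector to the current fish lysate sample and has length $K+1$, where $K$ is the number of interference components, $w_0$ is the protein weight, $w_k \ge 0$, and $\sum_{k=0}^{K} w_k = 1$.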
[0104] The component weight generation branch consists of one layer normalization layer, two fully connected layers and one Softmax normalization layer. Specifically, the enhanced near-infrared spectral vector is first layer-normalized and input into the first fully connected layer, which maps the full-length spectrum to a 64-dimensional or 128-dimensional latent feature; after ReLU or LeakyReLU activation, the latent feature is input into the second fully connected layer, which maps it to one response value per component basis vector; these component response values are then normalized with Softmax to obtain the component weight vector. As a result, all elements of the component weight vector are non-negative and sum to 1, consistent with the physical intuition that the spectrum is a weighted combination of components.
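A forward pass of the component weight generation branch in [0104] can be sketched in plain Python as below. It is a minimal illustration only: learnable LayerNorm scale and shift are omitted, LeakyReLU with slope 0.01 is chosen from the two activations the text allows, and the random weights stand in for trained parameters. All function names are hypothetical.

```python
import math
import random

def layer_norm(x, eps=1e-5):
    """Normalize to zero mean and unit variance (learnable scale/shift omitted)."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]

def dense(x, weights, bias):
    """Fully connected layer; weights holds one row per output unit."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]

def leaky_relu(x, slope=0.01):
    return [v if v > 0 else slope * v for v in x]

def softmax(z):
    m = max(z)  # subtract the maximum for numerical stability
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

def component_weights(x, w1, b1, w2, b2):
    """LayerNorm -> FC -> LeakyReLU -> FC -> Softmax: one non-negative weight per component."""
    hidden = leaky_relu(dense(layer_norm(x), w1, b1))
    return softmax(dense(hidden, w2, b2))

# Toy usage: 50 wavelengths, 64 latent units, 1 protein + 3 interference components.
rng = random.Random(0)
n_wl, n_hidden, n_comp = 50, 64, 4
x = [rng.random() for _ in range(n_wl)]
w1 = [[rng.gauss(0, 0.1) for _ in range(n_wl)] for _ in range(n_hidden)]
b1 = [0.0] * n_hidden
w2 = [[rng.gauss(0, 0.1) for _ in range(n_hidden)] for _ in range(n_comp)]
b2 = [0.0] * n_comp
weights = component_weights(x, w1, b1, w2, b2)
```

Whatever the weights, the Softmax output is positive and sums to 1, which is exactly the property the paragraph relies on.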
[0105] 3) Reconstruct the component combination spectrum of the current fish lysate sample from the component basis vector set, using the component weight vector. Specifically, the component reconstructed spectrum of the $i$-th training sample is expressed as $\hat{x}_i = \sum_{k=0}^{K} w_{i,k}\, b_k$;
[0106] where $\hat{x}_i$ denotes the component reconstructed spectral vector of the $i$-th training sample, with the same length as the enhanced near-infrared spectral vector; $w_{i,k}$ denotes the weight of the $i$-th training sample on the $k$-th component; $b_k$ denotes the $k$-th component basis vector, which is the protein component basis vector when $k = 0$ and an interference component basis vector when $k = 1, \dots, K$, $K$ being the number of interference components.
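The reconstruction in step 3) is a weighted sum of basis vectors, which the short sketch below makes concrete. It assumes the basis vectors are equal-length lists with the protein basis at index 0 (an ordering chosen here for illustration); `reconstruct` is a hypothetical name.

```python
def reconstruct(weights, basis):
    """Component reconstructed spectrum: the weighted sum of all basis vectors.

    weights[k] is the sample's weight on component k and basis[k] the k-th basis
    vector (index 0 = protein, 1..K = interference; an assumed ordering).
    """
    length = len(basis[0])
    return [sum(w * b[j] for w, b in zip(weights, basis)) for j in range(length)]

# Protein weight 0.5, two interference weights 0.3 and 0.2, on 2-point toy bases.
x_hat = reconstruct([0.5, 0.3, 0.2], [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
# x_hat is approximately [0.7, 0.5]
```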
[0107] 4) Further extract the protein component contribution from the enhanced near-infrared spectral vector. Specifically, the protein weight $w_0$ is first taken from the component weight vector; it indicates the proportion of the current fish lysate sample attributed to the protein component. A gating vector $g$ is then constructed: a sequence of coefficients that locally modulates the protein component basis vector $b_0$ along the wavelength dimension, with the same length as $b_0$ and with every element between 0 and 1. The gating vector is obtained by inputting the enhanced near-infrared spectral vector into the gating branch.
[0108] The gating branch consists of two one-dimensional convolutional layers. Specifically, the first one-dimensional convolutional layer can have a kernel size of 5 and 16 output channels, and the second can have a kernel size of 3 and 1 output channel, with a nonlinear activation function between the two layers. Finally, a sigmoid function restricts the output to between 0 and 1, yielding the gating vector, which locally enhances or locally suppresses the protein component basis vector according to the actual peak shape of the current fish lysate sample.
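The gating branch in [0108] can be sketched with a hand-written multi-channel convolution, as below. The sketch assumes zero "same" padding so the gate keeps the spectral length, and uses LeakyReLU (slope 0.01) as the unspecified nonlinear activation; random weights replace trained ones, and all names are hypothetical.

```python
import math
import random

def conv1d_same(channels, kernels, biases):
    """Multi-channel 1-D convolution (cross-correlation, as in deep learning libraries).

    channels: list of input channels, each a list of length L.
    kernels[o][c]: odd-length kernel from input channel c to output channel o.
    Zero padding keeps the output length equal to L (an assumed padding choice).
    """
    length = len(channels[0])
    outputs = []
    for o, bias in enumerate(biases):
        row = []
        for j in range(length):
            acc = bias
            for c, channel in enumerate(channels):
                kernel = kernels[o][c]
                half = len(kernel) // 2
                for t, weight in enumerate(kernel):
                    idx = j + t - half
                    if 0 <= idx < length:
                        acc += weight * channel[idx]
            row.append(acc)
        outputs.append(row)
    return outputs

def sigmoid(v):
    # Numerically stable logistic function.
    if v >= 0:
        return 1.0 / (1.0 + math.exp(-v))
    e = math.exp(v)
    return e / (1.0 + e)

def gating_vector(x, k1, b1, k2, b2):
    """Conv(kernel 5, 16 channels) -> LeakyReLU -> Conv(kernel 3, 1 channel) -> sigmoid."""
    hidden = conv1d_same([x], k1, b1)
    hidden = [[v if v > 0 else 0.01 * v for v in ch] for ch in hidden]
    return [sigmoid(v) for v in conv1d_same(hidden, k2, b2)[0]]

# Toy usage with random weights on a 20-point spectrum.
rng = random.Random(1)
x = [rng.random() for _ in range(20)]
k1 = [[[rng.uniform(-0.3, 0.3) for _ in range(5)]] for _ in range(16)]   # 16 x 1 x 5
k2 = [[[rng.uniform(-0.3, 0.3) for _ in range(3)] for _ in range(16)]]   # 1 x 16 x 3
gate = gating_vector(x, k1, [0.0] * 16, k2, [0.0])
```

The final sigmoid is what keeps every gate value strictly between 0 and 1, so the gate can only rescale, never flip the sign of, the protein basis.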
[0109] 5) Combine the protein weight $w_0$, the protein component basis vector $b_0$ and the gating vector $g$ to obtain the protein component contribution spectral vector $p$, expressed as $p = w_0\,(g \odot b_0)$;
[0110] where $p$ denotes the protein component contribution spectral vector, with the same length as the enhanced near-infrared spectral vector, reflecting the near-infrared response more directly related to protein content; $\odot$ denotes element-wise multiplication; the gating vector $g$ ensures that $p$ is no longer a fixed template but adapts to changes in the local peak shape of the current fish lysate sample.
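Step 5) reduces to an element-wise product followed by a scalar scaling. The minimal sketch below assumes the gate and protein basis are equal-length lists; `protein_contribution` is a hypothetical name. It shows that a gate value of 0 fully suppresses a wavelength, while a gate value of 1 passes the weighted basis value unchanged.

```python
def protein_contribution(protein_weight, gate, protein_basis):
    """p = w0 * (g ⊙ b0): gate the protein basis element-wise, then scale by the protein weight."""
    return [protein_weight * g * b for g, b in zip(gate, protein_basis)]

p = protein_contribution(0.5, [1.0, 0.5, 0.0], [2.0, 2.0, 2.0])
# p == [1.0, 0.5, 0.0]
```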
[0111] In fish lysate samples, the absorption peaks of protein, water and fat overlap heavily. Without prior component-level decomposition, a deep network can easily misinterpret interfering components as protein discrimination information. This invention therefore does not input the enhanced near-infrared spectral vector directly into a regression network. Instead, it first dynamically decouples the enhanced near-infrared spectral vector using the component basis vectors and the dynamic weight generation mechanism, and then applies wavelength-level modulation to the protein component basis vector with the gating vector, so that the subsequent regression relies as far as possible on the protein component contribution spectral vector. This improves the interpretability and robustness of the model.
[0112] S302, Deep Feature Extraction and Prediction Based on Adaptive Robust Regression
[0113] After the protein component contribution spectral vector is obtained, high-level features related to changes in protein content must be further extracted, and a predicted protein content value must be output.
[0114] Considering batch differences and anomalous samples among fish lysate samples, this invention adds an adaptive correction branch to the basic regression output, enabling the model to compensate when features deviate from the mainstream distribution. The specific steps are as follows:
[0115] 1) Input the protein component contribution spectral vector into a one-dimensional convolutional feature extraction network;
[0116] The one-dimensional convolutional feature extraction network is formed by stacking three convolutional blocks in sequence. Each convolutional block contains one one-dimensional convolutional layer, one batch normalization layer, and one LeakyReLU activation layer. The kernel size is uniformly set to 3 to capture the local fine structure near the protein feature peak in the wavelength dimension.
[0117] The three convolutional blocks have 32, 64 and 128 output channels, respectively; all use a convolution stride of 1, with padding chosen so that the spectral length stays essentially unchanged. After each of the first two convolutional blocks, an average pooling layer or a max pooling layer with a window length of 2 can be placed to expand the receptive field and suppress local noise.
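Why the optional pooling layers in [0117] expand the receptive field can be checked with the standard receptive-field recurrence, sketched below. It assumes the pooling layers use stride 2 (the text gives only the window length of 2); `receptive_field` is a hypothetical helper name.

```python
def receptive_field(layers):
    """Receptive field (in wavelength points) of stacked 1-D layers.

    layers: (kernel_size, stride) pairs in order; pooling is treated as a layer
    with kernel 2 and stride 2 (an assumed pooling stride).
    """
    rf, jump = 1, 1
    for kernel, stride in layers:
        rf += (kernel - 1) * jump  # each layer widens the view by (k - 1) input steps
        jump *= stride             # spacing between adjacent outputs, in input points
    return rf

three_convs = [(3, 1), (3, 1), (3, 1)]
with_pooling = [(3, 1), (2, 2), (3, 1), (2, 2), (3, 1)]
print(receptive_field(three_convs), receptive_field(with_pooling))  # 7 18
```

Under these assumptions, three kernel-3 blocks alone see 7 neighbouring wavelength points, while inserting the two pooling layers lets each output of the third block see 18 points.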
[0118] 2) Input the output feature map of the third convolutional block into a global average pooling layer to obtain the deep feature vector, which represents high-level protein discrimination features aggregated over the entire wavelength dimension; its length equals the number of output channels of the third convolutional block, for example 128.
[0119] 3) Input the deep feature vector into the baseline regression layer to obtain the baseline predicted protein content value, i.e., the initial regression result produced by the network from the backbone deep features, expressed in mass percent.
[0120] 4) Simultaneously input the deep feature vector into the correction branch to obtain the adaptive correction term, which applies a fine compensation to the baseline regression result. When the features of the current fish lysate sample deviate from the mainstream patterns in the training set, the adaptive correction term usually takes a non-zero value.
[0121] The correction branch uses a single linear layer to directly output a scalar correction value; in another implementation, the correction branch uses a two-layer structure of "fully connected layer - LeakyReLU activation - fully connected layer" to improve the ability to model complex batch biases.
[0122] 5) Add the baseline predicted protein content value $\hat{y}_{base}$ and the adaptive correction term $\Delta$ to obtain the final predicted protein content value $\hat{y}$, expressed as $\hat{y} = \hat{y}_{base} + \Delta$;
[0123] where the final predicted protein content value $\hat{y}$ reflects both the result of the main regression and the correction for abnormal deviations.
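Steps 2) to 5) can be summarized in the small sketch below: global average pooling, a linear baseline regression layer, the single-linear-layer variant of the correction branch, and their sum. The feature map and layer weights are toy values chosen for illustration, and the function names are hypothetical.

```python
def global_average_pool(feature_map):
    """Average each channel over the wavelength axis -> one value per channel."""
    return [sum(channel) / len(channel) for channel in feature_map]

def predict(feature_map, base_w, base_b, corr_w, corr_b):
    """Baseline regression plus a single-linear-layer correction branch."""
    h = global_average_pool(feature_map)                         # deep feature vector
    base = sum(w * v for w, v in zip(base_w, h)) + base_b        # baseline prediction
    correction = sum(w * v for w, v in zip(corr_w, h)) + corr_b  # adaptive correction term
    return base, correction, base + correction

# Two channels, two wavelength points each.
base, corr, final = predict([[1.0, 3.0], [2.0, 2.0]], [1.0, 1.0], 10.0, [0.1, -0.1], 0.2)
# base == 14.0, corr is about 0.2, final is about 14.2
```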
[0124] Fish lysate samples often exhibit sample inhomogeneity, sampling errors and batch fluctuations. Single-path regression alone may be overly sensitive to local anomalies or insufficiently adaptable to batch variations. By adding the adaptive correction term, the model can compensate for a small number of deviating samples without compromising the stable fitting ability of the backbone, improving overall regression accuracy and cross-batch adaptability.
[0125] S4. Model Training and Optimization Based on Component Consistency Constraints
[0126] Traditional deep regression training usually only focuses on the error between the predicted value and the true value, which easily ignores the physical laws of the additive combination of various components in the near-infrared spectrum of fish lysate samples, and also makes it difficult to take into account samples with low protein content and abnormal samples.
[0127] In addition to regression loss, this invention introduces component reconstruction constraints and component weight concentration constraints to make the results learned by the dynamic spectral decoupling module more physically reasonable. Furthermore, it employs robust loss and low-value sample weighting mechanisms to improve full-range prediction performance. The specific steps are as follows:
[0128] S401, Definition of Component Consistency Constraints
[0129] This invention constrains the dynamic spectral decoupling module from the perspective of spectral reconstruction, enabling the component weight vector and component basis vector to not only serve the prediction task but also interpret the enhanced near-infrared spectral vector itself. The specific steps are as follows:
[0130] 1) For each training sample, input the enhanced near-infrared spectral vector $x_i$ of the $i$-th training sample into the dynamic spectral decoupling module to obtain the component weight vector $w_i$, the component reconstructed spectral vector $\hat{x}_i$ and the protein component contribution spectral vector $p_i$ of that sample;
[0131] where $x_i$ denotes the enhanced near-infrared spectral vector of the $i$-th training sample; $w_i$ denotes its component weight vector; $\hat{x}_i$ denotes its component reconstructed spectral vector, obtained by reconstruction from the component basis vector set; and $p_i$ denotes its protein component contribution spectral vector.
[0132] 2) Calculate the spectral reconstruction error loss $L_{rec}$, which measures the difference between the component reconstructed spectral vector $\hat{x}_i$ and the enhanced near-infrared spectral vector $x_i$. The smaller this difference, the stronger the explanatory power of the component weight vector and the component basis vector set on the original spectrum.
[0133] For each training sample, the squared difference between $\hat{x}_i$ and $x_i$ is first calculated wavelength by wavelength, and the result is then averaged over all wavelengths and all training samples to obtain $L_{rec}$.
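The averaging described in [0133] is a mean squared error taken over every wavelength of every training sample. The minimal sketch below assumes spectra are equal-length lists; `reconstruction_loss` is a hypothetical name.

```python
def reconstruction_loss(enhanced, reconstructed):
    """Mean squared difference over all wavelengths of all training samples."""
    total, count = 0.0, 0
    for x, x_hat in zip(enhanced, reconstructed):
        for a, b in zip(x, x_hat):
            total += (b - a) ** 2  # wavelength-by-wavelength squared difference
            count += 1
    return total / count

# Squared differences 0, 1, 0, 4 over four points -> 1.25
loss = reconstruction_loss([[1.0, 2.0], [3.0, 4.0]], [[1.0, 1.0], [3.0, 6.0]])
```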
[0134] 3) Calculate the component weight concentration constraint loss $L_{ent}$. Because the component weight vector is Softmax-normalized, a norm applied directly to it cannot effectively express the requirement that "a few components dominate" (its $L_1$ norm, for example, is always 1). This invention therefore uses an information entropy constraint to concentrate the component weights, and the component weight concentration constraint loss can be expressed as $L_{ent} = -\frac{1}{N} \sum_{i=1}^{N} \sum_{k=0}^{K} w_{i,k} \log(w_{i,k} + \varepsilon)$. The more the component weight vector concentrates on a few components, the lower the information entropy and the smaller $L_{ent}$;
[0135] where $L_{ent}$ denotes the component weight concentration constraint loss; $N$ denotes the total number of training samples; $w_{i,k}$ denotes the weight of the $i$-th training sample on the $k$-th component, $K$ being the number of interference components; $\varepsilon$ is a very small constant that prevents numerical anomalies in the logarithm and can be taken as … .
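The entropy constraint can be checked numerically with the sketch below, which follows the formula written in [0134]. The value `eps=1e-8` is a placeholder assumption, since the constant actually used is not preserved; `weight_entropy_loss` is a hypothetical name.

```python
import math

def weight_entropy_loss(weight_vectors, eps=1e-8):
    """Average Shannon entropy of the component weight vectors (lower = more concentrated).

    eps is a placeholder for the source's small constant, whose value is not preserved.
    """
    n = len(weight_vectors)
    return -sum(w * math.log(w + eps) for ws in weight_vectors for w in ws) / n

concentrated = weight_entropy_loss([[0.98, 0.01, 0.01]])
uniform = weight_entropy_loss([[1 / 3, 1 / 3, 1 / 3]])  # close to log(3)
```

A weight vector dominated by one component yields a much smaller loss than a uniform one, which is the behaviour the constraint is meant to reward.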
[0136] Since the number of major components such as protein, water and fat is limited, a reasonable decoupling result should explain the input spectrum without spreading the weights over too many spurious components. This invention introduces prior knowledge of the component composition of the near-infrared spectra of fish lysate samples into the training process instead of relying solely on automatic fitting by a black-box network, so that the component representations learned by the model better conform to physical laws and generalize better.
[0137] S402. Definition of Robust Loss Function and Overall Optimization Objective
[0138] To reduce the impact of outliers and improve the fitting accuracy of low-protein content sample intervals, this invention employs robust regression loss in the prediction layer and assigns higher weights to low-value samples. The specific steps are as follows:
[0139] 1) For each training sample, obtain the true protein content value and the final predicted protein content value and calculate the residual $r_i = y_i - \hat{y}_i$, where $y_i$ denotes the true protein content value of the $i$-th training sample and $\hat{y}_i$ denotes the final predicted protein content value output by the model for that sample; the residual measures the prediction error of that training sample in the current round.
[0140] 2) Use the Huber loss as the single-sample calculation rule for the basic prediction loss. Specifically, when the absolute value of the residual is less than or equal to a threshold $\delta$, the loss is calculated from the squared error to ensure smooth convergence on the majority of normal samples; when the absolute value of the residual is greater than $\delta$, the loss is calculated from the linear error to reduce the influence of outliers on the gradient. The threshold $\delta$ can be set according to the labeling error level of protein content in the training set, for example 0.5, 1 or 2 mass-percent units.
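For reference, the standard Huber loss matching the description in [0140] is sketched below. The constants (the factor 1/2 and the `delta * (|r| - delta / 2)` linear branch) follow the usual textbook form and are an assumption, since the source's exact expression is not preserved.

```python
def huber(residual, delta=1.0):
    """Standard Huber loss: quadratic for |r| <= delta, linear beyond it."""
    a = abs(residual)
    if a <= delta:
        return 0.5 * a * a                 # squared-error regime for normal samples
    return delta * (a - 0.5 * delta)       # linear regime limits outlier gradients

small = huber(0.5)   # 0.125
large = huber(3.0)   # 2.5
```

The two branches meet at `|r| = delta` with equal value and slope, so training stays smooth while a residual of 3 contributes a gradient of only `delta` instead of 3.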
[0141] 3) To raise the learning priority of samples with low protein content, a sample weight is assigned to each training sample. The weight is computed from an exponential term in the true protein content value $y_i$: the lower $y_i$, the larger the exponential term and the higher the sample weight, so that training pays more attention to samples with low protein content;
[0142] where $s_i$ denotes the sample weight of the $i$-th training sample; $\alpha$ denotes the low-value sample enhancement coefficient, which can range from 0.5 to 2; and $\tau$ denotes the scale parameter controlling the rate of decay, which can be set according to the protein content distribution of the training set, for example 0.2 to 0.5 times the average protein content of the training set.
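The low-value weighting in [0141] and [0142] can be illustrated with the sketch below. The exact formula is not preserved in the source, so the form `1 + alpha * exp(-y / tau)` is a hypothetical choice that only reproduces the described behaviour (lower true content, larger exponential term, larger weight); the protein contents are toy values.

```python
import math

def sample_weight(y_true, alpha=1.0, tau=5.0):
    """Hypothetical low-value weighting: 1 + alpha * exp(-y / tau).

    Not the source's formula (which is not preserved); it only mimics the described
    behaviour: lower true protein content -> larger exponential term -> larger weight.
    """
    return 1.0 + alpha * math.exp(-y_true / tau)

contents = [12.0, 18.0, 25.0, 33.0]          # toy protein contents, mass percent
tau = 0.3 * sum(contents) / len(contents)   # 0.3 x mean content, inside the stated 0.2-0.5 range
weights = [sample_weight(y, alpha=1.0, tau=tau) for y in contents]
```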
[0143] 4) Combine the weighted basic prediction loss $L_{pred}$, the spectral reconstruction error loss $L_{rec}$ and the component weight concentration constraint loss $L_{ent}$ into the total loss function $L_{total}$, expressed as $L_{total} = L_{pred} + \lambda_1 L_{rec} + \lambda_2 L_{ent}$;
[0144] where $L_{total}$ denotes the total loss function; $\lambda_1$ denotes the balance coefficient of the spectral reconstruction error loss $L_{rec}$; $\lambda_2$ denotes the balance coefficient of the component weight concentration constraint loss $L_{ent}$; $\lambda_1$ and $\lambda_2$ can be determined by parameter tuning on the validation set, for example $\lambda_1$ from 0.01 to 1 and $\lambda_2$ from 0.001 to 0.1.
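How the three terms combine into the total loss is sketched below. This is a minimal illustration: the sample-weighted Huber term is normalized by the sum of the weights, which is an assumption (the source does not state the normalization), and the default coefficients are simply values inside the stated ranges; `total_loss` is a hypothetical name.

```python
def huber(residual, delta=1.0):
    a = abs(residual)
    return 0.5 * a * a if a <= delta else delta * (a - 0.5 * delta)

def total_loss(y_true, y_pred, sample_weights, l_rec, l_ent, lam1=0.1, lam2=0.01, delta=1.0):
    """L_total = L_pred + lam1 * L_rec + lam2 * L_ent.

    L_pred is a sample-weighted Huber loss normalized by the weight sum; this
    normalization is an assumption, as the source does not state it.
    """
    weighted = sum(s * huber(t - p, delta) for t, p, s in zip(y_true, y_pred, sample_weights))
    l_pred = weighted / sum(sample_weights)
    return l_pred + lam1 * l_rec + lam2 * l_ent

# Toy values: the low-protein sample (10 %) carries twice the weight of the other.
loss = total_loss([10.0, 20.0], [10.5, 23.0], [2.0, 1.0], l_rec=1.25, l_ent=0.5)
```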
[0145] 5) Iteratively update all learnable parameters by gradient descent until the total loss function stabilizes on the validation set. The learnable parameters include the network parameters of the component weight generation branch, the network parameters of the gating branch, the protein component basis vector, all interference component basis vectors, the parameters of the one-dimensional convolutional feature extraction network, the basic regression layer parameters and the correction branch parameters.
[0146] In one implementation, the optimizer can be Adam, the initial learning rate can be set from … to …, and the batch size can be 16 to 128, optionally combined with learning rate decay and early stopping strategies.
[0147] It should be noted that in real production environments, fish lysate samples suffer from scarce low-value samples, abnormal sampling and batch fluctuations. If optimization relied on a single regression error alone, certain intervals would easily be underfitted or the model would become unstable. This invention unifies "robust prediction", "low-value region enhancement" and "component consistency constraint" in one training objective, which simultaneously improves the physical plausibility of the model, its resistance to outliers and its full-range prediction accuracy.
[0148] S5. Application of fish lysate protein content detection
[0149] After completing the above model training and optimization, the constructed fish lysate protein content detection system based on near-infrared spectroscopy can be applied to actual detection tasks.
[0150] For the fish lysate sample to be tested, the original near-infrared spectral vector is first acquired following the same operating procedure and under the same or equivalent spectral acquisition conditions as in the data acquisition phase, and is input into the trained detection system. The system automatically removes defective pixels and saturation points and smooths instrument noise to obtain a pre-processed spectral vector. Based on the predefined prior knowledge of protein characteristic peaks, and according to the division into protein characteristic peak intervals and non-characteristic intervals determined in the training phase, the system applies nonlinear scattering correction and peak sharpening enhancement within the protein characteristic peak intervals and robust scattering correction within the non-characteristic intervals, generating the enhanced near-infrared spectral vector. The enhanced near-infrared spectral vector is then input into the dynamic spectral decoupling module: using the protein component basis vector and interference component basis vectors learned in the training phase, together with the component weight vector and gating vector computed by the component weight generation branch and the gating branch, the protein component contribution spectral vector is extracted from the enhanced near-infrared spectral vector. This vector is fed into the one-dimensional convolutional feature extraction network to extract the deep feature vector, from which the basic regression layer and the correction branch produce the basic predicted protein content value and the adaptive correction term, respectively. Their sum is output as the final predicted protein content value, i.e., the protein content detection result of the fish lysate sample to be tested, in mass percent.
[0151] A scatter plot was used to compare the sample-level prediction performance of this invention with a conventional method, showing the agreement between the predicted and actual values of this invention and its reduced bias and dispersion relative to the conventional method. The conventional method is partial least squares regression, one of the most widely used traditional modeling methods in near-infrared spectroscopic analysis, configured as in the second group of experiments: the full spectral range and eight latent variables determined by cross-validation. In the figure, the horizontal axis is the actual protein content and the vertical axis is the predicted protein content, both in mass percent; blue dots are the predictions of this invention, orange dots are the predictions of partial least squares regression, and the black dashed line is the ideal prediction line, on which the predicted value equals the actual value. The blue dots of this invention lie closely on both sides of the diagonal with almost no significant deviation, and the coefficient of determination reaches 0.964, indicating that the model maintains good predictive agreement across different protein content ranges. The orange dots of partial least squares regression are noticeably more dispersed: in the low protein content region, between about 10% and 20%, they show a clear positive bias, and in the high protein content region, above about 30%, some underestimation; the coefficient of determination is only 0.823. This comparison verifies the robustness of the present invention to sample diversity, batch differences and nonlinear relationships, and its accurate prediction across the entire range.
[0152] To visualize the dynamic spectral decoupling effect, a typical sample with a medium protein content of approximately 12.5% was selected, and four spectral curves were displayed: the original spectrum, which contains scattering interference and overlapping contributions from multiple components; the enhanced spectrum obtained after the preprocessing steps of this invention; the component reconstruction spectrum, obtained by dynamically weighting the learned protein basis vector and interference basis vectors; and the protein component contribution spectrum, i.e., the weighted protein basis vector after gating vector modulation. The horizontal axis is wavelength in nanometers and the vertical axis is absorbance. The original spectrum shows obvious baseline drift and broad overlapping peaks, with the protein characteristic peaks at approximately 2050 nm and 2180 nm interfered with by water and lipid peaks. After preprocessing, the baseline of the enhanced spectrum is stable, but the signal still contains multiple components. The overall trend of the component reconstruction spectrum closely matches that of the enhanced spectrum, indicating that the dynamic decoupling module can accurately reconstruct the spectral information from the component basis vectors. These results show that the component weight generation branch and the gating modulation mechanism of the present invention effectively separate protein information from interference information, so that regression modeling can be based on a purer signal more directly related to protein content, significantly improving the interpretability and predictive reliability of the model.
[0153] Finally, it should be noted that the above content is only used to illustrate the technical solution of the present invention, and is not intended to limit the scope of protection of the present invention. Simple modifications or equivalent substitutions made by those skilled in the art to the technical solution of the present invention do not depart from the essence and scope of the technical solution of the present invention.
Claims
1. A method for detecting fish lysate protein content based on near-infrared spectroscopy, characterized in that the method comprises: obtaining near-infrared spectral data of fish lysate samples, and constructing a sample dataset based on the near-infrared spectral data and the corresponding protein content labels; performing, based on prior information about protein characteristic peaks, partitioned scattering correction and characteristic peak enhancement processing on the near-infrared spectral data to obtain an enhanced near-infrared spectral vector; inputting the enhanced near-infrared spectral vector into a pre-constructed dynamic spectral decoupling model, and decomposing the enhanced near-infrared spectral vector into components based on component basis vectors and a component weight generation mechanism to obtain a protein component contribution spectral vector; inputting the protein component contribution spectral vector into a deep feature extraction and adaptive robust regression model to output a predicted protein content value of the fish lysate sample; and jointly training and optimizing the dynamic spectral decoupling model and the adaptive robust regression model based on component consistency constraints and a robust loss function to obtain a fish lysate protein content detection model.
2. The method for detecting fish lysate protein content based on near-infrared spectroscopy according to claim 1, characterized in that: based on the wavelength ranges of protein absorption peaks, the near-infrared spectrum is divided into protein characteristic peak ranges and non-characteristic ranges; nonlinear scattering correction is performed within the protein characteristic peak ranges, with local scattering correction and local baseline subtraction constructed from local neighborhood differences; robust scattering correction is performed within the non-characteristic ranges to remove multiplicative scattering and additive baseline drift; and adaptive sharpening enhancement based on local contrast and peak height is performed on the protein characteristic peak ranges to obtain the enhanced near-infrared spectral vector.
3. The method for detecting fish lysate protein content based on near-infrared spectroscopy according to claim 1, characterized in that the dynamic spectral decoupling model comprises: a component basis vector set construction module, configured to construct a protein component basis vector and a plurality of interference component basis vectors; a component weight generation module, configured to generate a component weight vector based on the enhanced near-infrared spectral vector; a spectral reconstruction module, configured to reconstruct the component combination spectrum based on the component basis vector set and the component weight vector; and a protein component extraction module, configured to modulate the protein component basis vector by combining the protein component weight and a gating vector, generating the protein component contribution spectral vector.
4. The method for detecting fish lysate protein content based on near-infrared spectroscopy according to claim 1, characterized in that the component weight generation module is configured to: feature-map the enhanced near-infrared spectral vector to generate a latent feature representation; and map the latent feature representation to component response values and normalize them with the Softmax function to obtain the contribution weight of each component, the component weights being non-negative and summing to 1.
5. The method for detecting fish lysate protein content based on near-infrared spectroscopy according to claim 1, characterized in that the protein component extraction module is configured to: construct a gating branch based on the enhanced near-infrared spectral vector to generate a gating vector along the wavelength dimension; and modulate the protein component basis vector element-wise with the gating vector and combine the result with the protein component weight to generate the protein component contribution spectral vector.
6. The method for detecting fish lysate protein content based on near-infrared spectroscopy according to claim 1, characterized in that the deep feature extraction and adaptive robust regression model comprises: a one-dimensional convolutional feature extraction network, configured to extract deep features of the protein component contribution spectral vector; a basic regression branch, configured to output a basic predicted protein content value; and an adaptive correction branch, configured to generate a prediction correction term; wherein the basic predicted value and the correction term are added to obtain the final predicted protein content value.
7. The method for detecting fish lysate protein content based on near-infrared spectroscopy according to claim 1, characterized in that the joint training optimization comprises: constructing a spectral reconstruction error loss to constrain the difference between the component reconstructed spectrum and the enhanced near-infrared spectrum; constructing a component weight constraint loss to constrain the sparsity or concentration of the component weight distribution; constructing a robust regression loss function to reduce the influence of outliers on model training; and introducing a weighting mechanism for samples with low protein content to improve the prediction accuracy of the model in the low-content range.
8. A fish lysate protein content detection system based on near-infrared spectroscopy, characterized in that it comprises: a data acquisition module, configured to acquire near-infrared spectral data of fish lysate samples; a spectral processing module, configured to perform partitioned scattering correction and characteristic peak enhancement processing on the near-infrared spectral data; a spectral decoupling module, configured to generate a protein component contribution spectral vector based on a dynamic spectral decoupling model; a prediction module, configured to output a predicted protein content value based on deep feature extraction and an adaptive robust regression model; and a model training module, configured to train and optimize the models based on component consistency constraints.
9. An electronic device, characterized in that, The device includes a processor and a memory, wherein the memory stores a computer program, which, when executed by the processor, causes the processor to perform the fish lysate protein content detection method according to any one of claims 1 to 7.
10. A computer-readable storage medium, characterized in that, The computer-readable storage medium stores a computer program that, when executed by a processor, implements the method for detecting fish lysate protein content according to any one of claims 1 to 7.