Calibration set and validation set selection method based on spectral similarity and modeling method

A method for selecting the calibration set and validation set based on spectral similarity, together with a modeling method. It addresses the difficulty of determining whether a model will predict unknown samples well, and achieves strong predictive ability and good modeling performance.

Active Publication Date: 2020-01-14
SHANDONG UNIV

AI Technical Summary

Problems solved by technology

However, it is difficult to determine whether these two methods predict unknown samples well.


Examples

Embodiment 1

[0082] Table 1 lists the results of the models built for the four components of the corn data in this example, where Lv is the number of latent factors, N_c is the number of samples in the calibration set, and N_v is the number of samples in the validation set.

[0083] Table 1 List of prediction results of each component of corn

[0084]

[0085] For the metrics in Table 1, smaller values of RMSEC, RMSEV and RMSEP are better, and larger values of R_c, R_v and R_p are better. Every component of corn is modeled well: the calibration set correlation coefficient R_c is above 0.95 in all cases, indicating a good fit, even though only about 40 samples were selected as the calibration set. The validation set correlation coefficient R_v is also above 0.95 in all cases, indicating that the model predicts the validation set samples well, and for the randomly selected independent test set, except for oil, the r...
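The two kinds of metric discussed above can be sketched as follows. This is a minimal illustration, assuming the root mean square errors are plain means of squared residuals and that R is the Pearson correlation between reference and predicted values; the source does not give the exact formulas, and the function names and sample values are hypothetical.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean square error between reference and predicted values.

    Applied to the calibration, validation or test set this would give
    RMSEC, RMSEV or RMSEP (assumption: no degrees-of-freedom correction).
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def correlation(y_true, y_pred):
    """Pearson correlation coefficient between reference and predicted values."""
    return float(np.corrcoef(y_true, y_pred)[0, 1])

# Made-up reference values and predictions, for illustration only.
y_ref = np.array([10.0, 10.5, 11.0, 11.5])
y_hat = np.array([10.1, 10.4, 11.2, 11.4])
print("RMSE:", rmse(y_ref, y_hat))
print("R:", correlation(y_ref, y_hat))
```

A lower RMSE and an R closer to 1 both indicate closer agreement between predictions and reference values, which is the reading applied to Table 1.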

Embodiment 2

[0098] Taking Salvia miltiorrhiza as an example, a total of 120 samples were measured, including replicate samples. X is the near-infrared spectrum matrix of the samples, measured with a Fourier transform near-infrared spectrometer (Antaris II, Thermo Fisher, USA), and Y is the matrix of four quality indicators: tanshinone IIA (TS IIA), cryptotanshinone (CTS), tanshinone I (TS I) and salvianolic acid B (SAB). The original spectra of the samples are shown in Figure 6. Each component is treated as a detection target in order to evaluate the new selection method. The following description takes tanshinone IIA as the example; the same steps apply to the other components. Abnormal samples are removed first: the Hotelling T² method detected 3 abnormal samples, leaving 117 samples after elimination. The principal component analysis plot after removing the outliers is shown in Figure 7. Randomly select 15 samples as an independent test set X...
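The outlier screening step can be illustrated with a short sketch of a Hotelling T² statistic computed on principal component scores. The number of components, the synthetic data and the function name are assumptions made for illustration; the source does not state the number of components or the confidence limit used to flag the 3 abnormal samples.

```python
import numpy as np

def hotelling_t2(X, n_components=3):
    """Hotelling T^2 of each sample on the first principal components.

    n_components is an assumed setting, not a value given in the source.
    """
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                      # mean-center each wavelength
    U, S, _ = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_components] * S[:n_components]
    score_var = scores.var(axis=0, ddof=1)       # variance of each score column
    return np.sum(scores ** 2 / score_var, axis=1)

# Synthetic "spectra": 30 ordinary samples, one of them shifted far away.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(30, 50))
X_demo[7] += 25.0
t2 = hotelling_t2(X_demo, n_components=3)
print("sample with the largest T^2:", int(np.argmax(t2)))
```

In practice the T² values would be compared with a critical value at a chosen confidence level, and samples above it would be removed before the sets are divided.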

Embodiment 3

[0116] Taking the public corn data set as an example, there are 80 samples. X is the near-infrared spectrum matrix of the samples, and Y is the matrix of four component quality indicators. Water is used as the example here; the same steps apply to the other components. Abnormal samples are removed first: the Hotelling T² method detected 3 abnormal samples, leaving 77 samples after elimination, and 10 samples are randomly selected as an independent test set X_t.

[0117] To divide the remaining 67 samples, we varied the number of validation set samples to investigate how the various division methods affect model performance. For each independent test set sample, the 2 samples (that is, g=2) with the smallest Euclidean distance to it are included in the validation set. The number of samples in the validation set is between 14 and 20, and the remaining ...
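The validation set step described above can be sketched as a nearest-neighbour lookup. One plausible reading of the 14 to 20 range is that 10 test samples with g=2 yield at most 20 picks, and fewer whenever two test samples share a neighbour; that reading, the function name and the random stand-in data are assumptions, not statements from the source.

```python
import numpy as np

def select_validation(X_pool, X_test, g=2):
    """Pool indices of the g nearest spectra (Euclidean) of every test spectrum.

    Duplicates are merged, so the result has at most len(X_test) * g entries.
    """
    X_pool = np.asarray(X_pool, dtype=float)
    X_test = np.asarray(X_test, dtype=float)
    # Pairwise distances, shape (n_test, n_pool).
    dist = np.linalg.norm(X_test[:, None, :] - X_pool[None, :, :], axis=2)
    nearest = np.argsort(dist, axis=1)[:, :g]
    return np.unique(nearest)

# Random stand-in for 67 remaining spectra and 10 test spectra.
rng = np.random.default_rng(0)
pool = rng.normal(size=(67, 40))
test = rng.normal(size=(10, 40))
val_idx = select_validation(pool, test, g=2)
print("validation set size:", len(val_idx))
```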

Abstract

The invention discloses a calibration set and validation set selection method based on spectral similarity, and a modeling method. The selection method comprises the following steps: performing near-infrared spectrum measurement on the original samples to obtain an original sample spectrum matrix; randomly extracting m samples as an independent test set; for each sample in the independent test set, computing the spectral similarity between that sample and each remaining original sample, and writing the g samples with the highest similarity into the validation set; and for each sample in the validation set, computing the spectral similarity between that sample and each remaining original sample, and writing the n samples with the highest similarity into the calibration set. With the validation set and calibration set selected by this method, the resulting model can predict unknown samples more accurately.
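The steps in the abstract can be sketched end to end as follows. This is a minimal illustration, assuming Euclidean distance as the similarity measure (smaller distance means higher similarity) and assuming that "remaining" samples are those not yet assigned to any set; the function names, the seed and the random data are hypothetical.

```python
import numpy as np

def nearest_union(X, query_idx, pool_idx, k):
    """Members of pool_idx that are among the k nearest of any query sample."""
    pool_idx = np.asarray(pool_idx)
    dist = np.linalg.norm(
        X[query_idx][:, None, :] - X[pool_idx][None, :, :], axis=2
    )
    picked = np.unique(np.argsort(dist, axis=1)[:, :k])
    return pool_idx[picked]

def similarity_split(X, m, g, n, seed=0):
    """Return (test, validation, calibration) index arrays."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    all_idx = np.arange(len(X))
    # Step 1: m random samples form the independent test set.
    test = np.sort(rng.choice(all_idx, size=m, replace=False))
    pool = np.setdiff1d(all_idx, test)
    # Step 2: the g most similar remaining samples of each test sample.
    val = nearest_union(X, test, pool, g)
    pool = np.setdiff1d(pool, val)
    # Step 3: the n most similar remaining samples of each validation sample.
    cal = nearest_union(X, val, pool, n)
    return test, val, cal

rng = np.random.default_rng(1)
spectra = rng.normal(size=(77, 40))   # random stand-in for a spectrum matrix
test, val, cal = similarity_split(spectra, m=10, g=2, n=3)
print(len(test), len(val), len(cal))
```

In this sketch, samples that are never picked stay unassigned, and the three sets are disjoint by construction because each stage only draws from samples left over by the previous one.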

Description

Technical field

[0001] The invention belongs to the technical field of unknown sample prediction, and in particular relates to a method for selecting a calibration set and a validation set based on spectral similarity, and to a modeling method.

Background

[0002] The statements in this section merely provide background information related to the present disclosure and do not necessarily constitute prior art.

[0003] Near-infrared spectroscopy (NIR) is a non-destructive, non-polluting, and reproducible analytical technique that is developing rapidly. With the development of chemometrics and computer technology, this technique has been widely used in agricultural products, petrochemicals, pharmaceuticals, environmental analysis, process control, and clinical and biomedical fields. A major feature of this method is that it needs to use chemometrics to correlate the spectral information of the sample with the corresponding reference value information (such as content, source, etc.) to...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G01N21/359, G06F17/18
CPC: G01N21/359, G06F17/18, G01N21/274, G01N2201/129, G06F30/20, G01N2201/127, G06F17/16
Inventor: 聂磊, 孙越, 臧恒昌, 曾英姿, 刘肖雁, 苏美, 袁萌, 王林林, 姜红, 楚广诣
Owner: SHANDONG UNIV