Double-integration partial least square modeling method based on Monte Carlo and LASSO

A partial least squares modeling technique, applied in the field of analytical chemistry, that addresses the low accuracy of modeling and prediction and improves predictive ability and prediction accuracy.

Active Publication Date: 2017-03-22
TIANJIN POLYTECHNIC UNIV +1

AI Technical Summary

Problems solved by technology

This method not only retains the advantages of the two methods, but also overcomes t

Examples

Example Embodiment

[0040] Example 1:

[0041] This embodiment is applied to near-infrared spectroscopy data analysis to determine the oil content in corn samples. The specific steps are as follows:

[0042] 1) Collect 80 corn samples and measure their near-infrared spectra on three different near-infrared spectrometers (M5, MP5, MP6), with the oil content as the target value. The spectra cover 1100~2498 nm (9091~4003 cm⁻¹) at a sampling interval of 2 nm, giving 700 wavelength data points. The data were downloaded from http://software.eigenvector.com/Data/Corn/index.html. Using the Kennard-Stone (KS) grouping method, 53 samples were assigned to the training set and the remaining 27 samples to the prediction set. The near-infrared spectra of the training set are shown in Figure 2.
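The KS grouping used for the 53/27 split can be sketched as follows. This is a minimal illustration of the standard Kennard-Stone rule, not the inventors' code: the function name `kennard_stone` and the synthetic stand-in data are assumptions, and the real corn spectra would have to be loaded from the dataset itself.

```python
import numpy as np

def kennard_stone(X, n_train):
    """Return (train_indices, remaining_indices) chosen by the Kennard-Stone rule."""
    X = np.asarray(X, dtype=float)
    # Pairwise Euclidean distances between all samples.
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    # Start with the two samples that are farthest apart.
    i, j = np.unravel_index(np.argmax(dist), dist.shape)
    selected = [int(i), int(j)]
    remaining = [k for k in range(len(X)) if k not in selected]
    while len(selected) < n_train:
        # Distance from each remaining sample to its nearest selected sample;
        # add the sample for which that distance is largest.
        min_dist = dist[np.ix_(remaining, selected)].min(axis=1)
        pick = remaining[int(np.argmax(min_dist))]
        selected.append(pick)
        remaining.remove(pick)
    return selected, remaining

# Synthetic stand-in for 80 spectra (hypothetical data, 30 variables for brevity).
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(80, 30))
train_idx, pred_idx = kennard_stone(X_demo, n_train=53)
```

Because the rule always adds the sample farthest from those already chosen, the training set spans the spectral space and the prediction set falls inside it.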

[0043] 2) Determine the number of factors (latent variables, LV) of the PLS model

[0044] Calculate the cross-validation root mean square error (RMSECV) under different factors, and th...
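Computing RMSECV for a range of factor numbers can be sketched as below. This is a hedged illustration on synthetic data: the hand-written NIPALS routine `pls1_fit`, the 5-fold split, the candidate range of 1 to 10 factors, and the choice of the factor number with the smallest RMSECV are all assumptions made for the example, since the selection rule is not fully visible in the text above.

```python
import numpy as np

def pls1_fit(X, y, n_lv):
    """Fit a single-response PLS model (NIPALS); return (beta, x_mean, y_mean)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    E, f = X - x_mean, y - y_mean          # centered data, deflated in the loop
    W, P, q = [], [], []
    for _ in range(n_lv):
        w = E.T @ f
        norm = np.linalg.norm(w)
        if norm < 1e-12:                   # nothing left to explain
            break
        w = w / norm
        t = E @ w                          # scores
        tt = t @ t
        p = E.T @ t / tt                   # X loadings
        c = f @ t / tt                     # y loading
        E = E - np.outer(t, p)
        f = f - c * t
        W.append(w); P.append(p); q.append(c)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    beta = W @ np.linalg.solve(P.T @ W, q)  # regression vector in X space
    return beta, x_mean, y_mean

def rmsecv(X, y, n_lv, n_folds=5):
    """Root mean square error of k-fold cross-validation for a given factor number."""
    idx = np.arange(len(y))
    sq_err = 0.0
    for fold in np.array_split(idx, n_folds):
        train = np.setdiff1d(idx, fold)
        beta, xm, ym = pls1_fit(X[train], y[train], n_lv)
        pred = (X[fold] - xm) @ beta + ym
        sq_err += np.sum((y[fold] - pred) ** 2)
    return float(np.sqrt(sq_err / len(y)))

# Synthetic stand-in: 60 samples, 50 variables, 3 underlying components.
rng = np.random.default_rng(0)
T = rng.normal(size=(60, 3))
X_demo = T @ rng.normal(size=(3, 50)) + 0.05 * rng.normal(size=(60, 50))
y_demo = T @ np.array([1.0, -2.0, 0.5]) + 0.05 * rng.normal(size=60)

curve = {lv: rmsecv(X_demo, y_demo, lv) for lv in range(1, 11)}
best_lv = min(curve, key=curve.get)        # assumed rule: smallest RMSECV
```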

Example Embodiment

[0054] Example 2:

[0055] This embodiment is applied to the analysis of ultraviolet spectrum data to determine the content of monocyclic aromatic hydrocarbons in gasoline samples. The specific steps are as follows:

[0056] 1) Collect 115 light gasoline and diesel fuel samples. The ultraviolet spectra cover 200-400 nm at a sampling interval of about 0.35 nm, giving 572 wavelength data points. The content of monocyclic aromatic hydrocarbons was determined with an HP model G1205A supercritical fluid chromatograph (Hewlett-Packard, Palo Alto, Calif.). The data were downloaded from http://myweb.dal.ca/pdwentze/downloads.html. The training set and prediction set are divided according to the instructions given on the website: the first 70 samples form the training set and the last 44 samples form the prediction set. The ultraviolet spectra of the training set are shown in Figure 6.

[0057] 2) Determine the number of factors (latent variables, LV) of the PLS model

[0058] Calculate the cross-valida...

Example Embodiment

[0068] Example 3:

[0069] This embodiment is applied to near-infrared spectroscopy data analysis to determine the content of sesame oil in quaternary blended oil samples. The specific steps are as follows:

[0070] 1) Collect 51 quaternary blended oil samples containing sesame oil, corn oil, soybean oil and rice oil. Measure the near-infrared spectra with a Vertex70 multi-band infrared/near-infrared spectrometer (Bruker, Germany). The wavenumber range is 4000~12000 cm⁻¹ and the sampling interval is 1.93 cm⁻¹, giving 4148 data points. Set the sesame oil content as the target value. Using the Kennard-Stone (KS) grouping method, 34 samples were assigned to the training set and the remaining 17 samples to the prediction set. The near-infrared spectra of the training set are shown in Figure 10.

[0071] 2) Determine the number of factors (latent variables, LV) of the PLS model

[0072] Calculate the cross-validation root mean square error (RMSECV) under different factors, and the factor correspo...

Abstract

The invention belongs to the technical field of analytical chemistry and relates in particular to a double-integration partial least squares modeling method based on Monte Carlo and LASSO. The method comprises the following steps: first, Monte Carlo sampling selects a certain number of samples as a sample subset; then LASSO selects part of the variables of that sample subset as a variable subset; these two steps are repeated many times to build multiple sub-models, and the predictions of the sub-models are directly averaged to obtain the final prediction. The method effectively improves the predictive ability and prediction precision of the model and has clear advantages in prediction precision and stability. It is suitable for the quantitative analysis of complex samples such as petroleum, tobacco, foods and traditional Chinese medicines.
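The procedure summarized in the abstract (Monte Carlo sample subsets, LASSO variable subsets, PLS sub-models, averaged predictions) can be sketched as follows. This is an illustrative sketch under stated assumptions, not the patented implementation: the minimal NumPy routines `lasso_cd` and `pls1_fit`, the function names, the number of sub-models, the sampling fraction, the LASSO penalty `alpha`, the factor number and the synthetic data are all hypothetical choices made for the example.

```python
import numpy as np

def pls1_fit(X, y, n_lv):
    """Single-response PLS (NIPALS); returns (beta, x_mean, y_mean)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    E, f = X - x_mean, y - y_mean
    W, P, q = [], [], []
    for _ in range(n_lv):
        w = E.T @ f
        norm = np.linalg.norm(w)
        if norm < 1e-12:
            break
        w = w / norm
        t = E @ w
        tt = t @ t
        p, c = E.T @ t / tt, f @ t / tt
        E, f = E - np.outer(t, p), f - c * t
        W.append(w); P.append(p); q.append(c)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    return W @ np.linalg.solve(P.T @ W, q), x_mean, y_mean

def lasso_cd(X, y, alpha, n_iter=100):
    """Coordinate descent for (1/2n)||y - Xb||^2 + alpha*||b||_1 on centered data."""
    n, p = X.shape
    b = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    r = y.copy()                               # residual for b = 0
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            r += X[:, j] * b[j]                # remove variable j from the fit
            rho = X[:, j] @ r / n
            b[j] = np.sign(rho) * max(abs(rho) - alpha, 0.0) / col_sq[j]
            r -= X[:, j] * b[j]                # put the updated variable back
    return b

def ensemble_fit(X, y, n_models=20, sample_frac=0.7, alpha=0.05, n_lv=3, seed=0):
    """Build sub-models on Monte Carlo sample subsets with LASSO-selected variables."""
    rng = np.random.default_rng(seed)
    models = []
    for _ in range(n_models):
        rows = rng.choice(len(y), size=int(sample_frac * len(y)), replace=False)
        Xs, ys = X[rows], y[rows]
        coef = lasso_cd(Xs - Xs.mean(axis=0), ys - ys.mean(), alpha)
        cols = np.flatnonzero(coef)            # variables kept by LASSO
        if cols.size == 0:
            continue                           # skip if LASSO removed everything
        beta, xm, ym = pls1_fit(Xs[:, cols], ys, min(n_lv, cols.size))
        models.append((cols, beta, xm, ym))
    return models

def ensemble_predict(models, X):
    """Directly average the predictions of all sub-models."""
    preds = [(X[:, cols] - xm) @ beta + ym for cols, beta, xm, ym in models]
    return np.mean(preds, axis=0)

# Synthetic stand-in data: 70 samples, 50 variables, 3 underlying components.
rng = np.random.default_rng(1)
T = rng.normal(size=(70, 3))
X_demo = T @ rng.normal(size=(3, 50)) + 0.05 * rng.normal(size=(70, 50))
y_demo = T @ np.array([1.0, -2.0, 0.5]) + 0.05 * rng.normal(size=70)
models = ensemble_fit(X_demo[:50], y_demo[:50])
y_pred = ensemble_predict(models, X_demo[50:])
rmsep = float(np.sqrt(np.mean((y_demo[50:] - y_pred) ** 2)))
```

Each sub-model stores its own variable indices, so prediction applies every sub-model to its own variable subset before averaging.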

Description

Technical field

[0001] The invention belongs to the technical field of analytical chemistry, and in particular relates to a double-integrated partial least squares modeling method based on Monte Carlo and LASSO.

Background technique

[0002] Spectral analysis technology has been widely used in agriculture, food, medicine, environment and other fields due to its advantages of simplicity, speed, greenness and non-destructiveness. However, due to the serious overlapping of spectral absorption peaks, weak signal absorption, and serious background interference, chemometrics methods are required for qualitative and quantitative analysis of complex samples. Establishing a model with good stability and high prediction accuracy has always been the key to the quantitative analysis of complex samples.

[0003] The traditional modeling method uses a single model to establish a quantitative analysis model between the spectrum and the target value to be measured, and the prediction effec...

Application Information

IPC(8): G06F17/50, G01N21/31
CPC: G01N21/31, G06F30/20
Inventor: 卞希慧, 张彩霞, 徐杨, 谭小耀, 陈宗蓬, 王晨
Owner TIANJIN POLYTECHNIC UNIV