Multivariable correction characteristic wavelength selection method based on minimum correlation coefficient

The technology concerns correlation coefficients and characteristic wavelengths and is applied in the field of near-infrared spectrum wavelength selection. It addresses problems of existing methods, such as principles that are difficult to understand, collinearity among variables, and complex operation. Its stated effects are improved robustness and prediction accuracy, efficient dimensionality reduction, and reduced cost.

Active Publication Date: 2019-11-26
HEILONGJIANG BAYI AGRICULTURAL UNIVERSITY

AI Technical Summary

Problems solved by technology

Among existing methods, the successive projections algorithm (SPA) is a wavelength selection algorithm that minimizes collinearity among variables through vector projection analysis, but its principle is not easy to understand and its operation is relatively complicated. The other algorithms also suffer from collinearity among the selected variables, and related research is ongoing.
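For readers unfamiliar with SPA, the projection idea mentioned above can be sketched as follows. This is a generic illustration of successive projections, not code from the patent. The function name `spa_select`, the choice of starting column, and the number of columns to pick are all assumptions made for the example.

```python
import numpy as np

def spa_select(X, start, n_select):
    """Pick n_select column indices of X, beginning with column `start`.

    Hypothetical helper: assumes n_select does not exceed the rank of X.
    """
    P = X - X.mean(axis=0)           # work on mean-centered columns
    selected = [start]
    for _ in range(n_select - 1):
        v = P[:, selected[-1]]
        # Remove from every column its component along the last selected column.
        P = P - np.outer(v, v @ P) / (v @ v)
        norms = np.linalg.norm(P, axis=0)
        norms[selected] = -1.0       # never pick the same column twice
        # The column with the largest remaining norm is the least collinear one.
        selected.append(int(np.argmax(norms)))
    return selected
```

A column that is nearly a multiple of one already chosen has almost nothing left after the projection, so it is passed over in favor of columns that carry new information.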

Examples

Embodiment 1

[0038] The near-infrared spectroscopy data of the soil samples are public and come from the website Quality&Technology. The data contain two parts: the NIR spectra and the chemical properties of the samples. There are 108 samples in total. The wavelength range of the spectra is 400-2500 nm with a sampling interval of 2 nm, giving 1050 wavelength points in total. The near-infrared spectra are shown in Figure 2. The present invention uses soil organic matter content as the dependent variable for wavelength selection and for modeling and prediction on the near-infrared spectral data, in order to demonstrate the effectiveness of the method.

[0039] Step 1: Divide the 108 samples into a modeling set (75%, 81 samples) and a validation set (25%, 27 samples). In order to correct the spectral baseline, eliminate the interference of other backgrounds, and improve the spectral resolution, the original spectral data is preproc...
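The split and the preprocessing in Step 1 can be sketched roughly as below. The abstract names S-G (Savitzky-Golay) first-derivative processing, but this excerpt gives neither the window length, the polynomial order, nor how samples are assigned to the two sets. The values `window=11` and `polyorder=2`, the random assignment, and both function names are therefore assumptions for illustration only.

```python
import numpy as np

def sg_first_derivative(X, window=11, polyorder=2, delta=1.0):
    """Savitzky-Golay first derivative of each row of X (edge points are trimmed).

    window and polyorder are assumed values; delta is the wavelength step.
    """
    half = window // 2
    offsets = np.arange(-half, half + 1)
    # Least-squares polynomial fit over the window: row 1 of the pseudo-inverse
    # holds the weights that return the fitted slope at the window centre.
    A = np.vander(offsets, polyorder + 1, increasing=True)
    weights = np.linalg.pinv(A)[1] / delta
    windows = np.lib.stride_tricks.sliding_window_view(X, window, axis=1)
    return windows @ weights

def split_samples(n_samples, frac_model=0.75, seed=0):
    """Return (modeling_idx, validation_idx); random assignment is an assumption."""
    order = np.random.default_rng(seed).permutation(n_samples)
    n_model = round(n_samples * frac_model)
    return order[:n_model], order[n_model:]
```

With 108 samples, `split_samples(108)` yields 81 modeling indices and 27 validation indices, matching the counts in the text. Passing `delta=2.0` expresses the derivative per nanometre for a 2 nm sampling interval.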

Embodiment 2

[0057] This embodiment uses a publicly available set of near-infrared spectral data of grain from the website EigenVector. The data set includes 80 grain samples measured by three different near-infrared spectrometers. The wavelength range of the spectra is 1100-2498 nm with a sampling interval of 2 nm, giving 700 wavelength points in total. The chemical properties include moisture, oil, protein and starch values. In this example, the near-infrared spectra measured by the instrument mp6 are selected, and the starch content of the grain is used as the dependent variable for wavelength selection, spectral data modeling, and prediction, in order to illustrate the effectiveness of the method.
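Because the method works on column indices of the spectral matrix, it helps to be able to map indices back to wavelengths. The small sketch below rebuilds the wavelength axis from the range and interval stated above. The function names are hypothetical and not part of the patent.

```python
def wavelength_axis(start_nm, stop_nm, step_nm):
    """Wavelengths from start_nm to stop_nm inclusive, spaced step_nm apart."""
    return list(range(start_nm, stop_nm + step_nm, step_nm))

def indices_to_wavelengths(axis, column_indices):
    """Translate selected column indices into wavelengths in nm."""
    return [axis[i] for i in column_indices]

axis = wavelength_axis(1100, 2498, 2)
print(len(axis))  # 700 points, as stated for this data set
```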

[0058] Step 1: Divide the 80 samples into a modeling set (75%, 60 samples) and a validation set (25%, 20 samples). In order to correct the spectral baseline, eliminate the interference of other backgrounds, and improve the spectral resolution, the ori...

Abstract

The invention discloses a multivariable correction characteristic wavelength selection method based on the minimum correlation coefficient, which aims to solve the problems of existing wavelength selection methods. The method comprises the following steps: applying Savitzky-Golay (S-G) first-derivative processing to the spectral data set X; calculating the absolute values of the correlation coefficients between its column vectors to obtain the correlation coefficient matrix R; calculating, for each column of R, the mean and standard deviation of the off-diagonal elements; selecting a threshold pair for the mean and the standard deviation of the correlation coefficients; forming the candidate wavelength set S; sorting the wavelengths in S to obtain the set S'; adding wavelength variables one at a time to build multiple linear regression (MLR) models; calculating the root mean square error of validation (RMSEV) of each model and taking the variable subset with the minimum RMSEV as the characteristic wavelengths under S; then selecting the next threshold pair and repeating the above steps to find the minimum RMSEV, and the corresponding characteristic wavelengths, over all candidate sets. The method reduces redundancy among the selected variables as far as possible, and its principle is simple and easy to implement.
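The steps in the abstract can be sketched as follows, under stated assumptions. The abstract does not say how the threshold pair defines S or how S is sorted. This sketch assumes a wavelength enters S when both its mean and its standard deviation of absolute correlation fall below the thresholds, and that S' is ordered by ascending mean correlation. All function names are hypothetical, and the inputs are assumed to be already preprocessed.

```python
import numpy as np

def column_correlation_stats(X):
    """Mean and standard deviation of |r| between each column and all others."""
    R = np.abs(np.corrcoef(X, rowvar=False))
    p = R.shape[0]
    off_diag = R[~np.eye(p, dtype=bool)].reshape(p, p - 1)  # drop the diagonal
    return off_diag.mean(axis=1), off_diag.std(axis=1)

def rmsev(X_mod, y_mod, X_val, y_val, cols):
    """Fit an MLR model on the modeling set and score it on the validation set."""
    A_mod = np.column_stack([np.ones(len(y_mod)), X_mod[:, cols]])
    A_val = np.column_stack([np.ones(len(y_val)), X_val[:, cols]])
    coef, *_ = np.linalg.lstsq(A_mod, y_mod, rcond=None)
    return float(np.sqrt(np.mean((A_val @ coef - y_val) ** 2)))

def select_wavelengths(X_mod, y_mod, X_val, y_val, threshold_pairs):
    """Return (column indices, RMSEV) of the best subset over all threshold pairs."""
    mean_r, std_r = column_correlation_stats(X_mod)
    best_err, best_cols = np.inf, []
    for mean_max, std_max in threshold_pairs:
        # ASSUMPTION: S holds wavelengths below both thresholds.
        S = np.flatnonzero((mean_r < mean_max) & (std_r < std_max))
        # ASSUMPTION: S' is S sorted by ascending mean correlation.
        S_sorted = S[np.argsort(mean_r[S])]
        # Cap the subset size so the regression stays overdetermined.
        max_vars = min(len(S_sorted), len(y_mod) - 2)
        for k in range(1, max_vars + 1):
            cols = S_sorted[:k]
            err = rmsev(X_mod, y_mod, X_val, y_val, cols)
            if err < best_err:
                best_err, best_cols = err, cols.tolist()
    return best_cols, best_err
```

A call such as `select_wavelengths(X_mod, y_mod, X_val, y_val, [(0.3, 0.1), (0.4, 0.1)])` would scan two threshold pairs. Those particular values are placeholders, not values taken from the patent.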

Description

Technical field

[0001] The invention relates to the field of near-infrared spectrum wavelength selection, in particular to a multivariate correction characteristic wavelength selection method based on the minimum correlation coefficient.

Background

[0002] In recent years, near-infrared spectroscopy has been widely used in the petrochemical, pharmaceutical, environmental, clinical, agricultural, food and biomedical fields. The near-infrared spectral region (800-2500 nm) consists mainly of overtone and combination absorption bands of hydrogen-containing groups. The absorption intensity is weak and the sensitivity is low, and full-spectrum data suffer from multicollinearity and many uninformative variables. Selecting characteristic wavelengths from the full spectrum reduces data redundancy and multicollinearity, which can improve the prediction accuracy of the model and reduce its complexity.

[0003] Common variable select...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06K9/62, G01N21/359
CPC: G01N21/359, G06F18/213
Inventor: 陈争光
Owner: HEILONGJIANG BAYI AGRICULTURAL UNIVERSITY