Method for screening structure descriptors, and terminating the screening, for quantitative structure-activity relationship models of pollutants

A technology for quantitative structure-activity models and their structure descriptors, applied in special data processing applications, instruments, electrical digital data processing, etc. It addresses the problem that existing criteria cannot explain model stability and predictive ability, and achieves the effect of avoiding over-fitting.

Inactive Publication Date: 2011-08-17
NANJING UNIV

AI Technical Summary

Problems solved by technology

However, q² and RMSEV have been found to have many problems. Golbraikh et al. pointed out that q² is only a necessary condition for a model to have predictive ability, not a sufficient one (Golbraikh A., Tropsha A. Beware of q²! J. Mol. Graph. Mod. 2002, 20 (4), 269-276.). Hawkins clearly pointed out that improper use of q² can lead to overfitting; in fact, many variable subsets have a high q² value while the correlation coefficient r² of the model itself is very low, even close to 0. That is, q² or RMSEV used alone does not explain the stability and predictive ability of the model (Hawkins D. The problem of overfitting. J. Chem. Inf. Comput. Sci. 2004, 44 (1), 1-12.).

Examples

Embodiment Construction

[0025] The present invention is further illustrated by the following examples.

[0026] The literature (Selwood D. L., Livingstone D. J., Comley J. C. W., O'Dowd A. B., Hudson A. T., Jackson P., Jandu K. S., Rose V. S., Stables, J. N. Structure-Activity Relationships of Antifilarial Antimycin Analogues: A Multivariate Pattern Recognition Study. J. Med. Chem. 1990, 33 (1), 136-142.) gives 53 structural descriptors for 31 compounds. This data is known as the Selwood data set in QSAR modeling method research and can be used as a "standard" test set for structural descriptor screening. Liu Shushen et al. applied their variable selection and modeling method based on the prediction (VSMP) (Liu S. S., Liu H. L., Yin C. S., Wang L. S. VSMP: A Novel Variable Selection and Modeling Method Based on the Prediction. J. Chem. Inf. Comput. Sci. 2003, 43, 964-969.) to the Selwood data and obtained, from the structure descriptors x13, x14, x38, x50 and x52, the ...
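To make the idea of using QRadj as a termination criterion for descriptor screening concrete, the following is a minimal sketch under stated assumptions. It is not the patent's procedure and not VSMP: it swaps in a plain greedy forward selection over an ordinary least squares model, assumes QRadj is the product of leave-one-out q² and R²adj, and runs on synthetic data (not the Selwood set). The names `qr_adj_loo` and `forward_screen` are hypothetical.

```python
import numpy as np

def qr_adj_loo(X, y):
    """QR_adj = q2 (leave-one-out) * R2_adj for an OLS model with intercept.

    Assumption: standard textbook definitions of q2 and R2_adj.
    """
    n, p = X.shape
    A = np.column_stack([np.ones(n), X])
    H = A @ np.linalg.pinv(A)                 # hat matrix of the OLS fit
    resid = y - H @ y
    ss_tot = np.sum((y - y.mean()) ** 2)
    r2 = 1.0 - resid @ resid / ss_tot
    r2_adj = 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)
    # Leave-one-out residuals from a single fit: e_i / (1 - h_ii)
    loo_resid = resid / (1.0 - np.diag(H))
    q2 = 1.0 - loo_resid @ loo_resid / ss_tot
    # Note: the raw product is positive when both factors are negative,
    # so a practical screen may want to reject such models explicitly.
    return q2 * r2_adj

def forward_screen(X, y, max_size=5):
    """Greedy forward selection; stops when no candidate raises QR_adj."""
    selected, best = [], -np.inf
    remaining = list(range(X.shape[1]))
    while remaining and len(selected) < max_size:
        scores = {j: qr_adj_loo(X[:, selected + [j]], y) for j in remaining}
        j_best = max(scores, key=scores.get)
        if scores[j_best] <= best:            # termination criterion
            break
        best = scores[j_best]
        selected.append(j_best)
        remaining.remove(j_best)
    return selected, best

# Synthetic illustration: only columns 1 and 4 carry signal.
rng = np.random.default_rng(0)
X = rng.normal(size=(31, 8))
y = 2.0 * X[:, 1] - 3.0 * X[:, 4] + 0.1 * rng.normal(size=31)
selected, score = forward_screen(X, y)
print("selected descriptors:", selected, "QR_adj:", score)
```

The greedy search is only one possible way to generate candidate subsets; the point of the sketch is the stopping rule, which ends the screening as soon as adding a descriptor no longer increases QRadj.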

Abstract

The invention discloses a method for screening structure descriptors, and for terminating that screening, in quantitative structure-activity relationship (QSAR) models of pollutants. The method combines the cross-validated correlation coefficient q² with the adjusted correlation coefficient R²adj of the model and comprises the following steps: establishing a statistical model on a variable subset to obtain the correlation coefficient r² between the observed values and the model estimates, and from it the adjusted correlation coefficient R²adj; subjecting the same variable subset to cross-validation, by both leave-one-out and leave-many-out cross-validation, to obtain the cross-validated correlation coefficient q² of the model; and constructing a new parameter QRadj from the statistical parameters so obtained. For the same system, the value of QRadj is proportional to the stability of the model and to its predictive ability. Through the new criterion QRadj, the method ensures a relatively high cross-validated correlation coefficient q² while avoiding over-fitting, prevents QSAR variable combinations with a low r² value but a high q² value from being selected, and scientifically describes the stability and predictive ability of the model.
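The steps in the abstract can be sketched in code. This is a minimal illustration, assuming an ordinary least squares model, the standard textbook definitions of r², R²adj and q², and QRadj taken as the product q² × R²adj as stated in the description. The function names and the synthetic data are hypothetical and are not taken from the patent.

```python
import numpy as np

def fit_predict(X_train, y_train, X_test):
    """Ordinary least squares with intercept; returns predictions for X_test."""
    A = np.column_stack([np.ones(len(X_train)), X_train])
    coef, *_ = np.linalg.lstsq(A, y_train, rcond=None)
    return np.column_stack([np.ones(len(X_test)), X_test]) @ coef

def r2_and_adjusted(X, y):
    """r2 between observed and fitted values, and its adjusted form R2_adj."""
    n, p = X.shape
    resid = y - fit_predict(X, y, X)
    r2 = 1.0 - resid @ resid / np.sum((y - y.mean()) ** 2)
    r2_adj = 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)
    return r2, r2_adj

def q2_cross_validated(X, y, folds):
    """q2 = 1 - PRESS / SS_tot; each fold is an index array left out in turn."""
    press = 0.0
    for test_idx in folds:
        train = np.ones(len(y), dtype=bool)
        train[test_idx] = False
        pred = fit_predict(X[train], y[train], X[test_idx])
        press += np.sum((y[test_idx] - pred) ** 2)
    return 1.0 - press / np.sum((y - y.mean()) ** 2)

def qr_adj(X, y, folds):
    """QR_adj as the product of cross-validated q2 and R2_adj."""
    _, r2_adj = r2_and_adjusted(X, y)
    return q2_cross_validated(X, y, folds) * r2_adj

# Synthetic data for illustration only.
rng = np.random.default_rng(0)
X = rng.normal(size=(30, 3))
y = 1.5 * X[:, 0] - 0.8 * X[:, 2] + 0.2 * rng.normal(size=30)

loo = [np.array([i]) for i in range(len(y))]      # leave-one-out folds
lmo = np.array_split(rng.permutation(len(y)), 5)  # leave-many-out: 5 groups of 6
print("QR_adj (LOO):", qr_adj(X, y, loo))
print("QR_adj (LMO):", qr_adj(X, y, lmo))
```

Passing the folds in as index arrays lets the same function cover both validation schemes named in the abstract; the group count of 5 for leave-many-out is an arbitrary choice for the sketch.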

Description

Technical field

[0001] The invention relates to a method for screening structure descriptors, and for terminating that screening, in quantitative structure-activity relationship models of pollutants. When a quantitative structure-activity relationship model is established, cross-validation is used to verify the model on internal samples, and the product QRadj of the cross-validated correlation coefficient q² and the adjusted correlation coefficient R²adj of the model is constructed. QRadj serves as the termination criterion for structure descriptor screening; it is used to describe the stability and predictive ability of the model and to judge the quality of its predictive performance.

Background technique

[0002] The Quantitative Structure-Activity Relationship (QSAR) model of pollutants has been widely used in environmental ecological risk assessment and human health risk assessment of pollutants (Wang Liansheng, Han Shuokui. Quantitative Structure-Activity of...
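The product defined in paragraph [0001] can be written out as follows. Only the last relation, QRadj as the product of q² and R²adj, comes from the text above; the expressions for r², R²adj and q² are the usual textbook forms and are assumed here, with n samples, p descriptors, fitted values ŷ_i, and ŷ_(i) denoting the prediction for sample i from a model fitted without it.

```latex
\begin{align*}
r^2 &= 1 - \frac{\sum_{i=1}^{n}(y_i - \hat{y}_i)^2}{\sum_{i=1}^{n}(y_i - \bar{y})^2}, &
R^2_{adj} &= 1 - (1 - r^2)\,\frac{n-1}{n-p-1}, \\
q^2 &= 1 - \frac{\sum_{i=1}^{n}(y_i - \hat{y}_{(i)})^2}{\sum_{i=1}^{n}(y_i - \bar{y})^2}, &
QR_{adj} &= q^2 \cdot R^2_{adj}.
\end{align*}
```

As a purely hypothetical illustration of why the product is useful: a subset with q² = 0.80 but R²adj = 0.10 scores QRadj = 0.08, while a subset with q² = 0.70 and R²adj = 0.75 scores QRadj = 0.525, so the second subset is preferred even though its q² is lower.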

Application Information

IPC(8): G06F17/50
Inventor 张爱茜易忠胜穆云松蔺远高常安李富华
Owner NANJING UNIV