
Method for structure descriptor screening and termination in quantitative structure-activity relationship (QSAR) models of pollutants

A technology relating to quantitative structure-activity relationship (QSAR) models, applied in special data processing applications, instruments, and electrical digital data processing. It addresses the problem that the stability and predictive ability of a model cannot be explained by existing criteria, and achieves the effect of avoiding overfitting.

Inactive Publication Date: 2012-10-17
NANJING UNIV

AI Technical Summary

Problems solved by technology

However, q² and RMSEV have been found to have many problems. Golbraikh et al. pointed out that a high q² is only a necessary condition for a model to have predictive ability, not a sufficient one (Golbraikh A., Tropsha A. Beware of q2! J. Mol. Graph. Mod. 2002, 20 (4), 269-276.). Hawkins pointed out that improper use of q² can lead to overfitting: many variable subsets have high q² values while the correlation coefficient r² of the model itself is very low, even close to 0. In other words, q² or RMSEV alone cannot explain the stability and predictive power of a model (Hawkins D. The problem of overfitting. J. Chem. Inf. Comput. Sci. 2004, 44 (1), 1-12.).



Examples


Embodiment Construction

[0025] The present invention is further illustrated by the following examples.

[0026] The literature (Selwood D. L., Livingstone D. J., Comley J. C. W., O'Dowd A. B., Hudson A. T., Jackson P., Jandu K. S., Rose V. S., Stables J. N. Structure-Activity Relationships of Antifilarial Antimycin Analogues: A Multivariate Pattern Recognition Study. J. Med. Chem. 1990, 33 (1), 136-142.) gives 53 structural descriptors for 31 compounds. This data is known as the Selwood data set in the field of QSAR modeling method research and can be used as a "standard" test set for structure descriptor screening. Liu Shushen et al. proposed a variable selection and modeling method based on the prediction (VSMP) (Liu S. S., Liu H. L., Yin C. S., Wang L. S. VSMP: A Novel Variable Selection and Modeling Method Based on the Prediction. J. Chem. Inf. Comput. Sci. 2003, 43, 964-969.) and applied it to the Selwood data, obtaining a model from the structure descriptors x13, x14, x38, x50 and x52. T...


Abstract

The invention discloses a method for screening structure descriptors, and for terminating that screening, in quantitative structure-activity relationship (QSAR) models of pollutants. The method combines the cross-validation correlation coefficient q² with the adjusted correlation coefficient R²adj of the model and comprises the following steps: establishing a statistical model for a variable subset to obtain the correlation coefficient r² between the observed values and the model estimates, and from it the adjusted correlation coefficient R²adj; subjecting the same variable subset to cross-validation, carried out by both leave-one-out and leave-many-out cross-validation, to obtain the cross-validation correlation coefficient q² of the model; and constructing a new parameter QRadj from the statistical parameters obtained above. For a given system, a larger QRadj value indicates a more stable model with better predictive ability. The advantages of the method are that the new criterion QRadj ensures a relatively high cross-validation correlation coefficient q² while avoiding overfitting, prevents QSAR variable combinations with a low r² value and a high q² value from being selected, and describes the stability and predictive ability of the model on a sound statistical basis.
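The statistics named in the abstract can be sketched as follows. This is a minimal illustration rather than the patent's implementation: it assumes an ordinary least-squares model with an intercept, the usual definitions R²adj = 1 - (1 - r²)(n - 1)/(n - p - 1) and q² = 1 - PRESS/SS_tot, and takes QRadj as the product q² × R²adj, as the technical-field paragraph of the description states. The function names (`r2_and_adjusted`, `q2_cv`, `qr_adj`) and the `folds` argument are hypothetical.

```python
import numpy as np

def _design(X):
    # Prepend an intercept column to the descriptor matrix.
    return np.column_stack([np.ones(len(X)), X])

def _ols(X, y):
    # Least-squares coefficients (intercept first).
    coef, *_ = np.linalg.lstsq(_design(X), y, rcond=None)
    return coef

def r2_and_adjusted(X, y):
    """Return (r2, R2adj) for an OLS fit of y on the columns of X."""
    n, p = X.shape
    resid = y - _design(X) @ _ols(X, y)
    ss_tot = float(((y - y.mean()) ** 2).sum())
    r2 = 1.0 - float(resid @ resid) / ss_tot
    r2_adj = 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)
    return r2, r2_adj

def q2_cv(X, y, folds=None):
    """Cross-validated q2. folds=None means leave-one-out; otherwise
    folds is a list of index arrays, each held out in turn
    (leave-many-out)."""
    n = len(y)
    if folds is None:
        folds = [np.array([i]) for i in range(n)]
    press = 0.0
    for held_out in folds:
        train = np.setdiff1d(np.arange(n), held_out)
        pred = _design(X[held_out]) @ _ols(X[train], y[train])
        press += float(((y[held_out] - pred) ** 2).sum())
    ss_tot = float(((y - y.mean()) ** 2).sum())
    return 1.0 - press / ss_tot

def qr_adj(X, y, folds=None):
    # Assumption: QRadj is the plain product q2 * R2adj.
    _, r2_adj = r2_and_adjusted(X, y)
    return q2_cv(X, y, folds) * r2_adj
```

Calling `qr_adj(X, y)` uses leave-one-out; passing, for example, `folds=np.array_split(np.arange(len(y)), 5)` gives one possible leave-many-out split. How the patent partitions the samples for leave-many-out is not stated in this excerpt.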

Description

Technical field

[0001] The invention relates to a method for screening structure descriptors, and for terminating that screening, in quantitative structure-activity relationship models of pollutants. When a quantitative structure-activity relationship model is established, cross-validation is used to verify the internal samples of the model, and the product QRadj of the cross-validation correlation coefficient q² and the adjusted correlation coefficient R²adj of the model is constructed. QRadj serves as the termination criterion for structure descriptor screening: it describes the stability and predictive ability of the model and is used to judge the quality of the model's predictive performance.

Background technique

[0002] The Quantitative Structure and Activity Relationship (QSAR) model of pollutants has been widely used in environmental ecological risk assessment and human health risk assessment of pollutants (Wang Liansheng, Han Shuokui. Quantitative Structure-Activity of...
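To illustrate how QRadj might act as a termination criterion, the sketch below runs a greedy forward selection that adds one descriptor at a time and stops as soon as QRadj no longer increases. The search strategy is an assumption made for illustration; this excerpt does not say how the patent enumerates variable subsets. The names `qr_adj` and `forward_select` are hypothetical, the model is ordinary least squares with leave-one-out cross-validation, and the sketch does not guard against the case where q² and R²adj are both negative.

```python
import numpy as np

def _fit_predict(X_train, y_train, X_new):
    # OLS with intercept: fit on the training rows, predict the new rows.
    A = np.column_stack([np.ones(len(X_train)), X_train])
    coef, *_ = np.linalg.lstsq(A, y_train, rcond=None)
    return np.column_stack([np.ones(len(X_new)), X_new]) @ coef

def qr_adj(X, y):
    """QRadj = q2 (leave-one-out) * R2adj for the descriptors in X."""
    n, p = X.shape
    ss_tot = float(((y - y.mean()) ** 2).sum())
    fitted = _fit_predict(X, y, X)
    r2 = 1.0 - float(((y - fitted) ** 2).sum()) / ss_tot
    r2_adj = 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)
    press = 0.0
    for i in range(n):
        keep = np.arange(n) != i
        pred = _fit_predict(X[keep], y[keep], X[i:i + 1])[0]
        press += float((y[i] - pred) ** 2)
    q2 = 1.0 - press / ss_tot
    return q2 * r2_adj

def forward_select(X, y, max_vars=None):
    """Greedy forward selection; terminate when QRadj stops improving."""
    n, m = X.shape
    # Keep each leave-one-out fit overdetermined: p + 1 < n - 1.
    limit = min(m, n - 3) if max_vars is None else max_vars
    selected, best = [], -np.inf
    while len(selected) < limit:
        candidates = [j for j in range(m) if j not in selected]
        scores = {j: qr_adj(X[:, selected + [j]], y) for j in candidates}
        j_best = max(scores, key=scores.get)
        if scores[j_best] <= best:
            break  # termination: no candidate raises QRadj
        selected.append(j_best)
        best = scores[j_best]
    return selected, best

if __name__ == "__main__":
    # Synthetic data for illustration only: columns 1 and 4 carry the signal.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(40, 6))
    y = 3 * X[:, 1] - 2 * X[:, 4] + 0.1 * rng.normal(size=40)
    print(forward_select(X, y))
```

Because R²adj penalises every added descriptor and q² falls when a descriptor only fits noise, the product tends to peak at a compact subset, which is what makes it usable as a stopping rule in a search like this one.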


Application Information

Patent Type & Authority Patents(China)
IPC IPC(8): G06F17/50
Inventor 张爱茜, 易忠胜, 穆云松, 蔺远, 高常安, 李富华
Owner NANJING UNIV