Variable selection method for modeling organic pollutant quantitative structure and activity relationship

A variable selection technique for quantitative structure-activity relationship (QSAR) modeling of organic pollutants, applied in special data processing applications, instruments, electrical digital data processing, etc. It addresses three problems of existing methods: they cannot screen large-scale variable sets, they cannot verify that the screening result is optimal, and they cannot guarantee that repeated screening runs give the same result.

Inactive Publication Date: 2012-09-19
GUILIN UNIVERSITY OF TECHNOLOGY

AI Technical Summary

Problems solved by technology

[0009] At present, the variable screening methods commonly used in QSAR research have two basic problems. First, screening methods of the full regression type cannot screen large-scale variable sets quickly and effectively. Second, random (stochastic) screening methods cannot verify whether their result is optimal and cannot guarantee that different screening runs produce the same result.
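The scale problem can be made concrete with a small counting sketch. It compares the number of models an all-subsets search must fit with an upper bound for a stepwise search that keeps a fixed number of models per size, as outlined in the Abstract. The function names are hypothetical, and the bound ignores duplicate candidates, so it is an illustration rather than a statement of the patented method's exact cost.

```python
from math import comb


def exhaustive_count(n_vars: int, k: int) -> int:
    """Number of distinct k-variable subsets an all-subsets search must fit."""
    return comb(n_vars, k)


def stepwise_upper_bound(n_vars: int, k: int, n_keep: int) -> int:
    """Upper bound on models fitted when growing models up to k variables.

    Sizes 1 and 2 are enumerated in full. Each later size extends at most
    n_keep retained models by each variable not already in the model.
    """
    total = n_vars                       # all single-variable models
    if k >= 2:
        total += comb(n_vars, 2)         # all two-variable models
    for size in range(3, k + 1):
        total += n_keep * (n_vars - (size - 1))
    return total


if __name__ == "__main__":
    # 53 descriptors and 100 retained models are the values quoted in Embodiment 1.
    for k in (3, 5, 8):
        print(k, exhaustive_count(53, k), stepwise_upper_bound(53, k, 100))
```

The exhaustive count grows combinatorially with the model size, while the stepwise bound grows roughly linearly once the two-variable stage is done.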

Examples


Embodiment 1

[0036] The Selwood data set, the so-called "standard" test set, was selected for testing. This data set was first published in the literature (Selwood, D.L.; Livingstone, D.J.; Comley, J.C.W.; O'Dowd, A.B.; Hudson, A.T.; Jackson, P.; Jandu, K.S.; Rose, V.S.; Stables, J.N., Structure-activity relationships of antifilarial antimycin analogs: a multivariate pattern recognition study. J. Med. Chem. 1990, 33(1), 136-142). It contains 31 samples and 53 descriptors. The parameters set for the screening process are as follows: the number of retained models Ns = 100; the correlation coefficient between variables r_int = 0.9; and the initial value of r_cri, the critical value of the correlation coefficient that determines whether to perform the LOOCV or LMOCV calculation, r_cri = 0.1 (this value should be adjusted as the number of variables increases). The calculation gives the results shown in the table below. This data set has never seen a model with a number o...
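The text does not spell out how r_int is applied. The following is a minimal sketch, assuming it is an upper limit on the absolute pairwise correlation between the variables inside one candidate model. That reading is an interpretation, not something the source confirms, and the function name `passes_intercorrelation` is hypothetical.

```python
import numpy as np


def passes_intercorrelation(X, subset, r_int=0.9):
    """Return True if no pair of columns in `subset` has |r| above r_int.

    ASSUMPTION: r_int is treated as a cap on pairwise correlation within a
    model; the source only names the parameter and its value.
    """
    subset = list(subset)
    if len(subset) < 2:
        return True                      # a single variable has no pair to check
    corr = np.corrcoef(X[:, subset], rowvar=False)   # columns are variables
    upper = np.abs(corr[np.triu_indices(len(subset), k=1)])
    return bool(np.all(upper <= r_int))


if __name__ == "__main__":
    x0 = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
    X = np.column_stack([x0, 2.0 * x0 + 1.0, [2.0, 1.0, 4.0, 3.0, 6.0]])
    print(passes_intercorrelation(X, (0, 1)))   # columns 0 and 1 are collinear
    print(passes_intercorrelation(X, (0, 2)))
```

Under this reading, a candidate combination would be discarded before any cross validation is run, which saves fitting models that are nearly redundant.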

Embodiment 2

[0039] The structures and biological effects of 58 PPAR-γ agonists were taken from the literature (Yi Xiang; Guo Zongru, Three-dimensional quantitative structure-activity relationship study of thiazolidinedione and aryl keto acid PPAR-γ agonists. Acta Pharmaceutica Sinica 2001, 36(4), 262-268). The E-Dragon software provided by the Virtual Computational Chemistry Laboratory (VCCLAB) was used to calculate 1664 molecular structure descriptors, of which 814 remained after pre-screening. The VSMVI method was then used for screening, with the same screening parameters as in Embodiment 1. Finally, the results shown in the table below are obtained.

[0040]

[0041]

Embodiment 3

[0043] The "Environmental Toxicity Prediction Challenge" training set provided by Dr. Igor V. Tetko was used for the variable screening test. The training set includes 644 organic compounds whose structures are represented by 1664 descriptors calculated with the E-Dragon software of the Virtual Computational Chemistry Laboratory (VCCLAB); the data are available at http://www.cadaster.eu/node/65. After variable pre-screening, 827 descriptors remained, and the VSMVI parameters were the same as in Embodiment 1. Finally, the following results are obtained.

[0044]


Abstract

The invention discloses a variable selection method for modeling the quantitative structure-activity relationship of organic pollutants. The method comprises the following steps: calculating the linear models formed by every single variable and by every pair of distinct variables, and retaining a certain number of the best single-variable and bivariable models; then taking each retained bivariable model in turn and combining its two variables with each of the remaining variables to form tri-variable models, until all retained bivariable models have been processed; comparing the quality of the tri-variable models and retaining a certain number of the best ones; and repeating this procedure until the number of variables in the models meets the requirement. Model quality is judged by q2 or by the root-mean-square deviation (RMSEV), calculated by leave-one-out cross validation (LOOCV) or leave-multiple-out cross validation (LMOCV). The principle is simple, easy to understand and easy to program, and the method is fast and effective, so that the rationality of the variable selection and the stability of the models' predictive capacity are guaranteed.
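The quality criteria named above can be sketched as follows. This is a minimal example, assuming the common definitions q2 = 1 - PRESS / SS_total (with SS_total taken about the mean of all observed values) and RMSEV = sqrt(PRESS / n); the source does not state its exact formulas, and the function name `loocv_q2_rmsev` is hypothetical.

```python
import numpy as np


def loocv_q2_rmsev(X, y):
    """Leave-one-out q2 and RMSEV of an ordinary least-squares linear model.

    ASSUMPTION: q2 = 1 - PRESS / SS_total and RMSEV = sqrt(PRESS / n).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(y)
    A = np.column_stack([np.ones(n), X])          # add an intercept column
    press = 0.0
    for i in range(n):
        keep = np.arange(n) != i                  # leave sample i out
        coef, *_ = np.linalg.lstsq(A[keep], y[keep], rcond=None)
        press += (y[i] - A[i] @ coef) ** 2        # squared prediction error
    ss_total = np.sum((y - y.mean()) ** 2)
    return 1.0 - press / ss_total, np.sqrt(press / n)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.normal(size=(30, 2))
    y = 1.5 * x[:, 0] - 0.7 * x[:, 1] + 0.1 * rng.normal(size=30)
    print(loocv_q2_rmsev(x, y))
```

A leave-multiple-out variant would hold out several samples per round instead of one; the scoring formulas stay the same.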

Description

Technical field

[0001] The invention relates to a variable screening method for quantitative structure-activity relationship modeling of organic pollutants. Specifically, a certain number of n-variable combinations with relatively large interactions are selected from a large number of molecular structure descriptor variables. On this basis, one variable is added at a time to form (n+1)-variable combinations from all selected n-variable combinations, and a certain number of (n+1)-variable combinations are selected, and so on until the requirements are met. The method thereby obtains optimal linear models with different numbers of variables.

Background technique

[0002] As a computer modeling technique, the Quantitative Structure and Activity Relationship (QSAR) research method for organic pollutants can explore in depth the quantitative change law and causal relationship between the structure of organic pollutants and their harm to the human body and the ecological environment....
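The incremental procedure of paragraph [0001] can be sketched in a few lines. This is an illustration under stated assumptions, not the patented implementation: models are scored by leave-one-out q2 only, computed with the hat-matrix shortcut for ordinary least squares (which assumes a full-rank design matrix); the r_int and r_cri parameters and LMOCV are left out; and the names `q2_loo` and `select_variables` are hypothetical.

```python
from itertools import combinations

import numpy as np


def q2_loo(X, y, cols):
    """Leave-one-out q2 of a least-squares model on the columns `cols`."""
    A = np.column_stack([np.ones(len(y)), X[:, list(cols)]])
    H = A @ np.linalg.pinv(A)                     # hat matrix
    loo_resid = (y - H @ y) / (1.0 - np.diag(H))  # leave-one-out residuals
    return 1.0 - np.sum(loo_resid ** 2) / np.sum((y - y.mean()) ** 2)


def select_variables(X, y, max_vars, n_keep):
    """Keep the n_keep best models per size, growing one variable at a time.

    Returns a dict: model size -> list of (q2, column tuple), best first.
    """
    n_vars = X.shape[1]

    def top(candidates):
        scored = sorted(((q2_loo(X, y, c), c) for c in candidates), reverse=True)
        return scored[:n_keep]

    best = {1: top((j,) for j in range(n_vars))}  # all single variables
    if max_vars >= 2:
        best[2] = top(combinations(range(n_vars), 2))  # all variable pairs
    for size in range(3, max_vars + 1):
        candidates = set()                        # a set removes duplicates
        for _, cols in best[size - 1]:
            for j in range(n_vars):
                if j not in cols:
                    candidates.add(tuple(sorted(cols + (j,))))
        best[size] = top(candidates)
    return best


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(40, 8))
    y = 3 * X[:, 1] - 2 * X[:, 4] + 0.5 * X[:, 6] + 0.01 * rng.normal(size=40)
    for size, models in select_variables(X, y, max_vars=3, n_keep=5).items():
        print(size, models[0])
```

Because every step is a deterministic enumeration followed by a sort, two runs on the same data return the same models, which is the repeatability property the stochastic methods lack.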


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F19/00
Inventors: 易忠胜, 刘红艳, 莫凌云
Owner: GUILIN UNIVERSITY OF TECHNOLOGY