Leave-multiple-out cross-validation (LMOCV) method for quantitative structure-activity relationship (QSAR) models of organic pollutants
A cross-validation technique for quantitative structure-activity relationship models of organic pollutants, applied in special data processing applications, instruments, electrical digital data processing, etc. It solves the problem that verification samples lack representativeness and uniform distribution, achieves large sample disturbance, and improves variable screening.
Examples
Embodiment 1
[0026] When the number of samples is 31, a 32-level uniform design table is constructed using the grid point (good lattice point) method, as shown in Table 1.
[0027] Table 1 The 32-level uniform design table constructed by the grid point method
[0028]
[0029] As Table 1 shows, the 32-level uniform design table has 16 columns and 32 rows, and every element of the last row is 32. After that row is deleted, the remaining 31 rows correspond to the sample numbers of the 31 samples, and each column represents one sample distribution. Each column is then divided into 5 equal parts; the simplest way is to divide by row order and to use the same division for all columns. The samples obtained by uniform design are distributed very evenly throughout the space, whereas the sample distribution obtained by the Monte Carlo method is not uniform. This is the advantage of using uniform design to obtain the LMOCV grouping.
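The table construction described above can be sketched in Python. This is a minimal sketch, assuming the "grid point method" is the standard good lattice point construction, in which column h (for every h coprime with the number of levels n) holds i*h mod n in row i, with 0 written as n. That assumption reproduces the 16 columns and the all-32 last row reported for Table 1. The function names are hypothetical.

```python
from math import gcd

def uniform_design_table(n_levels):
    """Good lattice point table: entry (i, h) is i*h mod n, with 0 written as n."""
    # One column per generator h that is coprime with the number of levels.
    generators = [h for h in range(1, n_levels + 1) if gcd(h, n_levels) == 1]
    table = []
    for i in range(1, n_levels + 1):
        row = [(i * h) % n_levels or n_levels for h in generators]
        table.append(row)
    return table

def sample_orderings(n_samples):
    """Build the (n_samples + 1)-level table and drop its last row.

    Each remaining column is then a permutation of the sample numbers 1..n_samples.
    """
    return uniform_design_table(n_samples + 1)[:-1]

table = uniform_design_table(32)
orderings = sample_orderings(31)
print(len(table), len(table[0]))  # 32 16
print(set(table[-1]))             # {32}
```

Under the same assumption, `uniform_design_table(92)` and `uniform_design_table(133)` have 44 and 108 columns, matching the column counts stated in Embodiments 2 and 3.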
Embodiment 2
[0031] In the literature (Cronin M.T.D., Netzeva T.I., Dearden J.C., Edwards R., Worgan A.D.P. Assessment and Modeling of the Toxicity of Organic Chemicals to Chlorella vulgaris: Development of A Novel Database. Chem. Res. Toxicol. 2004, 17(4), 545-554), the best model for 91 samples uses 3 structural descriptors, Kow, LUMO and Δ¹χᵛ, as variables. The correlation coefficient of the model is r² = 0.890, and the q² of LOOCV is 0.875.
[0032] The method of the present invention is used to apply UDOLMOCV to this model: first a 92-level uniform design table is constructed and its last row is deleted, leaving 44 columns in total; each column is then divided into 2, 5, and 10 equal parts (if the sample count is not divisible, the leftover samples are assigned to the last group), which gives 44 runs each of 2-, 5-, and 10-fold cross-validation (denoted UD-2, UD-5, and UD-10, respectively). The calculation results are shown in Table 2. As can be seen from Table 2, for 2-, 5-, and 10-fold UDOLMOCV the root mean ...
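The grouping rule above (divide each column by row order, with leftover samples assigned to the last group) can be illustrated as follows. This is a sketch only: `split_into_groups` is a hypothetical helper, and the example column is generated with multiplier 7 under the assumption that the table follows the good lattice point construction (7 is coprime with 92).

```python
def split_into_groups(ordering, k):
    """Split a column of sample numbers into k groups by row order.

    Each of the first k-1 groups gets len(ordering) // k samples;
    any leftover samples join the last group.
    """
    size = len(ordering) // k
    groups = [list(ordering[g * size:(g + 1) * size]) for g in range(k - 1)]
    groups.append(list(ordering[(k - 1) * size:]))
    return groups

# Illustrative column for 91 samples: rows 1..91 of the 92-level table
# for generator 7 (an assumed construction, not quoted from the patent).
column = [(i * 7) % 92 for i in range(1, 92)]

for k in (2, 5, 10):
    sizes = [len(g) for g in split_into_groups(column, k)]
    print(k, sizes)
# 2 [45, 46]
# 5 [18, 18, 18, 18, 19]
# 10 [9, 9, 9, 9, 9, 9, 9, 9, 9, 10]
```

Each group serves once as the held-out set, so one column yields one complete k-fold cross-validation run.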
Embodiment 3
[0036] Literature (Liu H., Papa E., Gramatica P. QSAR Prediction of Estrogen Activity for A Large Set of Diverse Chemicals under the Guidance of OEC
[0037] The method of the present invention is used to apply UDOLMOCV to this model: first a 133-level uniform design table is constructed and its last row is deleted, leaving 108 columns in total; each column is then divided into 2, 5, and 10 equal parts (if the sample count is not divisible, the leftover samples are assigned to the last group), which gives 108 runs each of 2-, 5-, and 10-fold cross-validation (denoted UD-2, UD-5, and UD-10, respectively). The calculation results in Table 3 show that the root mean square error obtained by UDOLMOCV is always larger than the Monte Carlo cross-validation result, which indicates that the sample grouping adopted by the present invention is more representative. When the sample disturbance is relatively large (such as 2-fold), the stability of the model is significantly reduced, and the...
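For readers who want to see how a cross-validation root mean square error can be computed for a given grouping, here is a minimal sketch. It rests on several assumptions not stated in the text: the model is a multiple linear regression fitted by ordinary least squares, the error is pooled over all held-out predictions, and the Monte Carlo grouping is a random shuffle cut into k parts. The data are synthetic and noise-free, chosen only so the result is easy to check; `cv_rmse` and `monte_carlo_groups` are hypothetical names.

```python
import numpy as np

def cv_rmse(X, y, groups):
    """Pooled RMSE of held-out predictions; groups hold 0-based sample indices."""
    X1 = np.column_stack([np.ones(len(y)), X])  # add an intercept column
    squared_errors = []
    for held_out in groups:
        train = np.setdiff1d(np.arange(len(y)), held_out)
        coef, *_ = np.linalg.lstsq(X1[train], y[train], rcond=None)
        residual = y[held_out] - X1[held_out] @ coef
        squared_errors.extend(residual ** 2)
    return float(np.sqrt(np.mean(squared_errors)))

def monte_carlo_groups(n_samples, k, rng):
    """Random grouping: shuffle the indices, cut into k parts, remainder in the last."""
    order = rng.permutation(n_samples)
    size = n_samples // k
    groups = [order[g * size:(g + 1) * size] for g in range(k - 1)]
    groups.append(order[(k - 1) * size:])
    return groups

# Synthetic, noise-free data (not from the patent): y = 1 + 2*x1 - x2.
rng = np.random.default_rng(0)
X = rng.normal(size=(31, 2))
y = 1.0 + 2.0 * X[:, 0] - X[:, 1]

groups = monte_carlo_groups(31, 5, rng)
print(cv_rmse(X, y, groups))  # close to 0, because the data are exactly linear
```

To score a uniform design grouping instead, pass groups built from one table column (converted from 1-based sample numbers to 0-based indices) to the same `cv_rmse` function.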