Statistical modeling of a ligand knowledge base

Ralph A. Mansson*, Alan H. Welsh, Natalie Fey, A. Guy Orpen

*Corresponding author for this work

    Research output: Contribution to journal › Article › peer-review

    30 Citations (Scopus)

    Abstract

    A range of different statistical models has been fitted to experimental data for the Tolman electronic parameter (TEP) based on a large set of calculated descriptors in a prototype ligand knowledge base (LKB) of phosphorus(III) donor ligands. The models have been fitted by ordinary least squares using subsets of descriptors, principal component regression, and partial least squares which use variables derived from the complete set of descriptors, least angle regression, and the least absolute shrinkage and selection operator. None of these methods is robust against outliers, so we also applied a robust estimation procedure to the linear regression model. Criteria for model evaluation and comparison have been discussed, highlighting the importance of resampling methods for assessing the robustness of models and the scope for making predictions in chemically intuitive models. For the ligands covered by this LKB, ordinary least squares models of descriptor subsets provide a good representation of the data, while partial least squares, principal component regression, and least angle regression models are less suitable for our dual aims of prediction and interpretation. A linear regression model with robustly fitted parameters achieves the best model performance over all classes of models fitted to TEP data, and the weightings assigned to ligands during the robust estimation procedure are chemically intuitive. The increased model complexity when compared to the ordinary least squares linear model is justified by the reduced influence of individual ligands on the model parameters and predictions of new ligands. Robust linear regression models therefore represent the best compromise for achieving statistical robustness in simple, chemically meaningful models.
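The contrast the abstract draws between ordinary least squares and a robustly fitted linear model can be illustrated with a small sketch. This is not the authors' procedure: the abstract does not name the robust estimator, so Huber weights with a median-based scale estimate are assumed here, the data are synthetic, and a single hypothetical descriptor stands in for the LKB descriptor set. The helper names `weighted_line_fit` and `huber_fit` are invented for this illustration.

```python
# Illustrative sketch only: synthetic data, one descriptor, assumed Huber weighting.
from statistics import median


def weighted_line_fit(x, y, w):
    """Weighted least squares fit of y = a + b*x; returns (a, b)."""
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    b = sxy / sxx
    return my - b * mx, b


def huber_fit(x, y, c=1.345, n_iter=50):
    """Iteratively reweighted least squares with Huber weights (an assumption)."""
    w = [1.0] * len(x)
    for _ in range(n_iter):
        a, b = weighted_line_fit(x, y, w)
        resid = [yi - (a + b * xi) for xi, yi in zip(x, y)]
        # Robust scale: median absolute residual, rescaled for normal errors.
        scale = median(abs(r) for r in resid) / 0.6745 or 1e-12
        # Observations with large residuals receive weights below 1.
        w = [1.0 if abs(r) <= c * scale else c * scale / abs(r) for r in resid]
    return a, b, w


# Synthetic "descriptor" x and response y = 2 + 0.5*x with small alternating noise.
x = [float(i) for i in range(10)]
y = [2.0 + 0.5 * xi + (0.05 if i % 2 == 0 else -0.05) for i, xi in enumerate(x)]
y[7] += 5.0  # one artificial outlier

intercept_ols, slope_ols = weighted_line_fit(x, y, [1.0] * len(x))
intercept_rob, slope_rob, weights = huber_fit(x, y)

print("OLS slope:   ", round(slope_ols, 3))
print("Robust slope:", round(slope_rob, 3))
print("Lowest-weight observation:", weights.index(min(weights)))
```

In this toy setting the outlying observation pulls the ordinary least squares slope away from the generating value of 0.5, while the reweighting step reduces that observation's influence. The final weights also show which observations were down-weighted, which is the kind of per-ligand diagnostic the abstract describes as chemically intuitive.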

    Original language: English
    Pages (from-to): 2591-2600
    Number of pages: 10
    Journal: Journal of Chemical Information and Modeling
    Volume: 46
    Issue number: 6
    Publication status: Published - 2006
