Comparison of cutoff strategies for geometrical features in machine learning-based scoring functions

Shirley W.I. Siu, Thomas K.F. Wong, Simon Fong

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > peer-review

Abstract

Counts of protein-ligand contacts are popular geometrical features in scoring functions for structure-based drug design. When extracting features, cutoff values define the range of distances within which a protein-ligand atom pair is considered to be in contact. However, the effects of the number of ranges and the choice of cutoff values on the predictive ability of scoring functions are unclear. Here, we compare five cutoff strategies (one-, two-, three-, and six-range, and soft boundary) with four machine learning methods. Prediction models are constructed using the latest PDBbind v2012 data sets and assessed by correlation coefficients. Our results show that the optimal one-range cutoff value lies between 6 and 8 Å instead of the customary choice of 12 Å. In general, two-range models improve the correlation coefficient by 3-5%, but introducing more cutoff ranges does not always improve prediction accuracy.
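To make the idea of range-based contact counting concrete, the following is a minimal sketch and not the authors' implementation. It assumes atoms are given as (element, coordinates) tuples and that each range is half-open, running from the previous cutoff up to the next. The function name `count_contacts`, the toy coordinates, and the 2.5 Å inner cutoff are all hypothetical and chosen only for illustration. The soft boundary strategy is not sketched because the abstract does not define it.

```python
import math
from collections import Counter


def count_contacts(protein_atoms, ligand_atoms, cutoffs):
    """Count protein-ligand atom pairs per distance range.

    protein_atoms, ligand_atoms: lists of (element, (x, y, z)) tuples.
    cutoffs: ascending upper bounds in angstroms, e.g. [12.0] for one
             range or [2.5, 12.0] for two ranges (0-2.5 and 2.5-12).
    Returns a Counter keyed by (protein_element, ligand_element, range_index).
    """
    counts = Counter()
    for p_elem, p_xyz in protein_atoms:
        for l_elem, l_xyz in ligand_atoms:
            d = math.dist(p_xyz, l_xyz)
            # Assign the pair to the first range whose upper bound exceeds d;
            # pairs beyond the last cutoff are not counted as contacts.
            for i, upper in enumerate(cutoffs):
                if d < upper:
                    counts[(p_elem, l_elem, i)] += 1
                    break
    return counts


# Toy data (hypothetical coordinates, in angstroms).
protein = [("C", (0.0, 0.0, 0.0)), ("N", (5.0, 0.0, 0.0)), ("C", (20.0, 0.0, 0.0))]
ligand = [("O", (3.0, 0.0, 0.0))]

one_range = count_contacts(protein, ligand, [12.0])
two_range = count_contacts(protein, ligand, [2.5, 12.0])
```

With a single range, the C-O pair at 3 Å and the N-O pair at 2 Å land in the same bin, while the two-range version separates them. The extra distance resolution is the kind of information that multi-range features give a learning method.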

Original language: English
Title of host publication: Advanced Data Mining and Applications - 9th International Conference, ADMA 2013, Proceedings
Pages: 336-347
Number of pages: 12
Edition: PART 2
Publication status: Published - 2013
Externally published: Yes
Event: 9th International Conference on Advanced Data Mining and Applications, ADMA 2013 - Hangzhou, China
Duration: 14 Dec 2013 to 16 Dec 2013

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Number: PART 2
Volume: 8347 LNAI
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 9th International Conference on Advanced Data Mining and Applications, ADMA 2013
Country/Territory: China
City: Hangzhou
Period: 14/12/13 to 16/12/13
