High Activity Target-Site Identification Using Phenotypic Independent CRISPR-Cas9 Core Functionality

Laurence Wilson, Daniel Reti, Aidan O'Brien, Robert A. Dunne, Denis C Bauer

    Research output: Contribution to journal › Article › peer-review

    Abstract

    The activity of CRISPR-Cas9 target sites can be measured experimentally through phenotypic assays or mutation rates, and these measurements can be used to build computational models that predict the activity of novel target sites. However, currently published models have been reported to perform poorly in situations other than their training conditions. In this study, we therefore investigate how different sources of data influence predictive power and identify the best data set for the most robust predictive model. We use the activity of 28,606 target sites and a machine learning approach to train a predictive model of CRISPR-Cas9 activity, outperforming other published methods by an average increase in accuracy of 80% for prediction of the degree of activity and 13% for classification into active and inactive categories. We find that using data sets that measure CRISPR-Cas9 activity through sequencing provides more accurate predictions of activity. Our model, dubbed TUSCAN, is highly scalable, predicting the activity of 5000 target sites in under 7 s, making it suitable for genome-wide screens. We conclude that sophisticated machine learning methods can classify binary CRISPR-Cas9 activity; however, predicting fine-scale activity scores will require larger data sets that directly measure indel rates.
    Original language: English
    Pages (from-to): 182-190
    Journal: The CRISPR Journal
    Volume: 1
    Issue number: 2
    Publication status: Published - 2018
