Abstract
This article extends recent work on the application of computational linguistics to the analysis of poetry. The dataset consisted of 85 canonical English poems and a matched control group of obscure poems. I used Linguistic Inquiry and Word Count (LIWC) to create more than 65 linguistic variables and then used machine learning to develop a classifier that distinguishes the canonical (highly anthologized) poems from the obscure (seldom anthologized) poems. The classifier uses 6 variables and achieves an accuracy of 69%. I then ranked the poems by the classifier's probability scores and found that Blake's "A Poison Tree" scored highest. I interpret the ranking method as a means of distilling the "literary" appeal from the "popular" appeal of the poems in the sample. Finally, I discuss the implications for the theory of poetry in general.
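The classify-then-rank procedure summarized in the abstract can be illustrated with a minimal sketch. The abstract does not name the learning algorithm, so logistic regression is assumed here purely for illustration. The feature values are synthetic random numbers, not LIWC output, and the names `train_logistic`, `predict_proba`, `rank_by_probability`, and the `poem_NN` titles are hypothetical. The sketch reports in-sample accuracy only and makes no attempt to reproduce the 69% figure.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def predict_proba(w, b, x):
    # Probability that a poem with feature vector x is "canonical".
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def train_logistic(X, y, lr=0.1, epochs=500):
    # Batch gradient descent on the mean log-loss (assumed model, not the paper's).
    n, n_features = len(X), len(X[0])
    w, b = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for xi, yi in zip(X, y):
            err = predict_proba(w, b, xi) - yi
            for j in range(n_features):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def rank_by_probability(titles, X, w, b):
    # Highest predicted probability of being canonical comes first.
    scored = [(predict_proba(w, b, x), t) for t, x in zip(titles, X)]
    return sorted(scored, reverse=True)


# Synthetic stand-in data: 6 features per poem, label 1 = canonical, 0 = obscure.
rng = random.Random(0)
N_FEATURES = 6
titles, X, y = [], [], []
for i in range(40):
    label = i % 2
    shift = 0.5 if label else -0.5
    X.append([rng.gauss(shift, 1.0) for _ in range(N_FEATURES)])
    y.append(label)
    titles.append(f"poem_{i:02d}")

w, b = train_logistic(X, y)
ranking = rank_by_probability(titles, X, w, b)
accuracy = sum(
    (predict_proba(w, b, x) >= 0.5) == bool(label) for x, label in zip(X, y)
) / len(y)

print(f"in-sample accuracy: {accuracy:.2f}")
for prob, title in ranking[:3]:
    print(f"{title}: {prob:.3f}")
```

The ranking step needs only the fitted probabilities, so any classifier that outputs a probability per poem could replace the logistic model in this sketch.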
Original language | English |
---|---|
Pages (from-to) | 103-125 |
Number of pages | 23 |
Journal | Empirical Studies of the Arts |
Volume | 34 |
Issue number | 1 |
Publication status | Published - 1 Jan 2016 |