Machine learning for readability of legislative sentences

Michael Curtotti, Eric McCreath, Tom Bruce, Sara Frug, Wayne Weibel, Nicolas Ceynowa

    Research output: Chapter in Book/Report/Conference proceeding, Conference Paper, peer-review

    13 Citations (SciVal)

    Abstract

    Improving the readability of legislation is an important and unresolved problem. Recently, researchers have begun to apply legal informatics to this problem. This paper applies machine learning to predict the readability of sentences from legislation and regulations. A corpus of sentences from the United States Code and the US Code of Federal Regulations was created. Each sentence was labelled for language difficulty using results from a large-scale crowdsourced study undertaken during 2014. The corpus was used as training and test data for machine learning. The corpus includes a version tagged using the Stanford parser's context-free grammar and a version tagged using the Stanford dependency grammar parser. The corpus is described and made available to interested researchers. We investigated whether extending the natural language features available as input to machine learning improves the accuracy of prediction. Among the features evaluated are those from the context-free and dependency grammars. Letter and word n-grams were also studied. We found that adding such features improves prediction accuracy on legal language. We also undertake a correlation study of natural language features and language difficulty, drawing insights into the characteristics that may make legal language more difficult. These insights, and those from machine learning, enable us to describe a system for reducing legal language difficulty and to identify a number of suggested heuristics for improving the writing of legislation and regulations.

    Original language: English
    Title of host publication: 15th International Conference on Artificial Intelligence and Law - Proceedings
    Publisher: Association for Computing Machinery (ACM)
    Pages: 53-62
    Number of pages: 10
    ISBN (Electronic): 9781450335225
    Publication status: Published - 8 Jun 2015
    Event: 15th International Conference on Artificial Intelligence and Law, ICAIL 2015 - San Diego, United States
    Duration: 8 Jun 2015 to 12 Jun 2015

    Publication series

    Name: Proceedings of the International Conference on Artificial Intelligence and Law
    Volume: 08-12-June-2015

    Conference

    Conference: 15th International Conference on Artificial Intelligence and Law, ICAIL 2015
    Country/Territory: United States
    City: San Diego
    Period: 8/06/15 to 12/06/15
