Semi-Markov models for sequence segmentation

Qinfeng Shi*, Yasemin Altun, Alex Smola, S. V. N. Vishwanathan

*Corresponding author for this work

    Research output: Contribution to conference › Paper › peer-review

    7 Citations (Scopus)

    Abstract

    In this paper, we study the problem of automatically segmenting written text into paragraphs. This is inherently a sequence labeling problem; however, previous approaches ignore this sequential dependency. We propose a novel approach to automatic paragraph segmentation: training Semi-Markov models discriminatively with a Max-Margin method. This method allows us to model the sequential nature of the problem and to incorporate features of a whole paragraph, such as paragraph coherence, which could not be used in previous models. Experimental evaluation on four text corpora shows an improvement over the previous state-of-the-art method for this task.
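    The abstract does not spell out the model, so what follows is only a minimal Python sketch of the general technique it names, under stated assumptions: a semi-Markov decoder over sentences in which each candidate paragraph (segment) is scored as a whole, plus one subgradient step of margin-rescaled max-margin training with a loss that counts mismatched paragraph boundaries. The feature map, the toy topic ids standing in for sentences, the crude coherence feature, the loss, and the optimizer (plain subgradient descent) are all illustrative assumptions, not the authors' features or training procedure; transition features between consecutive paragraphs are omitted.

```python
from typing import Callable, List, Sequence, Set, Tuple

Segment = Tuple[int, int]  # half-open [start, end) range of sentence indices
FeatureFn = Callable[[int, int], List[float]]  # hypothetical segment feature map


def semi_markov_viterbi(n: int, score: Callable[[int, int], float],
                        max_len: int) -> List[Segment]:
    """Best split of n sentences into contiguous segments of length <= max_len.

    score(i, j) rates one segment covering sentences i..j-1. Because it sees
    the whole segment, it can use segment-level features such as coherence.
    """
    best = [float("-inf")] * (n + 1)  # best[j]: top score for sentences 0..j-1
    back = [0] * (n + 1)              # back[j]: start of the last segment
    best[0] = 0.0
    for j in range(1, n + 1):
        for i in range(max(0, j - max_len), j):
            cand = best[i] + score(i, j)
            if cand > best[j]:
                best[j], back[j] = cand, i
    segments: List[Segment] = []
    j = n
    while j > 0:  # follow back-pointers from the end of the text
        segments.append((back[j], j))
        j = back[j]
    return segments[::-1]


def dot(w: Sequence[float], x: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(w, x))


def joint_features(segs: List[Segment], phi: FeatureFn, dim: int) -> List[float]:
    """Joint feature vector: the sum of phi(i, j) over all segments."""
    total = [0.0] * dim
    for i, j in segs:
        for k, v in enumerate(phi(i, j)):
            total[k] += v
    return total


def max_margin_step(w: List[float], n: int, gold: List[Segment], phi: FeatureFn,
                    max_len: int, lr: float = 0.1, lam: float = 0.01) -> List[float]:
    """One subgradient step on the regularized, margin-rescaled hinge loss.

    Assumed loss: number of boundaries present in only one of gold/prediction.
    Up to a constant it decomposes per segment end j < n as
    1 - 2*[j is a gold boundary], so loss-augmented decoding reuses Viterbi.
    """
    gold_bounds: Set[int] = {j for _, j in gold if j < n}

    def augmented(i: int, j: int) -> float:
        loss = 0.0 if j == n else 1.0 - 2.0 * (j in gold_bounds)
        return dot(w, phi(i, j)) + loss

    pred = semi_markov_viterbi(n, augmented, max_len)
    dim = len(w)
    g = joint_features(gold, phi, dim)
    p = joint_features(pred, phi, dim)
    # w <- w - lr * (lam * w + Phi(pred) - Phi(gold))
    return [wk - lr * (lam * wk + pk - gk) for wk, pk, gk in zip(w, p, g)]


# Hypothetical toy data: each sentence reduced to a topic id. The second
# feature is a crude stand-in for paragraph coherence (+1 if one topic, else -1).
TOPICS = [0, 0, 1, 1, 1, 2]


def toy_phi(i: int, j: int) -> List[float]:
    coherent = len(set(TOPICS[i:j])) == 1
    return [1.0, 1.0 if coherent else -1.0]  # [per-paragraph bias, coherence]
```

    With hand-set weights `[-1.5, 1.0]`, decoding the six toy sentences via `semi_markov_viterbi(6, lambda i, j: dot(w, toy_phi(i, j)), max_len=6)` returns `[(0, 2), (2, 5), (5, 6)]`, one paragraph per topic run. Scoring whole segments is what makes a paragraph-level feature like coherence computable at all; the price is that decoding costs O(n · max_len) segment evaluations instead of O(n) per-sentence decisions.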

    Original language: English
    Pages: 640-648
    Number of pages: 9
    Publication status: Published - 2007
    Event: 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, EMNLP-CoNLL 2007 - Prague, Czech Republic
    Duration: 28 Jun 2007 to 28 Jun 2007

    Conference

    Conference: 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, EMNLP-CoNLL 2007
    Country/Territory: Czech Republic
    City: Prague
    Period: 28/06/07 to 28/06/07
