Abstract
In this paper, we study the problem of automatically segmenting written text into paragraphs. This is inherently a sequence labeling problem, however, previous approaches ignore this dependency. We propose a novel approach for automatic paragraph segmentation, namely training Semi-Markov models discriminatively using a Max-Margin method. This method allows us to model the sequential nature of the problem and to incorporate features of a whole paragraph, such as paragraph coherence which cannot be used in previous models. Experimental evaluation on four text corpora shows improvement over the previous state-of-the art method on this task.
| Original language | English |
|---|---|
| Pages | 640-648 |
| Number of pages | 9 |
| Publication status | Published - 2007 |
| Event | 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, EMNLP-CoNLL 2007 - Prague, Czech Republic Duration: 28 Jun 2007 → 28 Jun 2007 |
Conference
| Conference | 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, EMNLP-CoNLL 2007 |
|---|---|
| Country/Territory | Czech Republic |
| City | Prague |
| Period | 28/06/07 → 28/06/07 |
Cite this
- APA
- Author
- BIBTEX
- Harvard
- Standard
- RIS
- Vancouver