Document summarizing software harnessing Wikipedia

Krishnan Ramanathan and three co-authors, Document summarization using Wikipedia, a technical report from HP Labs, February 21, 2009.  (Thanks to ResourceShelf.)

Abstract: Although most of the developing world is likely to first access the Internet through mobile phones, mobile devices are constrained by screen space, bandwidth and limited attention span. Single document summarization techniques have the potential to simplify information consumption on mobile phones by presenting only the most relevant information contained in the document. In this paper we present a language independent single-document summarization method. We map document sentences to semantic concepts in Wikipedia and select sentences for the summary based on the frequency of the mapped-to concepts. Our evaluation on English documents using the ROUGE package indicates our summarization method is competitive with the state of the art in single document summarization.
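To make the abstract's core idea concrete, here is a minimal sketch of concept-frequency sentence selection. It is not the HP Labs implementation. Several pieces are my own assumptions for illustration. `TOY_LEXICON` is a hypothetical hand-made phrase-to-concept dictionary that stands in for the real mapping to Wikipedia articles. Scoring each sentence by summing the document frequencies of its concepts is one plausible reading of "based on the frequency of the mapped-to concepts." The helper names (`split_sentences`, `map_to_concepts`, `summarize`) are invented here.

```python
import re
from collections import Counter

# Hypothetical toy lexicon (an assumption): surface phrase -> concept title.
# It stands in for the paper's mapping of sentences to Wikipedia concepts.
TOY_LEXICON = {
    "mobile phone": "Mobile_phone",
    "mobile phones": "Mobile_phone",
    "cell phone": "Mobile_phone",
    "internet": "Internet",
    "bandwidth": "Bandwidth_(computing)",
    "summarization": "Automatic_summarization",
    "summary": "Automatic_summarization",
}


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def map_to_concepts(sentence, lexicon):
    """Return the concepts whose phrase occurs as a whole word or phrase."""
    lowered = sentence.lower()
    found = set()
    for phrase, concept in lexicon.items():
        if re.search(r"\b" + re.escape(phrase) + r"\b", lowered):
            found.add(concept)
    return found


def summarize(text, lexicon, k=2):
    """Pick the k sentences whose concepts are most frequent in the document."""
    sentences = split_sentences(text)
    concept_sets = [map_to_concepts(s, lexicon) for s in sentences]
    # Document-level frequency: number of sentences each concept appears in.
    freq = Counter(c for cs in concept_sets for c in cs)
    # Assumed scoring rule: sum the frequencies of a sentence's concepts.
    scores = [sum(freq[c] for c in cs) for cs in concept_sets]
    # Highest scores first; ties go to the earlier sentence.
    ranked = sorted(range(len(sentences)), key=lambda i: (-scores[i], i))[:k]
    # Return the chosen sentences in their original document order.
    return [sentences[i] for i in sorted(ranked)]


doc = (
    "Most people will first reach the Internet on a mobile phone. "
    "Screens on mobile phones are small. "
    "The weather was pleasant. "
    "Summarization can cut what a mobile phone user must read."
)

print(summarize(doc, TOY_LEXICON, k=2))
```

In this toy run, "Mobile_phone" appears in three of the four sentences. The first and last sentences also carry one additional concept each, so they score highest and form the two-sentence summary. The off-topic weather sentence maps to no concept and scores zero. In the real system, a far larger concept inventory drawn from Wikipedia would do the mapping. That mapping is why the approach could get better as the openly licensed corpus grows.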

Comment.  I’ve written a few times about document summarizing software, and how useful it will be when there is more OA literature to sic it on.  But this is the first time I’ve seen any sign that the software could actually use OA literature to guide and improve the summaries, the way statistical machine translation software uses OA literature to guide and improve translations.  Neat. 

There’s a nice positive feedback loop here:  The more OA literature we have, the better this software will work, and the better it works, the more it supports what I call the software strategy for OA by creating new incentives to make even more work OA.