The Importance of Length Normalization for XML Retrieval

Authors: Kamps, Jaap (1); Rijke, Maarten (2); Sigurbjörnsson, Börkur (3)

Source: Information Retrieval, Volume 8, Number 4, December 2005, pp. 631-654 (24 pages)

Publisher: Springer

Abstract:

XML retrieval departs from standard document retrieval in that each individual XML element, ranging from italicized words or phrases to full-blown articles, is a retrievable unit. The distribution of XML element lengths is unlike what we usually observe in standard document collections, prompting us to revisit the issue of document length normalization. We perform a comparative analysis of arbitrary elements versus relevant elements, and show the importance of element length as a parameter for XML retrieval. Within the language modeling framework, we investigate a range of techniques that deal with length either directly or indirectly. We observe a length bias introduced by the amount of smoothing, and show the importance of an extreme length bias for XML retrieval. We also show that simply removing shorter elements from the index (by introducing a cut-off value) does not create an appropriate element length normalization. Even after restricting the minimal size of XML elements occurring in the index, the importance of an extreme explicit length bias remains.

Keywords: XML retrieval; language models; length normalization; smoothing

Document Type: Research article

DOI: http://dx.doi.org/10.1007/s10791-005-0750-7

Affiliations:
1: Informatics Institute, University of Amsterdam, Amsterdam, Email: kamps@science.uva.nl
2: Informatics Institute, University of Amsterdam, Amsterdam, Email: mdr@science.uva.nl
3: Informatics Institute, University of Amsterdam, Amsterdam, Email: borkur@science.uva.nl

Publication date: 2005-12-01
