Empirical challenges and solutions in constructing a high-performance metasearch engine

Abstract:

Purpose ‐ This paper examines in detail the role that missing documents, broken links and duplicate items play in the results merging process of a metasearch engine. It investigates the related practical challenges, proposes solutions, and applies those solutions to improve an existing model for results aggregation.

Design/methodology/approach ‐ The research measures the increase in retrieval effectiveness that the proposed improvements bring to an existing results merging model. The 50 queries of the 2002 TREC web track, a standard test collection based on a snapshot of the worldwide web, were used to evaluate the retrieval effectiveness of the suggested method. Three popular web search engines (Ask, Bing and Google) were selected as the underlying resources of the metasearch engine. Each of the 50 queries was passed to all three search engines, and for each query the top ten non-sponsored results of each engine were retrieved. The returned result lists were aggregated using a proposed algorithm that takes the practical issues of the process into account. The effectiveness of the resulting lists was measured using a well-known performance indicator, TREC-style average precision (TSAP).

Findings ‐ Experimental results show that the proposed model increases the performance of an existing results merging system by 14.39 percent on average.

Practical implications ‐ The findings should help metasearch engine designers and may motivate vendors of web search engines to improve their technology.

Originality/value ‐ This study provides concepts, practical challenges, solutions and experimental results in the field of web metasearching that have not been previously investigated.
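The merging and evaluation steps described in the abstract can be illustrated with a minimal Python sketch. The abstract does not specify the paper's algorithm, so everything here is an assumption made for illustration: the function names (`normalize_url`, `merge_results`, `tsap`) are hypothetical, duplicates are detected by a crude URL normalization, broken links are supplied by the caller rather than checked over the network, missing documents are assigned a rank one below the cut-off, and a plain mean-rank fusion stands in for the paper's aggregation model. TSAP is computed under the commonly cited definition, in which the result at rank i contributes 1/i if relevant and 0 otherwise, and the sum is divided by N.

```python
from urllib.parse import urlsplit


def normalize_url(url):
    """Crude duplicate key (assumption): host without 'www.', path without trailing slash."""
    parts = urlsplit(url.strip())
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    key = host + parts.path.rstrip("/")
    if parts.query:
        key += "?" + parts.query
    return key


def merge_results(result_lists, broken=frozenset(), top_n=10):
    """Merge ranked URL lists from several engines into one ranked list of keys.

    result_lists: dict mapping engine name -> ranked list of URLs.
    broken: URLs known to be dead; they are dropped before fusion.
    """
    broken_keys = {normalize_url(u) for u in broken}
    ranks = {}  # document key -> {engine: best rank seen in that engine}
    for engine, urls in result_lists.items():
        for rank, url in enumerate(urls[:top_n], start=1):
            key = normalize_url(url)
            if key in broken_keys:
                continue  # broken link: exclude from the merged list
            # setdefault keeps the first (best) rank if an engine repeats a document
            ranks.setdefault(key, {}).setdefault(engine, rank)

    missing_rank = top_n + 1  # assumed rank for a document an engine did not return
    scores = {}
    for key, per_engine in ranks.items():
        total = sum(per_engine.get(e, missing_rank) for e in result_lists)
        scores[key] = total / len(result_lists)
    # Lower mean rank is better; ties are broken alphabetically for determinism.
    return sorted(scores, key=lambda k: (scores[k], k))


def tsap(ranked, relevant, n=10):
    """TREC-style average precision at n, under the definition assumed above."""
    hits = sum(1.0 / i for i, doc in enumerate(ranked[:n], start=1) if doc in relevant)
    return hits / n
```

A real system would also have to decide how to verify links and how aggressively to treat near-duplicate pages as the same document; this sketch leaves both choices to the caller.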

Keywords: Broken links; Data fusion; Duplicate documents; Information retrieval; Information searches; Metasearch; Missing documents; OWA operator; Rank aggregation; Searching

Document Type: Research Article

DOI: https://doi.org/10.1108/14684521211275993

Publication date: 2012-09-21
