A Distributed Memory Algorithm for Lexicon Building
Author: Hawking, D.
Source: Journal of Parallel and Distributed Computing, Volume 44, Number 1, July 1997, pp. 80-87
Publisher: Academic Press
Abstract: A parallel algorithm for preparing word frequency concordances over two specified sets of documents from a collection is presented. Good parallel efficiency is demonstrated on a 128-node distributed memory machine using sets whose combined size exceeds one gigabyte. It is demonstrated that efficiency is heavily influenced by hashing and communication strategies. A two-stage hashing algorithm is proposed to reduce communication overhead. Ways of increasing capacity are considered, and the applicability of the algorithm to other text-processing functions such as index and symbol-table building is outlined.
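The abstract does not describe the algorithm's steps. As a rough illustration only, the Python sketch below simulates one common way to build a word frequency concordance over two document sets on several nodes. Each simulated node counts words in its local share of both sets, then batches every distinct word, with its two counts, to an owner node chosen by hashing the word. The owner merges the batches addressed to it, so each word ends up on exactly one node. The function names, the CRC-32 owner hash, and the whitespace tokenizer are assumptions made for this sketch; it is not the paper's two-stage hashing scheme, whose details are given in the article itself.

```python
from collections import defaultdict
import zlib


def owner(word: str, num_nodes: int) -> int:
    # Assumed owner function: a deterministic hash, so every node
    # agrees on which node is responsible for a given word.
    return zlib.crc32(word.encode("utf-8")) % num_nodes


def local_counts(docs_a, docs_b):
    # Local pass: count each word's occurrences in this node's share
    # of set A (index 0) and set B (index 1). Whitespace tokenizer assumed.
    counts = defaultdict(lambda: [0, 0])
    for doc in docs_a:
        for w in doc.lower().split():
            counts[w][0] += 1
    for doc in docs_b:
        for w in doc.lower().split():
            counts[w][1] += 1
    return counts


def build_concordance(partitions):
    # partitions: one (docs_a, docs_b) pair per simulated node.
    num_nodes = len(partitions)
    # outboxes[src][dst] is the single batch node src would send to node dst.
    outboxes = [[[] for _ in range(num_nodes)] for _ in range(num_nodes)]
    for src, (docs_a, docs_b) in enumerate(partitions):
        for w, (ca, cb) in local_counts(docs_a, docs_b).items():
            outboxes[src][owner(w, num_nodes)].append((w, ca, cb))

    # Merge pass: each owner node sums the batches addressed to it.
    tables = [defaultdict(lambda: [0, 0]) for _ in range(num_nodes)]
    for src in range(num_nodes):
        for dst in range(num_nodes):
            for w, ca, cb in outboxes[src][dst]:
                tables[dst][w][0] += ca
                tables[dst][w][1] += cb

    # Owners hold disjoint word sets, so gathering them is a plain union.
    result = {}
    for table in tables:
        for w, (ca, cb) in table.items():
            result[w] = (ca, cb)
    return result
```

For example, `build_concordance([(["the cat sat"], ["the dog"]), (["a cat"], ["the cat ran"])])` maps `"cat"` to `(2, 1)`: two occurrences in set A and one in set B. Batching all entries for one destination into a single outbox reflects the abstract's point that communication strategy strongly affects efficiency; sending one message per word would multiply the message count.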
Document Type: Research Article
Affiliations: Department of Computer Science, Australian National University, Canberra, Australian Capital Territory, 0200, Australia
Publication date: July 1, 1997