Fast Runtime Block Cyclic Data Redistribution on Multiprocessors

Block cyclic distribution seems to be well suited to most linear algebra algorithms, and it was chosen for both the ScaLAPACK library and the HPF language. The block size, however, must be a compromise between computation efficiency, communication efficiency, and load balancing. The best choice depends heavily on the operation, so it is essential to be able to move from one block cyclic distribution to another very quickly. It is equally essential to be able to choose the right number of processors and the best grid shape for a given operation. We present the data redistribution algorithms we implemented in the ScaLAPACK library to go from a block cyclic distribution on one grid to a block cyclic distribution on another grid. A complexity study shows the efficiency of our solution, and timing results on the Intel Paragon and the Cray T3D corroborate it.
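
To make the redistribution problem concrete, the following is a minimal one-dimensional sketch in Python. It is not the algorithm implemented in ScaLAPACK or described in the article; it only applies the standard block cyclic index mapping, element by element, to show which data must travel between which processes when the block size and the process count change. The names `owner` and `redistribution_plan` are hypothetical, and indices are assumed to be 0-based.

```python
from collections import defaultdict


def owner(g, nb, p):
    """Map a 0-based global index g to (process, local_index) for a
    1D block cyclic distribution with block size nb over p processes."""
    block = g // nb                       # global block containing g
    process = block % p                   # blocks are dealt out round-robin
    local = (block // p) * nb + g % nb    # position in that process's storage
    return process, local


def redistribution_plan(n, nb_src, p_src, nb_dst, p_dst):
    """For a vector of length n, list the elements each (source, destination)
    process pair must exchange, as (source_local, destination_local) pairs.

    Naive O(n) enumeration, for illustration only (an assumption of this
    sketch, not the method of the article).
    """
    plan = defaultdict(list)
    for g in range(n):
        src, src_local = owner(g, nb_src, p_src)
        dst, dst_local = owner(g, nb_dst, p_dst)
        plan[(src, dst)].append((src_local, dst_local))
    return dict(plan)


if __name__ == "__main__":
    # 8 elements: block size 2 on 2 processes -> block size 1 on 4 processes.
    for pair, moves in sorted(redistribution_plan(8, 2, 2, 1, 4).items()):
        print(pair, moves)
```

Even in this small case each source process has to send to several destination processes, which suggests why the cost of computing and executing the communication pattern matters when redistribution is done at run time.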

Document Type: Short Communication

Affiliations: LIP, URA CNRS 1398, INRIA Rhone-Alpes, ENS-Lyon, Lyon, 69364, France

Publication date: August 1, 1997
