
Abstract

Ever-increasing core counts create the need to develop parallel algorithms that avoid closely coupled execution across all cores. We present performance analysis of several parallel asynchronous implementations of Jacobi’s method for solving systems of linear equations, using MPI, SHMEM and OpenMP. In particular we have solved systems of over 4 billion unknowns using up to 32,768 processes on a Cray XE6 supercomputer. We show that the precise implementation details of asynchronous algorithms can strongly affect the resulting performance and convergence behaviour of our solvers in unexpected ways, discuss how our specific implementations could be generalised to other classes of problem, and suggest how existing parallel programming models might be extended to allow asynchronous algorithms to be expressed more easily.
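To make the idea concrete, here is a minimal single-process sketch of Jacobi's method with and without a global synchronisation point. It is not the authors' MPI, SHMEM or OpenMP code. All function names are illustrative, the small tridiagonal test system is an assumption chosen because it is strictly diagonally dominant, and the random block schedule merely stands in for processes that advance at different rates. It does not model halo data that is stale while in flight.

```python
import random


def jacobi_rows(A, b, x, rows):
    """Jacobi update for the given rows, reading whatever is currently in x."""
    n = len(b)
    updated = {}
    for i in rows:
        off_diag = sum(A[i][j] * x[j] for j in range(n) if j != i)
        updated[i] = (b[i] - off_diag) / A[i][i]
    return updated


def sync_jacobi(A, b, sweeps):
    """Classical Jacobi: every row is updated from the same old iterate."""
    n = len(b)
    x = [0.0] * n
    for _ in range(sweeps):
        updated = jacobi_rows(A, b, x, range(n))
        x = [updated[i] for i in range(n)]  # acts like a global barrier
    return x


def async_jacobi_simulated(A, b, steps, n_blocks=2, seed=0):
    """Simulated asynchrony (assumption): at each step one randomly chosen
    block of rows updates itself from the current x, with no global barrier,
    so blocks perform different numbers of local iterations."""
    rng = random.Random(seed)
    n = len(b)
    x = [0.0] * n
    blocks = [range(k * n // n_blocks, (k + 1) * n // n_blocks)
              for k in range(n_blocks)]
    for _ in range(steps):
        block = rng.choice(blocks)
        for i, value in jacobi_rows(A, b, x, block).items():
            x[i] = value
    return x


def residual_norm(A, b, x):
    """Max-norm of the residual b - A x."""
    n = len(b)
    return max(abs(b[i] - sum(A[i][j] * x[j] for j in range(n)))
               for i in range(n))


def tridiagonal_system(n):
    """Hypothetical test problem: 4 on the diagonal, -1 beside it, b = 1."""
    A = [[0.0] * n for _ in range(n)]
    for i in range(n):
        A[i][i] = 4.0
        if i > 0:
            A[i][i - 1] = -1.0
        if i < n - 1:
            A[i][i + 1] = -1.0
    return A, [1.0] * n


if __name__ == "__main__":
    A, b = tridiagonal_system(8)
    print(residual_norm(A, b, sync_jacobi(A, b, 60)))
    print(residual_norm(A, b, async_jacobi_simulated(A, b, 400)))
```

Both variants drive the residual towards zero on this small system, but the simulated asynchronous version does so with an uneven number of updates per block, which is the kind of implementation detail the abstract says can affect convergence behaviour.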

Keywords

Asynchronous algorithms, Jacobi, MPI, SHMEM, OpenMP, performance analysis, linear solvers, high performance computing



Performance analysis of asynchronous Jacobi’s method implemented in MPI, SHMEM and OpenMP
