Abstract

Block-structured adaptive mesh refinement (AMR) is used to obtain numerical solutions in many scientific applications. Some block-structured AMR approaches form patches of non-uniform sizes, where the size of a patch can be tuned to the geometry of a region of interest. In this paper, we develop strategies for adaptive execution of block-structured AMR applications on GPUs for hyperbolic directionally split solvers. While effective hybrid execution strategies exist for applications with uniform patches, our work considers efficient execution of non-uniform patches with different workloads. Our techniques include bin-packing work units to load-balance GPU computations, adaptive asynchronism between CPU and GPU executions using a knapsack formulation, and scheduling communications for multi-GPU executions. Our experiments with synthetic and real data, for single-GPU and multi-GPU executions on Tesla S1070 and Fermi C2070 clusters, show that our strategies achieve a speedup of up to 3.23 over existing strategies.
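To make the bin-packing idea concrete, the following is a minimal sketch of how patches with different workloads could be grouped into equally sized work units. It uses the generic first-fit decreasing heuristic, which is an assumption chosen for illustration and not necessarily the packing strategy developed in the paper. The function name `first_fit_decreasing`, the cell counts in `patch_cells`, and the capacity of 512 are all hypothetical.

```python
from typing import List


def first_fit_decreasing(workloads: List[int], capacity: int) -> List[List[int]]:
    """Group patch indices into bins whose total workload stays within capacity.

    Generic first-fit decreasing heuristic (illustrative assumption only).
    """
    if any(w > capacity for w in workloads):
        raise ValueError("a patch workload exceeds the bin capacity")

    # Visit patches from largest to smallest workload.
    order = sorted(range(len(workloads)), key=lambda i: workloads[i], reverse=True)

    bins: List[List[int]] = []  # patch indices assigned to each bin
    loads: List[int] = []       # running workload of each bin

    for i in order:
        for b in range(len(bins)):
            # Place the patch in the first bin that still has room.
            if loads[b] + workloads[i] <= capacity:
                bins[b].append(i)
                loads[b] += workloads[i]
                break
        else:
            # No existing bin fits, so open a new one.
            bins.append([i])
            loads.append(workloads[i])
    return bins


# Hypothetical workloads (cells per patch) for six non-uniform patches.
patch_cells = [512, 64, 256, 128, 384, 192]
bins = first_fit_decreasing(patch_cells, capacity=512)
print(bins)
```

In this sketch each bin could correspond to one unit of GPU work, so bins with similar total workloads would keep the units roughly balanced even though the individual patches differ in size.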

Keywords

Adaptive mesh refinement, GPU executions, dynamic load balancing, asynchronous executions of CPUs and GPUs, coalesced access
