Incorporating MAGMA into the 'fields' spatial statistics package

AMS Citation:
Paige, J., I. Lyngaas, V. Ramakrishnaiah, D. Hammerling, R. Kumar, and D. Nychka, 2015: Incorporating MAGMA into the 'fields' spatial statistics package. NCAR Technical Note NCAR/TN-519+STR, 29 pp, doi:10.5065/D6KP8078.
Date:2015-08-01
Resource Type:technical report
Title:Incorporating MAGMA into the 'fields' spatial statistics package
Abstract: In this report we describe how to incorporate the Cholesky decomposition from the Matrix Algebra on GPU and Multicore Architectures (MAGMA) library into some of the calculations of the 'fields' spatial statistics package in R. We provide MAGMA installation instructions as well as demonstrations of performance on simulated datasets and on the CO2 dataset available in fields. While other spatial statistics packages in R, such as bigGP and parspatstat, use parallelism, none to our knowledge directly incorporates GPUs or other coprocessors. Our code is timed on Caldera computational nodes in the National Center for Atmospheric Research's Yellowstone supercomputing environment. For 40,000 x 40,000 matrices, the MAGMA-accelerated decomposition achieves speedups of 30.7 and 46.2 times with 1 and 2 GPUs, respectively, over chol, the standard Cholesky decomposition function in R, when configured so that R programmers can call our accelerated function just as they would call chol. The speedups are greater for in-place calculations, where the original matrix is overwritten rather than copied: in that case, the equivalent speedups are 41.8 and 54.4 times on 1 and 2 GPUs, respectively. We also time a simple spatial analysis workflow using maximum likelihood estimation, with the largest case exceeding 23,000 observations; there the accelerated workflows achieve speedups of approximately 4.2 and 4.3 times with 1 and 2 GPUs, respectively, over a corresponding unaccelerated workflow. As problem size increases, speedups improve, and the 2 GPU decompositions perform increasingly well relative to their 1 GPU counterparts. In some cases, however, 2 GPU decompositions are slower than 1 GPU decompositions because of additional communication overhead and data dependencies in the Cholesky decomposition algorithm; this will be explored further in Ramakrishnaiah et al. (2015).
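The abstract notes that the maximum likelihood workflow benefits from a faster Cholesky decomposition. The reason is that each evaluation of a Gaussian log-likelihood needs the factor L of the n x n covariance matrix: the log-determinant comes from the diagonal of L, and the quadratic form comes from a triangular solve. The factorization costs O(n^3), so it dominates as n grows. The sketch below is illustrative only. It is plain Python, not the fields or MAGMA code. The exponential covariance and the names exp_cov, cholesky, forward_solve and gaussian_loglik are hypothetical choices made for this example.

```python
import math

def cholesky(a):
    """Return lower-triangular L with a == L L^T (a must be symmetric positive definite)."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for j in range(n):
        s = a[j][j] - sum(L[j][k] ** 2 for k in range(j))
        if s <= 0.0:
            raise ValueError("matrix is not positive definite")
        L[j][j] = math.sqrt(s)
        for i in range(j + 1, n):
            L[i][j] = (a[i][j] - sum(L[i][k] * L[j][k] for k in range(j))) / L[j][j]
    return L

def forward_solve(L, b):
    """Solve L x = b for lower-triangular L by forward substitution."""
    x = []
    for i in range(len(b)):
        x.append((b[i] - sum(L[i][k] * x[k] for k in range(i))) / L[i][i])
    return x

def exp_cov(locs, sigma2, theta, nugget):
    """Hypothetical exponential covariance plus a nugget on the diagonal."""
    n = len(locs)
    return [[sigma2 * math.exp(-math.dist(locs[i], locs[j]) / theta)
             + (nugget if i == j else 0.0) for j in range(n)] for i in range(n)]

def gaussian_loglik(y, sigma):
    """Zero-mean Gaussian log-likelihood evaluated via one Cholesky factorization."""
    L = cholesky(sigma)                       # the O(n^3) step a GPU library would accelerate
    z = forward_solve(L, y)                   # z = L^{-1} y, so z.z = y^T Sigma^{-1} y
    quad = sum(v * v for v in z)
    logdet = 2.0 * sum(math.log(L[i][i]) for i in range(len(y)))
    return -0.5 * (quad + logdet + len(y) * math.log(2.0 * math.pi))
```

In a maximum likelihood search, a routine like gaussian_loglik would be called repeatedly for different covariance parameters, such as theta. Each call repeats the factorization, which is why accelerating that single step speeds up the whole workflow.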
Subject(s):Spatial statistics, Kriging, High-performance computing, GPU, MAGMA
Peer Review:Non-refereed
Copyright Information:Copyright Author(s). This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
OpenSky citable URL: ark:/85065/d72j6b88
Publisher's Version: 10.5065/D6KP8078
Author(s):
  • John Paige
  • Isaac Lyngaas
  • Vinay Ramakrishnaiah
  • Dorit Hammerling - NCAR/UCAR
  • Raghu Kumar - NCAR/UCAR
  • Doug Nychka - NCAR/UCAR