CUDA

CUDA is a parallel computing platform and programming model developed by NVIDIA for general computing on GPUs.

The cuDNN library, a GPU-accelerated library for Deep Neural Networks, is also installed.

Documentation

Introduction to CUDA, from the NVIDIA website
CUDA documentation from the NVIDIA website
Introduction to nvprof, the command line profiler, from the NVIDIA website
CUDA-MEMCHECK, a suite of tools for diagnosing functional-correctness problems in CUDA applications
cuDNN documentation, from the NVIDIA website

Usage on Bridges-2

To see which versions of CUDA and cuDNN are available, which one is the default when there is more than one, and some help text, type:

module spider cuda
module spider cudnn

To use CUDA or cuDNN, include commands like these in your batch script or interactive session to load the corresponding module (note that 'module load' is case-sensitive):

module load cuda
module load cudnn
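
As a rough illustration of where these commands sit in a script, the sketch below loads both modules and then reports which CUDA compiler is on the PATH. The function names load_cuda_modules and report_cuda_toolchain are hypothetical helpers, not part of CUDA or the module system. Scheduler directives and the application itself are left out, and the guards are assumptions added so that the fragment also runs where the module command or nvcc is missing.

```shell
#!/bin/bash
# Sketch of the module-loading part of a batch script or interactive session.
# Scheduler directives and the application itself are intentionally omitted.

load_cuda_modules() {
    # 'module' is normally supplied by the site's environment; the guard lets
    # this sketch run as a no-op on machines where it is absent.
    if command -v module >/dev/null 2>&1; then
        module load cuda  || echo "could not load the cuda module" >&2
        module load cudnn || echo "could not load the cudnn module" >&2
    else
        echo "module command not available" >&2
    fi
}

report_cuda_toolchain() {
    # Print the CUDA compiler version if nvcc is on PATH after loading.
    if command -v nvcc >/dev/null 2>&1; then
        nvcc --version
    else
        echo "nvcc not found on PATH"
    fi
}

load_cuda_modules
report_cuda_toolchain
```

If the last command reports that nvcc was not found, the CUDA module was probably not loaded in that session.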

Profiling your code

To profile your CUDA code, use the command line profiler nvprof, which comes with the CUDA Toolkit. More information on nvprof can be found on the NVIDIA web site (see Documentation above).
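
A minimal sketch of a profiling run is shown here. The helper name profile_with_nvprof, the default log name nvprof.log, and the executable path passed to it are placeholders chosen for illustration. The --log-file option tells nvprof to write its output to a file rather than to the terminal.

```shell
#!/bin/bash
# Hypothetical helper: profile an executable with nvprof if it is available.
profile_with_nvprof() {
    app="$1"                  # path to your CUDA executable (placeholder)
    log="${2:-nvprof.log}"    # where to write the profiler output (placeholder)
    if ! command -v nvprof >/dev/null 2>&1; then
        echo "nvprof not found; load a CUDA module first" >&2
        return 127
    fi
    # --log-file sends the profiler output to a file instead of the terminal.
    nvprof --log-file "$log" "$app"
}
```

For example, profile_with_nvprof ./my_app my_app.log would profile a program named my_app (a placeholder name) and leave the summary in my_app.log.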

Common Errors

Many CUDA errors are caused by using an outdated version. Try loading the default version of CUDA, which is typically the most recent, with

module load cuda

rather than naming a specific version.

An error like:

The application being profiled received a signal

can indicate that the code being profiled is incorrect. Some things to check are:

Memory errors. Try running the code under cuda-memcheck. More information on CUDA-MEMCHECK can be found on the NVIDIA web site (see Documentation above).
DeviceReset or exit calls. These can prevent the profiler from writing its logs.
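
The memory check mentioned above can be sketched as follows. The helper name run_memcheck and the executable path are placeholders, and the guard is an assumption added to cover CUDA installations that do not provide the cuda-memcheck command.

```shell
#!/bin/bash
# Hypothetical helper: run an executable under cuda-memcheck if available.
run_memcheck() {
    app="$1"   # path to your CUDA executable (placeholder)
    if ! command -v cuda-memcheck >/dev/null 2>&1; then
        echo "cuda-memcheck not found; load a CUDA module first" >&2
        return 127
    fi
    # With no --tool option, cuda-memcheck runs its default memcheck tool,
    # which reports errors such as out-of-bounds memory accesses.
    cuda-memcheck "$app"
}
```

Running run_memcheck ./my_app (again a placeholder name) before profiling can help separate memory bugs in the application from problems with the profiler itself.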