Running NVidia's Cuda and OpenCL

These are more or less generic instructions for running NVidia's Cuda/OpenCL tools under any Linux environment, although I sometimes use miranda's file paths. Specific instructions for the course server can be found here.

Compiling and Linking

  • Cuda code is compiled with the nvcc compiler and OpenCL host code with the ordinary system compiler, gcc.
  • To compile to NVidia's PTX intermediate code (the example is the vec code from the introduction demo):
    nvcc --ptx
    This produces a vec.ptx file in the current working directory.
  • OpenCL has no similar command line utility for producing the intermediate code, but it can be retrieved through the OpenCL C API: after building the program, query clGetProgramInfo with CL_PROGRAM_BINARIES.

GDB Debugger

There is a good manual on NVidia's site with lots of examples.

You have to compile with the following flags (using the intro's vec example); -g adds host debug information and -G adds device debug information:

nvcc -g -G -o vec vec.c

Once compiled, you can run the binary under cuda-gdb, set a breakpoint at the VecAdd kernel function, switch between threads, and examine values:

(cuda-gdb) break VecAdd
Breakpoint 1 at 0x805621e: file, line 14.
(cuda-gdb) run
Starting program: /home/tlilja/gpgpu/gpgpu/intro/example/cuda/vec
Breakpoint 1, VecAdd (A=0x210000, B=0x220000, C=0x230000) at
(cuda-gdb) info cuda threads
<<<(0,0),(0,0,0)>>> ... <<<(0,0),(31,0,0)>>> VecAdd (A=0x210000, B=0x220000, C=0x230000) at
(cuda-gdb) p i
$1 = 0
(cuda-gdb) thread <<<5,0,0>>>
Switching to <<<(0,0),(5,0,0)>>> VecAdd (A=0x210000, B=0x220000, C=0x230000) at
(cuda-gdb) p i
$3 = 5

The command to switch threads is

(cuda-gdb) thread [thread-description]

and to get various information on the device state you can use

(cuda-gdb) info cuda [parameter]

where [parameter] is threads, system, device, sm, warp or lane.
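
In the session above, p i prints 0 in thread (0,0,0) and 5 after switching to thread (5,0,0). The following host-side C sketch shows why, under the assumption that the kernel computes its index in the common way, i = blockIdx.x * blockDim.x + threadIdx.x. The kernel source is not shown on this page, so that formula is an assumption, and the names global_index and vec_add_host are hypothetical helpers made up for this illustration.

```c
#include <stddef.h>

/* Hypothetical host-side model of the index a 1-D kernel thread computes.
   Assumes the common pattern i = blockIdx.x * blockDim.x + threadIdx.x. */
size_t global_index(size_t block_idx, size_t block_dim, size_t thread_idx)
{
    return block_idx * block_dim + thread_idx;
}

/* Host loop that visits every (block, thread) pair in turn and performs
   the element-wise addition that each device thread would perform. */
void vec_add_host(const float *a, const float *b, float *c,
                  size_t n, size_t grid_dim, size_t block_dim)
{
    for (size_t blk = 0; blk < grid_dim; ++blk) {
        for (size_t thr = 0; thr < block_dim; ++thr) {
            size_t i = global_index(blk, block_dim, thr);
            if (i < n)          /* guard threads past the end of the data */
                c[i] = a[i] + b[i];
        }
    }
}
```

With one block of 32 threads, as in the info cuda threads output above, global_index(0, 32, 5) is 5, which matches the value printed after switching to thread (5,0,0).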

Note! The debugger halts the graphics card, so you cannot use X while debugging unless you have multiple cards.

There is some "support" for concurrent execution: if two Cuda programs are executed concurrently and one is stopped in the debugger, the other's kernel launch seems to fail.

Device emulation

  • To compile for NVidia's device emulation mode:
    nvcc -deviceemu
    Now the code is executed on the host and you can use printf statements or an ordinary debugger. Note that the execution order and floating point results may differ from those on the device.
  • NVidia's OpenCL implementation does not have a device emulation mode.
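
One reason a different execution order can change floating point results is that float addition is not associative. The sketch below is a plain C illustration with made-up values, not code from the vec example; the function names sum_left and sum_right are hypothetical.

```c
/* Sum three floats left to right: (x + y) + z.
   The volatile temporary forces the intermediate to be rounded to float. */
float sum_left(float x, float y, float z)
{
    volatile float t = x + y;
    return t + z;
}

/* Sum the same three floats right to left: x + (y + z). */
float sum_right(float x, float y, float z)
{
    volatile float t = y + z;
    return x + t;
}
```

For x = 1e8f, y = -1e8f, z = 1.0f, sum_left gives 1 while sum_right gives 0, because -1e8f + 1.0f rounds back to -1e8f in single precision. Code that accumulates values in a different order in emulation than on the device can differ in the same way.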


Profiling

  • To enable profiling in Cuda, set the environment variable and then execute your program:
    export CUDA_PROFILE=1
    In my Cuda 3.0 environment this creates a text file called cuda_profile_0.log, which has the profiling information:
    # CUDA_DEVICE 0 Quadro NVS 135M
    # TIMESTAMPFACTOR fffff73b14670af0
    method=[ memcpyHtoD ] gputime=[ 6.048 ] cputime=[ 9.000 ]
    method=[ memcpyHtoD ] gputime=[ 5.664 ] cputime=[ 6.000 ]
  • In OpenCL:
    export OPENCL_PROFILE=1
    and you get opencl_profile_0.log in the current working directory.
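
The log lines shown above are simple enough to post-process with a few lines of C. The following is a minimal sketch, assuming every record has the layout method=[ NAME ] gputime=[ G ] cputime=[ C ] with spaces inside the brackets, as in the sample output; the function names parse_profile_line and total_gputime are hypothetical and not part of any NVidia tool.

```c
#include <stdio.h>
#include <string.h>

/* Parse one profiler log line of the assumed form
     method=[ NAME ] gputime=[ G ] cputime=[ C ]
   Returns 1 and fills the outputs on success; returns 0 for header lines
   (starting with '#') and for anything that does not match. */
int parse_profile_line(const char *line, char *method, size_t method_size,
                       double *gputime, double *cputime)
{
    char name[64];
    double g, c;

    if (line[0] == '#')
        return 0;
    if (sscanf(line, " method=[ %63s ] gputime=[ %lf ] cputime=[ %lf ]",
               name, &g, &c) != 3)
        return 0;

    snprintf(method, method_size, "%s", name);
    *gputime = g;
    *cputime = c;
    return 1;
}

/* Sum the gputime column of a log file.
   Returns -1.0 if the file cannot be opened. */
double total_gputime(const char *path)
{
    char line[512], method[64];
    double gpu, cpu, total = 0.0;
    FILE *f = fopen(path, "r");

    if (!f)
        return -1.0;
    while (fgets(line, sizeof line, f))
        if (parse_profile_line(line, method, sizeof method, &gpu, &cpu))
            total += gpu;
    fclose(f);
    return total;
}
```

Calling total_gputime("cuda_profile_0.log") after a profiled run would then give the summed GPU time of all logged operations. Any extra fields after cputime are ignored by the parser.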

Visual profiler

  • Cuda's visual profiler is in $CUDA_TOOLKIT_PATH/cudaprof/bin/. After starting it, navigate to Session -> Session Settings and set the binary name in Launch and its arguments in Arguments.
  • There is an HTML manual in $CUDA_TOOLKIT_PATH/cudaprof/doc (in miranda: /usr/local/cudaprof/doc).
  • For OpenCL the corresponding path is $CUDA_TOOLKIT_PATH/openclprof if you are running Cuda Toolkit 3.0. (Older versions may not have come with openclprof installed.)
  • The manual is located in $CUDA_TOOLKIT_PATH/openclprof/doc (in miranda: /usr/local/cuda/openclprof/doc)
  • For your own installation: cudaprof uses a specific version of the Qt library, which may not match your system's version; the mismatch can crash the program with an unresolved symbol. You can fix this by making the binary use the shipped Qt libraries, typically by putting the directory that holds them at the front of LD_LIBRARY_PATH before starting cudaprof.
  • There seems to be no OpenCL visual profiler for the 64-bit platform in the Cuda 3.0 beta Ubuntu packages. The 32-bit Ubuntu package provides profilers for both Cuda and OpenCL.


The Cuda Toolkit libraries:

  • Cuda basic system (cuda.h + others): Cuda library documentation (version 2.3)
  • OpenCL basic system: OpenCL manual pages
  • Cuda implementation of BLAS: manual pages locally installed
  • FFT library: manual pages locally installed

In miranda all Cuda toolkit manual pages are under /usr/local/cuda/man/, the toolkit include files are in /usr/local/cuda/include, and the libraries are in /usr/local/cuda/lib64 and /usr/local/cuda/lib (for 32-bit compatibility). The OpenCL library is in /usr/lib and its headers are in /usr/include/CL.

SDK libraries ($S = SDK Path = /usr/local/gpu-computing-sdk-3.0 in miranda):

  • Cuda SDK C utilities
  • OpenCL Cuda SDK C utilities
  • Shared Utilities Library
  • Data parallel Primitives Library

All of the SDK libraries are more or less documented in the header and source files.
