Running NVidia's Cuda and OpenCL

These are more or less generic instructions for running NVidia's Cuda/OpenCL tools under any Linux environment, although I sometimes use miranda's file paths. Specific instructions for the course server can be found here.

Compiling and Linking

  • Cuda code is compiled with the nvcc compiler and OpenCL code with the ordinary system compiler, gcc.
  • To compile to NVidia's PTX intermediate code (the example is the vec code from the introduction demo):
    nvcc --ptx vec.cu
    
    This produces a vec.ptx file in the current working directory.
  • There is no similar command-line utility for OpenCL to produce the intermediate code, but it can be done through the OpenCL C API.
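
A quick way to check what ended up in the generated PTX file is to list its kernel entry points, which PTX declares with the .entry directive. The following is a minimal sketch; list_ptx_kernels is a hypothetical helper name, and the names it prints are normally the C++-mangled forms of the kernel names unless the kernel is declared extern "C".

```shell
# list_ptx_kernels is a hypothetical helper, not part of the Cuda toolkit.
# It prints the name that follows every ".entry" directive in a PTX file.
list_ptx_kernels() {
    sed -n 's/^.*\.entry[[:space:]]\{1,\}\([A-Za-z_$%][A-Za-z0-9_$]*\).*$/\1/p' "$1"
}
```

After nvcc --ptx vec.cu, calling list_ptx_kernels vec.ptx should print one line per kernel in the file.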

GDB Debugger

There is a good manual on NVidia's site with lots of examples.

You have to compile with the following flags (using the intro's vec example):

nvcc -g -G -o vec vec.cu

Once compiled, you can run the program under cuda-gdb, set a breakpoint in the VecAdd kernel function, switch between threads and examine values:

(cuda-gdb) break VecAdd
Breakpoint 1 at 0x805621e: file vec.cu, line 14.
(cuda-gdb) run
Starting program: /home/tlilja/gpgpu/gpgpu/intro/example/cuda/vec
[...]
Breakpoint 1, VecAdd (A=0x210000, B=0x220000, C=0x230000) at vec.cu:15
(cuda-gdb) info cuda threads
<<<(0,0),(0,0,0)>>> ... <<<(0,0),(31,0,0)>>> VecAdd (A=0x210000, B=0x220000, C=0x230000) at vec.cu:15
(cuda-gdb) p i
$1 = 0
(cuda-gdb) thread <<<5,0,0>>>
Switching to <<<(0,0),(5,0,0)>>> VecAdd (A=0x210000, B=0x220000, C=0x230000) at vec.cu:16
(cuda-gdb) p i
$3 = 5

The command to switch threads is

(cuda-gdb) thread [thread-description]

and to get various information on the device state you can use

(cuda-gdb) info cuda [parameter]

where [parameter] is threads, system, device, sm, warp or lane.
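
The same session can be replayed without typing the commands each time. The sketch below assumes that cuda-gdb, being based on GDB, accepts GDB's usual --batch and -x options; vec.gdb is a hypothetical file name, and the commands are the ones from the session above.

```shell
# Write the debugger commands from the session above into a command file.
# vec.gdb is a hypothetical file name chosen for this example.
cat > vec.gdb <<'EOF'
break VecAdd
run
info cuda threads
print i
thread <<<5,0,0>>>
print i
EOF

# Replay the commands non-interactively, but only if cuda-gdb is installed
# and ./vec exists (it must have been built with -g -G).
if command -v cuda-gdb >/dev/null 2>&1 && [ -x ./vec ]; then
    cuda-gdb --batch -x vec.gdb ./vec
fi
```

Keeping the commands in a file makes it easy to repeat the same inspection after every rebuild.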

Note! The debugger halts the graphics card, so you cannot use X while debugging unless you have multiple cards.

There is some "support" for concurrent execution: if two Cuda programs run concurrently and one of them is stopped in the debugger, the other's kernel launch seems to fail.

Device emulation

  • To compile for NVidia's device emulation mode:
    nvcc -deviceemu vec.cu
    
    Now the code is executed on the host and you can use printf statements or an ordinary debugger. Note that the execution order and floating point results may differ.
  • NVidia's OpenCL implementation does not have a device emulation mode.

Profiler

  • To enable profiling in Cuda, set the environment variable and run your program:
    export CUDA_PROFILE=1
    ./vec
    
    This creates a text file (called cuda_profile_0.log in my Cuda 3.0 environment) which has the profiling information:
    # CUDA_PROFILE_LOG_VERSION 1.6
    # CUDA_DEVICE 0 Quadro NVS 135M
    # TIMESTAMPFACTOR fffff73b14670af0
    method,gputime,cputime,occupancy
    method=[ memcpyHtoD ] gputime=[ 6.048 ] cputime=[ 9.000 ]
    method=[ memcpyHtoD ] gputime=[ 5.664 ] cputime=[ 6.000 ]
    ...
    
  • In OpenCL:
    export OPENCL_PROFILE=1
    ./vec
    
    and you get opencl_profile_0.log in the current working directory.
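
The log is plain text, so it can be summarized with standard tools. The following is a minimal sketch that adds up the GPU and CPU times per method; summarize_profile is a hypothetical helper name, and the field layout is assumed to match the sample lines shown above (method=[ name ] gputime=[ t ] cputime=[ t ]).

```shell
# summarize_profile is a hypothetical helper, not part of the Cuda toolkit.
# It sums gputime and cputime per method from a profile log given as $1.
summarize_profile() {
    awk '
        $1 == "method=[" {
            # Each value is the field right after its "key=[" token.
            name = ""; gpu = 0; cpu = 0
            for (i = 1; i < NF; i++) {
                if ($i == "method=[")  name = $(i + 1)
                if ($i == "gputime=[") gpu  = $(i + 1)
                if ($i == "cputime=[") cpu  = $(i + 1)
            }
            calls[name]++; gsum[name] += gpu; csum[name] += cpu
        }
        END {
            for (n in calls)
                printf "%s calls=%d gputime=%.3f cputime=%.3f\n", n, calls[n], gsum[n], csum[n]
        }
    ' "$1"
}
```

Call it as summarize_profile cuda_profile_0.log (or with opencl_profile_0.log). Comment lines and the column header line are ignored because they do not start with the method=[ token.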

Visual profiler

  • Cuda's visual profiler is in $CUDA_TOOLKIT_PATH/cudaprof/bin/ and you can start it with:
    cudaprof
    
    There you have to navigate to Session -> Session Settings and set the binary name in the Launch field and its arguments in the Arguments field.
  • There is an HTML manual in $CUDA_TOOLKIT_PATH/cudaprof/doc (in miranda: /usr/local/cudaprof/doc).
  • For OpenCL the corresponding path is $CUDA_TOOLKIT_PATH/openclprof and the binary is
    openclprof
    
    if you are running Cuda Toolkit 3.0. (Older versions may not have come with openclprof installed.)
  • The manual is located in $CUDA_TOOLKIT_PATH/openclprof/doc (in miranda: /usr/local/cuda/openclprof/doc).
  • For your own installation: cudaprof uses a specific version of the Qt library, which may not match your system's version. The mismatch can crash the program with an unresolved symbol. You can fix this by making the binary use the shipped Qt libraries:
    export LD_LIBRARY_PATH=/path/to/cudaprof/bin:$LD_LIBRARY_PATH
    
  • There seems to be no OpenCL visual profiler for the 64-bit platform in the Cuda 3.0 beta Ubuntu packages. The 32-bit Ubuntu package provides profilers for both Cuda and OpenCL.
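
If you would rather not change LD_LIBRARY_PATH for the whole shell session, the Qt library fix above can be limited to a single command. The following is a minimal sketch; run_with_libs is a hypothetical helper name and /path/to/cudaprof/bin is the same placeholder path as above.

```shell
# run_with_libs is a hypothetical helper, not part of the Cuda toolkit.
# Usage: run_with_libs DIR COMMAND [ARGS...]
# It prepends DIR to LD_LIBRARY_PATH for COMMAND only; the calling shell's
# own environment is left untouched.
run_with_libs() {
    libdir=$1
    shift
    # ${VAR:+...} adds the ":" separator only if LD_LIBRARY_PATH is already
    # set, so no empty entry (which means "current directory") is created.
    env LD_LIBRARY_PATH="$libdir${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}" "$@"
}
```

For example, run_with_libs /path/to/cudaprof/bin cudaprof starts the profiler with the shipped Qt libraries while every other program keeps using the system ones.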

Libraries

The Cuda Toolkit libraries:

Library   | Description                 | Headers                      | Documentation
libcudart | Cuda basic system           | cuda.h + others              | Cuda library documentation (version 2.3)
libOpenCL | OpenCL basic system         | cl.h, cl_gl.h, cl_platform.h | OpenCL manual pages
libcublas | Cuda implementation of BLAS | cublas.h                     | manual pages locally installed
libcufft  | FFT library                 | cufft.h                      | manual pages locally installed

In miranda all Cuda toolkit manual pages are under /usr/local/cuda/man/, the toolkit include files are in /usr/local/cuda/include, and the libraries are in /usr/local/cuda/lib64 and /usr/local/cuda/lib (for 32-bit compatibility). The OpenCL library is in /usr/lib and its headers are in /usr/include/CL.

SDK libraries ($S = SDK Path = /usr/local/gpu-computing-sdk-3.0 in miranda):

Library    | Description                      | Headers
libcutil   | Cuda SDK C utilities             | $S/C/common/inc/cutil*.h
liboclUtil | OpenCL Cuda SDK C utilities      | $S/OpenCL/common/inc/oclutils.h
libshrutil | Shared Utilities Library         | $S/shared/inc/shrUtils.h
libcudpp   | Data parallel Primitives Library | $S/shared/inc/cudpp/cudpp.h

All of the SDK libraries are more or less documented in the header and source files.
