Optimizing Matrix Multiplication with CUDA

The slides used in the presentation can be found here.

Most of the material in this presentation comes from the NVIDIA CUDA Programming Guide and the example software in the NVIDIA CUDA SDK.

The code

The code for measuring the runtimes can be found here.

The Makefile for the GPU part is still in CUDA SDK format: the directory must be placed in SDK_ROOT/C/src/, and the required libraries must be built first (by running make in SDK_ROOT/C/). The executable is saved in SDK_ROOT/C/bin/linux/release/.

The CUDA code is taken from the programming guide and from the SDK software examples matrixMul and simpleCUBLAS.

About CUBLAS performance

The CUBLAS sample code I used wrote to device memory with the vector-specific calls cublasSetVector() and cublasGetVector(). I have since tried cublasSetMatrix() and cublasGetMatrix() instead, but still got the interesting faster-if-divisible-by-16 phenomenon. As a clarification, the CUBLAS library requires one to first allocate memory with cublasAlloc(), at which point CUBLAS knows nothing about the nature of the data that will be stored there.
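
The divisible-by-16 effect also shows up in the CUBLAS timings reported in the comment at the end of this page (measured on the commenter's machine, not the one used for the presentation). The following is a minimal sketch in Python, not part of the original benchmark code. It assumes that the cost of an n x n multiplication grows roughly as n^3, so that time divided by n^3 would be about constant if alignment did not matter. The helper names normalized_cost and split_by_alignment are made up for this illustration.

```python
# First-run CUBLAS timings copied from the comment's benchmark output
# (matrix size n -> time; the unit is not stated in the source).
cublas_times = {
    1400: 30.277000,
    1600: 23.667000,
    1800: 65.456001,
    2200: 116.178001,
    2400: 110.600998,
    2600: 192.692001,
}


def normalized_cost(n, t):
    """Time divided by n^3, scaled by 1e9 for readability (assumes O(n^3) work)."""
    return t / n**3 * 1e9


def split_by_alignment(times, multiple=16):
    """Group normalized costs by whether n is a multiple of `multiple`."""
    aligned, unaligned = {}, {}
    for n, t in sorted(times.items()):
        target = aligned if n % multiple == 0 else unaligned
        target[n] = normalized_cost(n, t)
    return aligned, unaligned


aligned, unaligned = split_by_alignment(cublas_times)
for label, group in (("divisible by 16", aligned), ("not divisible", unaligned)):
    for n, cost in group.items():
        print(f"{n:5d}  {label:16s}  {cost:.3f}")
```

In this sample only 1600 and 2400 are multiples of 16, and both come out with a clearly lower normalized cost than their neighbours. That is consistent with the phenomenon described above, although six data points do not explain its cause.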

The Data

The data obtained from the above code can be found here.

GPU times are in CSV format, and CPU times are in CSV with a single space as the delimiter (sorry for the inconsistency, it was just lazy hacking). If you are wondering about the times.plot file, it is the file format for Plot, a nice little plotting app for OS X.
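
Because the two sets of files use different delimiters, a single reader with a delimiter parameter is enough to load both. The sketch below is in Python and is only an illustration. It assumes that each row holds the matrix size followed by the timings for that size (the layout of the output pasted in the comment below), and the sample strings stand in for the real files, whose names are not given here. The function name read_times is hypothetical.

```python
import csv
import io


def read_times(text, delimiter):
    """Parse rows of 'size<delim>t1<delim>t2...' into {size: [t1, t2, ...]}."""
    rows = {}
    for fields in csv.reader(io.StringIO(text), delimiter=delimiter):
        # Drop empty fields caused by stray or trailing delimiters.
        fields = [f for f in fields if f.strip()]
        if not fields:
            continue
        rows[int(fields[0])] = [float(f) for f in fields[1:]]
    return rows


# Illustrative sample contents in the assumed layout, not the real data files.
gpu_text = "100,0.053,0.052,0.052\n110,0.060,0.062,0.061\n"
cpu_text = "100 2.467 2.461 2.462\n110 3.273 3.276 3.273\n"

gpu = read_times(gpu_text, delimiter=",")  # GPU files: comma-separated
cpu = read_times(cpu_text, delimiter=" ")  # CPU files: single-space-separated
```

Both calls return the same dictionary shape, so later plotting or comparison code does not need to know which file a row came from.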

1 Comment

  1. I've run the benchmarks on my own computer: a Core 2 Quad 2.8GHz (6MB cache) with 4GB memory and an XFX GeForce GTX 260/216 (the newer model with 216 shader processors, where the old one had 192, factory overclocked from 576 to 666MHz, which brings it to GTX 280 levels).

     Naive...
    100 2.467000 2.461000 2.462000 2.507000 2.461000
    110 3.273000 3.276000 3.273000 3.272000 3.311000
    120 4.241000 4.242000 4.238000 4.237000 4.264000
    130 5.415000 5.417000 5.438000 5.407000 5.410000
    140 6.746000 6.742000 6.746000 6.750000 6.742000
    150 8.280000 8.296000 8.274000 8.277000 8.272000
    160 10.059000 10.074000 10.052000 10.055000 10.050000
    170 11.966000 11.985000 11.965000 11.959000 11.963000
    180 14.270000 14.236000 14.242000 14.235000 14.237000
    190 16.759001 16.672001 16.677000 16.671000 16.672001
    200 19.507999 19.457001 19.452999 19.452999 20.809000
    300 65.289001 65.080002 65.373001 65.031998 65.073997
    400 154.050003 153.744003 153.666000 153.647995 153.703003
    500 309.071014 308.334991 308.075012 308.118988 308.035004
    600 629.711975 625.588989 625.440002 625.398010 625.672974
    700 1064.933960 1064.766968 1064.344971 1064.673950 1064.391968
    800 1722.051025 1721.938965 1722.266968 1722.379028 1721.770996
    900 2823.263916 2782.092041 2783.337891 2780.478027 2781.179932
    1000 4298.341797 4306.356934 4234.847168 4241.624023 4236.126953
    ITPP...
    100 0.863000 0.864000 0.873000 0.863000 0.863000
    110 1.164000 1.164000 1.163000 1.165000 1.176000
    120 1.525000 1.479000 1.479000 1.479000 1.479000
    130 2.043000 1.985000 1.985000 1.985000 1.991000
    140 2.499000 2.442000 2.447000 2.442000 2.443000
    150 3.067000 3.006000 3.005000 3.008000 3.005000
    160 3.579000 3.518000 3.514000 3.520000 3.515000
    170 4.438000 4.368000 4.375000 4.369000 4.372000
    180 5.127000 5.055000 5.052000 5.058000 5.051000
    190 6.041000 5.968000 5.964000 5.961000 5.963000
    200 6.999000 6.908000 6.908000 6.910000 6.909000
    300 22.908001 22.517000 22.514999 22.511000 22.509001
    400 52.813999 52.387001 52.061001 52.039001 52.012001
    500 104.390999 104.261002 104.249001 104.240997 104.248001
    600 180.348007 180.272995 180.285995 180.554993 180.289993
    700 286.153015 283.433990 285.593994 288.675995 286.319000
    800 448.069000 446.858002 446.658997 452.514008 444.785004
    900 801.635010 766.737000 767.760010 767.117981 766.403992
    1000 1180.197021 1193.953003 1190.433960 1195.473999 1191.301025
    1200 2130.720947 2127.923096 2127.308105 2128.092041 2126.974121
    1400 3392.510986 3427.361084 3427.177002 3427.502930 3427.329102
    1600 5107.655762 5046.551758 5043.884766 5044.526855 5043.621094
    1800 7193.706055 7273.669922 7330.723145 7270.430176 7271.759766
    2000 9862.413086 9912.724609 9852.471680 9851.615234 9853.087891
    2200 13472.436523 13414.885742 13415.912109 13472.472656 13415.082031
    2400 17566.101562 17620.773438 17564.802734 17623.472656 17568.957031
    2600 22390.792969 22333.837891 22390.445312 22333.078125 22391.919922
    2800 27950.812500 27881.914062 27950.654297 27992.816406 27904.349609
    3000 34277.652344 34300.406250 34357.386719 34303.984375 34330.957031
    CUDA non shared...
    100 0.125000 0.125000 0.124000 0.125000 0.125000
    110 0.157000 0.155000 0.155000 0.155000 0.157000
    120 0.195000 0.195000 0.193000 0.195000 0.194000
    130 0.241000 0.240000 0.241000 0.239000 0.236000
    140 0.306000 0.305000 0.307000 0.306000 0.306000
    150 0.474000 0.467000 0.473000 0.472000 0.474000
    160 0.447000 0.448000 0.447000 0.447000 0.446000
    170 0.523000 0.521000 0.519000 0.521000 0.522000
    180 0.702000 0.704000 0.702000 0.703000 0.706000
    190 0.739000 0.744000 0.743000 0.744000 0.743000
    200 0.853000 0.854000 0.853000 0.853000 0.844000
    300 3.792000 3.816000 3.816000 3.753000 3.773000
    400 6.651000 6.654000 6.651000 6.659000 6.663000
    500 13.193000 13.191000 13.202000 13.212000 13.235000
    600 29.106001 28.676001 29.044001 28.955999 29.097000
    700 37.664001 37.580002 37.672001 37.526001 36.883999
    800 56.972000 56.939999 57.008999 56.984001 57.167000
    900 139.578995 148.087006 151.580994 142.227005 140.289001
    1000 115.460999 115.133003 115.420998 115.258003 115.320999
    1200 214.054993 211.936005 210.720001 217.425003 211.378006
    1400 338.726990 342.184998 339.776001 339.536987 326.697998
    1600 480.338989 480.894989 479.868988 481.582001 480.975006
    1800 810.752991 804.603027 807.843994 808.973022 798.968018
    2000 978.632996 978.429993 980.583008 969.846985 979.245972
    2200 1257.534058 1257.551025 1263.432983 1258.725952 1258.541992
    2400 1738.229980 1737.204956 1740.125000 1741.020020 1736.786987
    2600 2155.162109 2148.437012 2141.590088 2155.395996 2158.152100
    2800 2737.189941 2735.395996 2734.606934 2738.039062 2736.074951
    3000 3342.281982 3336.343018 3345.222900 3345.654053 3348.684082
    3500 5406.798828 5393.306152 5380.390137 5385.210938 5396.744141
    CUDA Shared...
    100 0.053000 0.053000 0.053000 0.052000 0.053000
    110 0.061000 0.061000 0.060000 0.061000 0.063000
    120 0.072000 0.072000 0.072000 0.072000 0.072000
    130 0.952000 0.082000 0.083000 0.082000 0.083000
    140 0.096000 0.101000 0.097000 0.097000 0.096000
    150 0.123000 0.123000 0.126000 0.125000 0.124000
    160 0.136000 0.135000 0.136000 0.136000 0.134000
    170 0.148000 0.149000 0.149000 0.148000 0.147000
    180 0.167000 0.166000 0.168000 0.166000 0.166000
    190 0.193000 0.194000 0.194000 0.191000 0.194000
    200 0.212000 0.212000 0.213000 0.213000 0.214000
    300 0.740000 0.715000 0.723000 0.736000 0.734000
    400 1.464000 1.465000 1.464000 1.462000 1.467000
    500 2.789000 2.789000 2.791000 2.788000 2.792000
    600 5.117000 5.127000 5.159000 5.226000 5.223000
    700 7.527000 7.526000 7.525000 7.526000 7.527000
    800 11.197000 11.199000 11.202000 11.188000 11.190000
    900 22.922001 23.950001 23.325001 24.033001 24.079000
    1000 21.944000 21.860001 21.951000 21.895000 21.938000
    1200 38.153000 38.217999 38.088001 38.164001 38.056000
    1400 60.474998 60.574001 60.360001 60.227001 60.223000
    1600 92.849998 92.583000 93.013000 92.876999 92.950996
    1800 139.985992 140.451996 141.574997 141.658997 140.580994
    2000 177.089996 177.529999 177.328003 176.990005 177.087997
    2200 229.623993 229.503998 229.587006 229.576004 229.598999
    2400 341.679993 342.946991 344.417999 342.459991 343.272003
    2600 434.380005 433.104004 433.858002 432.322998 432.897003
    2800 549.963013 550.111023 549.640991 549.937012 550.093994
    3000 677.172974 677.703979 676.635986 676.577026 676.275024
    3500 1087.115967 1085.156006 1088.218994 1090.959961 1084.916016
    CUDA Shared w optimized blocksize...
    100 0.053000 0.052000 0.052000 0.051000 0.054000
    110 0.060000 0.062000 0.061000 0.062000 0.060000
    120 0.059000 0.062000 0.061000 0.059000 0.061000
    130 0.060000 0.060000 0.061000 0.060000 0.060000
    140 0.095000 0.097000 0.096000 0.094000 0.095000
    150 0.093000 0.095000 0.095000 0.095000 0.094000
    160 0.109000 0.110000 0.107000 0.109000 0.108000
    170 0.108000 0.108000 0.109000 0.108000 0.108000
    180 0.170000 0.170000 0.171000 0.173000 0.170000
    190 0.171000 0.171000 0.170000 0.171000 0.172000
    200 0.192000 0.191000 0.192000 0.190000 0.193000
    300 0.542000 0.543000 0.543000 0.536000 0.544000
    400 1.211000 1.210000 1.214000 1.211000 1.225000
    500 2.199000 2.183000 2.187000 2.177000 2.178000
    600 4.006000 3.994000 3.992000 3.986000 3.992000
    700 6.006000 5.996000 5.993000 5.998000 6.005000
    800 9.252000 9.252000 9.244000 9.237000 9.242000
    900 13.028000 13.024000 13.017000 13.014000 12.982000
    1000 17.938000 17.966999 17.927000 17.941000 17.955999
    1200 30.952999 30.896000 30.885000 30.884001 30.924000
    1400 48.900002 48.866001 48.903000 48.910000 48.875999
    1600 72.742996 72.684998 72.671997 72.646004 72.737999
    1800 104.101997 103.921997 104.029999 103.921997 103.972000
    2000 141.589005 141.511002 141.533997 141.539993 141.507996
    2200 193.981003 194.063995 194.160004 194.145004 194.119003
    2400 250.992004 251.054993 251.026001 251.063995 250.992996
    2600 318.528015 318.687012 318.712006 318.675995 318.841003
    2800 398.312012 398.253998 398.364990 398.178986 398.403015
    3000 486.372009 486.213013 486.226013 486.251007 486.160004
    3500 776.270996 776.393982 776.278015 776.341003 776.291016
    CUBLAS...
    100 0.109000 0.108000 0.109000 0.108000 0.109000
    110 0.114000 0.115000 0.115000 0.114000 0.115000
    120 0.124000 0.121000 0.121000 0.120000 0.121000
    130 0.105000 0.102000 0.103000 0.104000 0.104000
    140 0.133000 0.107000 0.106000 0.110000 0.116000
    150 0.124000 0.110000 0.109000 0.112000 0.111000
    160 0.114000 0.101000 0.102000 0.100000 0.102000
    170 0.187000 0.172000 0.170000 0.172000 0.172000
    180 0.189000 0.177000 0.177000 0.175000 0.176000
    190 0.198000 0.184000 0.184000 0.185000 0.184000
    200 0.203000 0.188000 0.190000 0.190000 0.190000
    300 0.455000 0.440000 0.440000 0.439000 0.440000
    400 0.917000 0.901000 0.901000 0.903000 0.901000
    500 1.601000 1.581000 1.585000 1.585000 1.580000
    600 2.616000 2.607000 2.606000 2.607000 2.605000
    700 3.938000 3.923000 3.926000 3.921000 3.923000
    800 4.401000 4.372000 4.405000 4.382000 4.408000
    900 8.758000 8.774000 8.778000 8.830000 8.766000
    1000 11.461000 11.446000 11.450000 11.444000 11.444000
    1200 19.687000 19.684000 19.666000 19.674000 19.672001
    1400 30.277000 30.278999 30.266001 30.280001 30.278000
    1600 23.667000 23.667999 23.672001 23.656000 23.667999
    1800 65.456001 65.430000 65.431000 65.426003 65.428001
    2000 88.625999 88.620003 88.598999 88.620003 88.639000
    2200 116.178001 116.194000 116.177002 116.191002 116.188004
    2400 110.600998 110.650002 110.641998 110.639999 110.658997
    2600 192.692001 192.671005 192.675995 192.660995 192.694000
    2800 244.119995 244.274002 244.367004 243.908005 243.679993
    3000 293.071991 293.063995 293.058990 293.897003 293.044006
    3500 469.175995 469.660004 469.709991 469.412994 470.046997
    
    

    To summarise (before I work on a plot): on both the CPU and the GPU, the trivial approaches are faster than on Miranda, and the better libraries are slower than on Miranda.