
Timing the copperhead benchmark from within Python + compiling the generated C++ source to profile with the NVIDIA profiler #11

mohdsm81 opened this issue Apr 5, 2013 · 0 comments

mohdsm81 commented Apr 5, 2013

Hi,

I have been using copperhead for quite some time now, and I don't see a place where users can communicate and get support, so I will post my questions here so that anyone facing similar issues can refer to them (I hope that is OK).

So, as the subject line says:
1- I am using the copperhead Black-Scholes sample as a benchmark. I am trying to time both the memory transfers to and from the GPU and the kernel launch and execution time. The source for what I am doing follows; am I doing this right?

SRC

from copperhead import *
import numpy as np
import time

@cu
def cnd(d):
    A1 = 0.31938153
    A2 = -0.356563782
    A3 = 1.781477937
    A4 = -1.821255978
    A5 = 1.330274429
    RSQRT2PI = 0.39894228040143267793994605993438

    K = 1.0 / (1.0 + 0.2316419 * abs(d))

    cnd = RSQRT2PI * exp(- 0.5 * d * d) * \
        (K * (A1 + K * (A2 + K * (A3 + K * (A4 + K * A5)))))

    if d > 0:
        return 1.0 - cnd
    else:
        return cnd


@cu
def black_scholes(S, X, T, R, V):
    def black_scholes_el(si, xi, ti):
        sqrt_ti = sqrt(ti)
        d1 = (log(si/xi) + (R + .5 * V * V) * ti) / (V * sqrt_ti)
        d2 = d1 - V * sqrt_ti
        cnd_d1 = cnd(d1)
        cnd_d2 = cnd(d2)
        exp_Rti = exp(-R * ti)
        call_result = si * cnd_d1 - xi * exp_Rti * cnd_d2
        put_result = xi * exp_Rti * (1.0 - cnd_d2) - si * (1.0 - cnd_d1)
        return call_result, put_result
    return map(black_scholes_el, S, X, T)

def rand_floats(n, min, max):
    diff = np.float32(max) - np.float32(min)
    rands = np.array(np.random.random(n), dtype=np.float32)
    rands = rands * diff
    rands = rands + np.float32(min)
    return cuarray(rands)

n1 = 16
n2 = 4096 #4k
n3 = 16384 #16k
n4 = 32768 #32k
n5 = 65536 #64k
n6 = 131072 #128k
n7 = 262144 #256k
n8 = 524288 #512k
n9 = 1048576 #1M
n10 = 2097152 #2M
n11 = n10*2 #4M
n12 = n11*2 #8M
n13 = n12*2 #16M
n14 = n13*2 #32M

n = n14


S = rand_floats(n, 5, 30)
X = rand_floats(n, 1, 100)
T = rand_floats(n, .25, 10)
R = np.float32(.02)
V = np.float32(.3)

start = time.time()
with places.gpu0:
    r = black_scholes(S, X, T, R, V)
end = time.time()



print 'Computed: ', n , ' Options'
print 'Computation Time: ', (end-start)*1000 ,' Milliseconds'
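
Also, would something along the following lines be more accurate? My assumption is that the very first call includes copperhead's code generation and compilation, so I warm up once before timing and then average over several runs. I am also not sure whether the result is copied back to the host inside the timed region or only when it is first accessed, so please correct me if this is wrong.

# Warm-up call: I assume the first invocation triggers copperhead's
# code generation and compilation, so it should not be timed.
with places.gpu0:
    r = black_scholes(S, X, T, R, V)

# Time several runs and report the average.
runs = 10
start = time.time()
for i in range(runs):
    with places.gpu0:
        r = black_scholes(S, X, T, R, V)
end = time.time()

print 'Computed: ', n, ' Options'
print 'Average Computation Time: ', (end - start) * 1000 / runs, ' Milliseconds'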

2- Regarding the C++ source generated in the pycache after compiling the above: I need to compile it into a regular binary (just like nvcc xxx.cu -o my_bin) and then run it under the NVIDIA profiler to investigate further, based on the timings obtained in (1) above. How exactly can I do that? This is a crucial part of my benchmarking report, so I need to get it working, or else I may have to conclude that it cannot be done.
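
For reference, here is a minimal sketch of how I am locating the generated sources, assuming they live under a __pycache__ directory next to the script (which is what I see on my machine); these are the files I would like to feed to nvcc:

import os

# List the C++/CUDA sources that copperhead left in the cache directory
# (the directory name and the extensions below are my assumptions).
cache_dir = '__pycache__'
for root, dirs, files in os.walk(cache_dir):
    for name in files:
        if name.endswith(('.cpp', '.cu', '.h', '.hpp')):
            print os.path.join(root, name)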

Thanks a lot in advance for all the help and time spent to educate me.
