
Timing the copperhead benchmark from within Python + compiling the generated C++ source to profile with the NVIDIA profiler #11

mohdsm81 opened this issue Apr 5, 2013 · 0 comments

mohdsm81 commented Apr 5, 2013

Hi,

I have been using copperhead for quite some time now, and I don't see a place where users can communicate and get support, so I will post my questions here so that anyone facing similar issues can refer to them (I hope that is OK).

So, as the subject line says:
1- I am using the copperhead Black-Scholes sample as a benchmark. I am trying to time both the memory transfers to and from the GPU and the kernel launch and execution time. The source for what I am doing follows; am I doing this right?

SRC

from copperhead import *
import numpy as np
import time

@cu
def cnd(d):
    A1 = 0.31938153
    A2 = -0.356563782
    A3 = 1.781477937
    A4 = -1.821255978
    A5 = 1.330274429
    RSQRT2PI = 0.39894228040143267793994605993438

    K = 1.0 / (1.0 + 0.2316419 * abs(d))

    cnd = RSQRT2PI * exp(- 0.5 * d * d) * \
        (K * (A1 + K * (A2 + K * (A3 + K * (A4 + K * A5)))))

    if d > 0:
        return 1.0 - cnd
    else:
        return cnd


@cu
def black_scholes(S, X, T, R, V):
    def black_scholes_el(si, xi, ti):
        sqrt_ti = sqrt(ti)
        d1 = (log(si/xi) + (R + .5 * V * V) * ti) / (V * sqrt_ti)
        d2 = d1 - V * sqrt_ti
        cnd_d1 = cnd(d1)
        cnd_d2 = cnd(d2)
        exp_Rti = exp(-R * ti)
        call_result = si * cnd_d1 - xi * exp_Rti * cnd_d2
        put_result = xi * exp_Rti * (1.0 - cnd_d2) - si * (1.0 - cnd_d1)
        return call_result, put_result
    return map(black_scholes_el, S, X, T)

def rand_floats(n, min, max):
    diff = np.float32(max) - np.float32(min)
    rands = np.array(np.random.random(n), dtype=np.float32)
    rands = rands * diff
    rands = rands + np.float32(min)
    return cuarray(rands)

n1 = 16
n2 = 4096 #4k
n3 = 16384 #16k
n4 = 32768 #32k
n5 = 65536 #64k
n6 = 131072 #128k
n7 = 262144 #256k
n8 = 524288 #512k
n9 = 1048576 #1M
n10 = 2097152 #2M
n11 = n10*2 #4M
n12 = n11*2 #8M
n13 = n12*2 #16M
n14 = n13*2 #32M

n = n14


S = rand_floats(n, 5, 30)
X = rand_floats(n, 1, 100)
T = rand_floats(n, .25, 10)
R = np.float32(.02)
V = np.float32(.3)

start = time.time()
with places.gpu0:
    r = black_scholes(S, X, T, R, V)
end = time.time()



print 'Computed: ', n , ' Options'
print 'Computation Time: ', (end-start)*1000 ,' Milliseconds'
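
Also, would something along the following lines be more accurate? My assumption is that the very first call includes copperhead's code generation and compilation, so I warm up once before timing and then average over several runs. I am also not sure whether the result is copied back to the host inside the timed region or only when it is first accessed, so please correct me if this is wrong.

# Warm-up call: I assume the first invocation triggers copperhead's
# code generation and compilation, so it should not be timed.
with places.gpu0:
    r = black_scholes(S, X, T, R, V)

# Time several runs and report the average.
runs = 10
start = time.time()
for i in range(runs):
    with places.gpu0:
        r = black_scholes(S, X, T, R, V)
end = time.time()

print 'Computed: ', n, ' Options'
print 'Average Computation Time: ', (end - start) * 1000 / runs, ' Milliseconds'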

2- Regarding the C++ source generated in the pycache after compiling the above: I need to compile it into a regular binary (just like nvcc xxx.cu -o my_bin) and then run it under the NVIDIA profiler to investigate further, based on the timings obtained in (1) above. How exactly can I do that? This is a crucial part of my benchmarking report, so I need to get it working, or else I may have to conclude that it cannot be done.
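
For reference, here is a minimal sketch of how I am locating the generated sources, assuming they live under a __pycache__ directory next to the script (which is what I see on my machine); these are the files I would like to feed to nvcc:

import os

# List the C++/CUDA sources that copperhead left in the cache directory
# (the directory name and the extensions below are my assumptions).
cache_dir = '__pycache__'
for root, dirs, files in os.walk(cache_dir):
    for name in files:
        if name.endswith(('.cpp', '.cu', '.h', '.hpp')):
            print os.path.join(root, name)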

Thanks a lot in advance for all the help and time spent to educate me.
