-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve performance of parametrized pulse evaluation #48
Improve performance of parametrized pulse evaluation #48
Conversation
Thanks Will for the detailed investigation. This report is really helpful.
I didn't investigate sympy much because I believed it is slower (symengine is much faster in principle because this is cython based implementation), but 0.6 sec is pretty good number. Originally Qiskit parameter expression was based on Sympy but now it is migrating to symengine for performance reason.
This is indeed surprising result. Probably we can drop symengine support here (if lamdify works such nicely).
Probably we can write the constant as piecewise, i.e.
Yes, I noticed this and wrote about the simplification issue of piecewise in the symengine.py github. However even if I removed simplify (this is not necessary if the function is not based on piecewise), it still very slow to visualize full circuit schedule. |
Reading through all of this I think using symengine for this application won't work yet. I feel like if we can work with symengine to fix the issues ideally it would be a faster solution (we saw a 7-10x speedup in terra for Yeah, for the
In general for qiskit we try to use symengine for performance, but we can't use it everywhere because symengine doesn't have full platform support for all the platforms that qiskit supports (mainly 32bit platforms) and building it from source is next to impossible for most users. So in general for qiskit we use symengine as the default when we can, but support a fall back to sympy if symengine isn't available. The one thing sympy is not allowed for in qiskit though is a module level import, that is because importing sympy is significantly slower to import than everything else in qiskit. So for this I think relying on just sympy is fine here if we have a performant solution using it. |
This works around an issue with sympy where its lambda returns a scalar instead of an array when the expression evaluates to a scalar: sympy/sympy#5642
6eb57c4
to
6f2496a
Compare
Ah, okay. I thought you were looking at sympy because you had put the list comprehension in and the only benefit I saw to the list comprehension over direct evaluation of the numpy array was that it handled the case of the sympy lambda function returning a scalar.
I tried this out and it worked for the pulses in the notebook. I pushed another commit to the branch in this PR with the change.
When I remove simplify, Gaussian also stops working. I found that this is because
Do you know which kind of operations were sped up in the ParameterExpression benchmarks? I would have guessed simplification could be sped up a lot but symengine falls back to sympy like you point out. I guess it is numerically evaluating an expression that gets sped up? I think the sympy lambdify prints the expression as a string and then calls exec on it with locals (like So I think the current status is that:
lambda_func = sym.lambdify(t, [assigned_expr]) to import sympy
lambda_func = sympy.lambdify(t, sym.sympify(assigned_expr)) as a workaround until the issues are fixed in symengine. This would let everything else related to parametric pulses keep using symengine and just change the lambda function creation which is only used internally within the ParametricPulse class. |
One other caveat -- I have only tried the performance notebook that @nkanazawa1989 committed in this branch. We should check further for edge cases. Hopefully, the qiskit-terra unit tests are sufficient? |
Qiskit terra unittest is bit outdated and probably we need to review all the tests there. I don't think it does test for weird pulses. I feel just dropping symengine for pulse is good approach for now rather than converting it into sympy expression because it's in the requirement. Unfortunately the caching approach won't work because pulse.play(Gaussian(160, 0.1, 40), channel_d0)
pulse.play(Gaussian(160, 0.1, 40), channel_d0) will give you different instance. I'm not sure if we can turn |
Does pulse.play(Gaussian(160, 0.1, 40), channel_d0)
pulse.play(Gaussian(160, 0.1, 40), channel_d0) occur often in practice? I was thinking you would have an InstructionScheduleMap or Target with the schedule instances that would get reused as a circuit was processed. For reference, here is a little bit of profiling that I tried:
Here "sched" is calling To address the cost of the lambdify call, I added a new (messy) commit where I lambdify the symbolic expression on the first call to
I don't have a strong feeling about this. It seems like handling symengine does not add much overhead. These symbolic pulse shapes are new so they are not connected to the rest of qiskit, but there might be some convenience to using the same symbolic format as other parameter expressions.
I did see 20 tests fail with my next to last commit. I did not try to dig into them to see why they failed. Some seemed to be just a deprecation warning? Some seemed to be a pickling issue with symengine. A couple seemed to involve pulse values not matching what was expected. The failures might give extra motivation for avoiding symengine for now. Here is the exact code I used: from subprocess import run
from time import perf_counter
from qiskit import circuit, pulse, transpile, schedule, quantum_info as qi
from qiskit.test.mock import FakeBogota
from qiskit.utils import optionals
optionals.HAS_SYMENGINE = False
backend = FakeBogota()
circ = circuit.QuantumCircuit(2)
su4 = qi.random_unitary(4, seed=123).to_instruction()
circ.compose(su4, [0, 1], inplace=True)
circ.measure_all()
REPS = 4
schedule(transpile(circ, backend), backend)
sched = schedule(transpile(circ, backend), backend)
# Warm up
sched.draw()
pulse.Constant(800, 0.1)
pulse.Gaussian(160, 0.1, 40)
results = [
(
"git",
run(
"git describe --tags --exact-match || git symbolic-ref -q --short HEAD || git rev-parse --short HEAD",
capture_output=True,
text=True,
check=True,
shell=True,
).stdout.strip(),
),
("symengine", optionals.HAS_SYMENGINE)
]
start = perf_counter()
for _ in range(REPS):
sched.draw()
results.append(("sched", (perf_counter() - start)/REPS))
start = perf_counter()
for _ in range(REPS):
gaussian = pulse.Gaussian(160, 0.1, 40)
gaussian.get_waveform()
results.append(("gaussian", (perf_counter() - start)/REPS))
start = perf_counter()
for _ in range(REPS):
gaussian_sq = pulse.GaussianSquare(800, 0.1, 64, risefall_sigma_ratio=2)
gaussian_sq.get_waveform()
results.append(("gaussian_sq", (perf_counter() - start)/REPS))
start = perf_counter()
for _ in range(REPS):
drag = pulse.Drag(160, 0.1, 40, 0.3)
drag.get_waveform()
results.append(("drag", (perf_counter() - start)/REPS))
start = perf_counter()
for _ in range(REPS):
constant = pulse.Constant(800, 0.1)
constant.get_waveform()
results.append(("constant", (perf_counter() - start)/REPS))
print("| " + " | ".join(r[0] for r in results) + " |")
print("| " + " | ".join(f"{r[1]:0.3g}" if isinstance(r[1], float) else str(r[1]) for r in results) + " |") |
Thanks @wshanks this is really cool professional job. I was really impressed by the numbers in the table. I didn't realize we can use lamdify with more than one arguments (and only first one is array). Seems like adding lamdified function to class member as you suggest is reasonable approach. This will overcome the independent instance issue I raised above. Another option would be converting |
2af235d
into
nkanazawa1989:upgrade/serializable-parametric-pulse
I played around with some different things on the
upgrade/serializable-parametric-pulse
branch. I thought a PR might be a good format for discussing what I saw. Here are some findings:simplify()
reduced the time needed to draw the schedule for the whole circuit in the performance check notebook from too long for me to wait (I think you said 10 minutes) to 2.7 seconds.lambdify
. This might be something to look into further (ask thesymengine
developers?). I also find it odd that the signature oflambdify
is different forsympy
andsymengine
(expression vs. iterable of expressions).symengine
did not have the issue oflambdify
returning a function that returns a scalar when there is not
dependence.Gaussian
andGaussianSquare
,symengine
was more than 10 time slower thansympy
for the notebook cell tests.symengine
is probably slower becuase I left in thesimplify()
calls for it. I found that if I did not usesimplify()
that I gotRuntimeError: Not Implemented
forGaussian
,GaussianSquare
, andConstant
. This is the same error that I see forDrag
(it's from a Cython file, so a bit tricky to debug). Perhaps there is something complex in the expressions that gets simplified to real or maybe it is something else.I have not looked into qiskit's stance on sympy and symengine. Is it necessary to support both independently? Is casting symengine to sympy an option? I think symengine has a function for this but I don't know if qiskit needs to support having either one installed without the other. I think the symengine developers are pretty responsive if we have specific issues (but I don't know how much work these issues could require).