Reduce jitted function overhead #1101
Conversation
Mostly looked at the torch linker, plus some questions around the tests.
x = pt.vector("x")
out = pt.exp(x)

fn = function([x], out, mode="NUMBA")
Should we also test this on torch? Do you want me to do that?
I think there's much more going on with Torch, so we may want to think more carefully about which sort of graph to benchmark, and also about CPU vs GPU. Let's open an issue for when the torch backend is a bit more established?
test_x = np.zeros(1000)
assert np.sum(fn(test_x)) == 1000

benchmark(fn, test_x)
Does the benchmark fail if a certain amount of time elapses?
We have a job that should show a warning, but I don't think it's working. It at least allows us to run git bisect (or something like that) over commits to see whether performance dropped.
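For context, the two diff chunks above assemble into roughly the following benchmark test. The test name, imports, and the pytest-benchmark fixture wiring are assumptions for illustration; only the individual statements come from the diff:

import numpy as np
import pytensor.tensor as pt
from pytensor import function

def test_numba_function_overhead(benchmark):
    x = pt.vector("x")
    out = pt.exp(x)
    fn = function([x], out, mode="NUMBA")

    test_x = np.zeros(1000)
    # exp(0) == 1 for each of the 1000 entries
    assert np.sum(fn(test_x)) == 1000

    # pytest-benchmark repeatedly calls fn(test_x) and records the timings
    benchmark(fn, test_x)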
pytensor/link/pytorch/linker.py (outdated)
# Torch does not accept numpy inputs and may return GPU objects
def fn(*inputs, inner_fn=inner_fn):
    outs = inner_fn(*(pytorch_typify(inp) for inp in inputs))
    return tuple(out.cpu() for out in outs)
I have a few PRs open that might clobber this, FYI.
Whichever gets merged first, we can resolve the conflicts afterwards.
inner_fn = torch.compile(fn)

# Torch does not accept numpy inputs and may return GPU objects
def fn(*inputs, inner_fn=inner_fn):
Maybe just use the closure-scoped function?
Wdym?
I think I usually see us just reference inner_fn without declaring it as an optional parameter. I've seen both, but whatever works.
Ah, I think this is a bit faster, but I am not positive.
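For reference, the two options being discussed look roughly like this (a minimal sketch; the names fn_closure and fn_default are illustrative, and the bodies just mirror the wrapper above):

# Option A: reference inner_fn through the closure; it is looked up in the
# enclosing scope (a closure cell) on every call.
def fn_closure(*inputs):
    outs = inner_fn(*(pytorch_typify(inp) for inp in inputs))
    return tuple(out.cpu() for out in outs)

# Option B (what the diff does): bind inner_fn as a default keyword argument,
# which captures the reference once at definition time and makes the lookup a
# plain local-variable access, usually marginally faster than the closure.
def fn_default(*inputs, inner_fn=inner_fn):
    outs = inner_fn(*(pytorch_typify(inp) for inp in inputs))
    return tuple(out.cpu() for out in outs)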
Force-pushed from 862f158 to b11db68.
def output_filter(self, var: "Variable", out: Any) -> Any:
    if not isinstance(var, np.ndarray) and isinstance(
        var.type, pytensor.tensor.TensorType
    ):
        return var.type.filter(out, allow_downcast=True)

    return out
This was actually wrong; it was probably meant to be if not isinstance(out, np.ndarray). As written, it always triggered var.type.filter, which was quite slow.
Anyway, if numba is returning a non-array where we expected an array, it means something is wrong in our dispatch, and we should fix it there.
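A sketch of the condition as it was presumably intended, checking the runtime value out rather than the symbolic var:

def output_filter(self, var: "Variable", out: Any) -> Any:
    # `var` is a symbolic Variable and never an ndarray, so the original check
    # `not isinstance(var, np.ndarray)` was always true and the slow filter
    # always ran; checking `out` only filters genuinely non-array results.
    if not isinstance(out, np.ndarray) and isinstance(
        var.type, pytensor.tensor.TensorType
    ):
        return var.type.filter(out, allow_downcast=True)

    return out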
Force-pushed from 311f83e to a97734e.
Force-pushed from a97734e to d546c36.
Force-pushed from d546c36 to 4a96d91.
lgtm, I left a dumb comment that you can ignore (the gc stuff was re-implemented in the last commit).
@@ -701,34 +696,7 @@ def make_all(self, input_storage=None, output_storage=None, storage_map=None):
            compute_map, nodes, input_storage, output_storage, storage_map
        )

        computed, last_user = gc_helper(nodes)

        if self.allow_gc:
Is this gc happening somewhere else now? Why was it here if it could just be removed?
It doesn't do anything for jitted functions, where you don't control intermediate allocations.
This reduces the overhead of the benchmarked numba function from ~10µs to ~2.5µs on my machine, with trust_input=True.
It will hopefully go down further when we remove deprecated function features like output_subset and dict returns.
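For readers who want to reproduce this kind of measurement: trust_input skips per-call input validation on the compiled function. A minimal sketch, assuming the usual attribute-setting approach; the graph and dtype handling here are illustrative, not taken from the PR:

import numpy as np
import pytensor
import pytensor.tensor as pt

x = pt.vector("x")
fn = pytensor.function([x], pt.exp(x), mode="NUMBA")

# Skip the per-call input checking/conversion; the caller is then responsible
# for passing arrays of the expected dtype and number of dimensions.
fn.trust_input = True

fn(np.zeros(1000, dtype=pytensor.config.floatX))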
📚 Documentation preview 📚: https://pytensor--1101.org.readthedocs.build/en/1101/