This repository has been archived by the owner on Feb 20, 2023. It is now read-only.

LLVM compilation ~2-3x slower than it should be #1173

Open
pmenon opened this issue Sep 10, 2020 · 5 comments
Labels
performance Performance related issues or changes.

Comments

@pmenon
Member

pmenon commented Sep 10, 2020

Summary:

I was playing around with full JIT execution and noticed significant compilation times for simple queries. I did a little digging and found that LLVM compilation in NoisePage is roughly 2x slower than in the TPL repo (about 1.6x to 2.5x, depending on the file). Below is a comparison of compilation times for all sample TPL files between NoisePage and the TPL repo:

| File | NoisePage | TPL |
| --- | --- | --- |
| array.tpl | 29.21 | 12.55 |
| array-iterate.tpl | 46.37 | 28.29 |
| array-iterate-2.tpl | 40.58 | 23.2 |
| call.tpl | 29.53 | 12.5 |
| comments.tpl | 30.09 | 12 |
| compare.tpl | 30.07 | 12.5 |
| deref.tpl | 29.84 | 12.2 |
| fib.tpl | 31.33 | 17 |
| if.tpl | 28.93 | 11.8 |
| if-2.tpl | 28.57 | 11.5 |
| if-3.tpl | 28.54 | 11.5 |
| if-4.tpl | 27.68 | 11.8 |
| loop.tpl | 28.96 | 11.8 |
| loop2.tpl | 34.69 | 16.8 |
| loop3.tpl | 28.40 | 11.9 |
| loop4.tpl | 33.34 | 14.5 |
| nil.tpl | 29.44 | 11.6 |
| offsetof.tpl | 28.58 | 11.9 |
| param.tpl | 29.14 | 11.7 |
| pointer.tpl | 28.88 | 11.5 |
| return-expr.tpl | 29.85 | 12 |
| simple.tpl | 28.98 | 11.9 |
| short-circuit.tpl | 30.10 | 12.9 |
| sql-date.tpl | 32.61 | 14.89 |
| struct.tpl | 29.15 | 12.36 |
| struct-debug.tpl | 29.70 | 11.9 |
| struct-empty.tpl | 29.86 | 12.5 |
| struct-field-use.tpl | 28.73 | 12 |
| struct-nested.tpl | 28.96 | 11.6 |
| struct-nested-2.tpl | 30.14 | 11.9 |
| struct-pointer.tpl | 29.55 | 12.7 |
| test.tpl | 36.67 | 19 |
| while.tpl | 29.75 | 11.9 |

These are non-SQL files, but I expect similar results for the TPC-H and SSB benchmark queries, if not greater slowdowns.
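
To make the size of the gap concrete, here is a minimal sketch that computes the per-file slowdown (NoisePage time divided by TPL time) for a handful of rows copied from the table above. Only five rows are included for brevity, and the variable names are illustrative rather than taken from either repo.

```python
# (file, NoisePage time, TPL time) copied from the table above.
# Both columns are assumed to use the same time unit, so the ratio is unitless.
samples = [
    ("array.tpl", 29.21, 12.55),
    ("array-iterate.tpl", 46.37, 28.29),
    ("fib.tpl", 31.33, 17.0),
    ("if.tpl", 28.93, 11.8),
    ("test.tpl", 36.67, 19.0),
]

# Slowdown factor per file: how many times longer NoisePage takes than TPL.
ratios = {name: noisepage / tpl for name, noisepage, tpl in samples}

for name, ratio in ratios.items():
    print(f"{name}: {ratio:.2f}x")
```

In this subset the small files sit well above 2x, while the files with longer absolute compile times (array-iterate.tpl, fib.tpl, test.tpl) come in below 2x.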

Probable causes:

  1. Many bytecode implementations are marked VM_HOT to force inlining when they shouldn't be (example). These should be marked VM_COLD and implemented in the CPP file to remove header includes.
  2. Some bytecodes are implemented in the header, forcing heavy-weight includes that increase compilation times (example). These should also be moved into the CPP file.
  3. We should consider breaking up bytecodes into separate LLVM modules that are included on demand. Not all queries require all bytecodes.
pmenon added the performance label Sep 10, 2020
@gonzalezjo
Collaborator

What steps did you take to measure compilation times?

@pmenon
Member Author

pmenon commented Sep 21, 2020

@gonzalezjo
Collaborator

gonzalezjo commented Sep 23, 2020

Did you just collect these stats manually with perf? Or do you have some sort of harness to run and time these individually and automatically?

@pmenon
Member Author

pmenon commented Sep 24, 2020

It's a little manual:

  1. I added timing logic around LLVMEngine::Compile() and log it out using EXECUTION_LOG_INFO().
  2. I modified build-support/run_tpl_tests.py to parse my new log statements, which gives compilation times per file. The modified script is here.
  3. Then you can just run make/ninja check-tpl to get timings per file.
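
For anyone wanting to reproduce step 2, the parsing side might look roughly like the sketch below. It is not the modified script referred to above: the log message format (`LLVM compile time: <number> ms`), the millisecond unit, the sample outputs, and the helper names `parse_compile_time` and `collect_timings` are all assumptions made for illustration, since the actual log statement is not shown in this thread.

```python
import re

# Hypothetical log format. The real message written via EXECUTION_LOG_INFO()
# is not shown in this thread, so adjust the pattern to match it.
COMPILE_TIME_RE = re.compile(r"LLVM compile time: (?P<ms>\d+(?:\.\d+)?) ms")


def parse_compile_time(output):
    """Return the first compile time found in one test's output, or None."""
    match = COMPILE_TIME_RE.search(output)
    return float(match.group("ms")) if match else None


def collect_timings(outputs_by_file):
    """Map each TPL file name to its parsed compile time.

    Files whose output contains no timing line are skipped.
    """
    timings = {}
    for name, output in outputs_by_file.items():
        elapsed = parse_compile_time(output)
        if elapsed is not None:
            timings[name] = elapsed
    return timings


# Made-up captured output for two files, for illustration only.
sample_outputs = {
    "array.tpl": "[info] LLVM compile time: 29.21 ms\n[info] done\n",
    "broken.tpl": "[error] parse failure\n",
}

timings = collect_timings(sample_outputs)
for name, elapsed in sorted(timings.items()):
    print(f"{name}: {elapsed:.2f} ms")
```

A test runner that already captures each file's output could call `parse_compile_time` on it and print the result next to the pass/fail status.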

@lmwnshn
Contributor

lmwnshn commented May 24, 2021

This may or may not have been fixed with #1553. We should revisit bytecodes though.
