Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Macro parsing #1057

Merged
merged 8 commits into from
Oct 19, 2018
Merged

Macro parsing #1057

merged 8 commits into from
Oct 19, 2018

Conversation

cmcarthur
Copy link
Member

work in progress. this is promising -- it meaningfully speeds up compilation on a couple of my test projects. @drewbanin i'd love for you to give this a try on some of the slow-to-compile projects that we know of and see the impact.

@cmcarthur cmcarthur requested a review from beckjake October 17, 2018 14:47
@drewbanin
Copy link
Contributor

This works really well @cmcarthur! What else do we need to do to get this merged? Is it touch-up stuff, or is it missing something more fundamental?

@beckjake
Copy link
Contributor

beckjake commented Oct 18, 2018

I think this is all set, now that I handled the two identical filepaths in one project issue

@beckjake beckjake requested review from drewbanin and removed request for beckjake October 18, 2018 19:31

def get_node_template(self, node):
# multiple nodes can be in a file, but they must have unique names!
key = (node['original_file_path'], node['name'])
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the key should be unique-per-file, instead of unique-per-node, no?

Copy link
Contributor

@beckjake beckjake Oct 18, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm pretty confident unique-per-node is correct!

  • models are a single node per file and their raw_sql is the full file, so it's exactly the same
  • macros/operations have multiple nodes per file, and the raw_sql is the block contents their generated templates only have the named macro, and each block has a (project-unique, which is vastly more than we care about) unique name

One thing we might need to do is add in the root path/project name to the key. I hadn't considered dependent projects that happen to have an identical layout until just now. It seems that at least in some tests, original_file_path is actually the just the path under macros/.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, you were totally correct. It's all about dependency packages, I misunderstood the issue because I missed the difference between do_something and do_something2!

@drewbanin
Copy link
Contributor

drewbanin commented Oct 19, 2018

I ran dbt a bunch with some sample dbt projects. For larger, realistic projects like our internal-analytics repo, the majority of the time is spent on running parse-time queries (like already_exists()), so it mutes the benefit of this PR.

For smaller sample projects, the difference is small, but significant. After some non-rigorous timing, I found:

branch dbt compile time
dev/guion-bluford 3.1s
macro-parsing 2.3s

So, this represents a 25% speedup, even if it's generally less than one second. Also worth restating: this effect is less apparent if a project has even a single adapter.already_exists() call that runs at parse time.

@beckjake
Copy link
Contributor

Yeah, that's about what I'd expect when to comes to the already_exists check.

Copy link
Contributor

@drewbanin drewbanin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have enough insight into the jinja internals to have any input on the code here. I did however run this on a bunch of projects, and it seemed to work just fine. LGTM!

@beckjake beckjake merged commit 31bd22f into dev/guion-bluford Oct 19, 2018
@beckjake beckjake deleted the macro-parsing branch October 19, 2018 15:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants