Improve jinja2 templating experience #3199

Open
Marigold opened this issue Aug 26, 2024 · 6 comments
Comments

@Marigold
Collaborator

Marigold commented Aug 26, 2024

Problem

We use the Jinja2 templating engine in our metadata YAML files, especially in cases with dimensions, to avoid repeating the same phrases over and over. The problem is that Jinja2 syntax can feel unnatural and verbose. It's easy to make syntax errors because we don't have a VSCode highlighter, and typos are hard to spot because Jinja2 often falls back to an empty string.

This is made even more complex because we mix dynamic-yaml with Jinja2 and use non-standard tags <% ... %> instead of {% ... %}.
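
To make the empty-string fallback concrete, here is a minimal sketch. It assumes only the block tags are customized to <% ... %> and leaves the variable delimiters at the Jinja2 default; the template text and the misspelled `age_gruop` are made up for illustration. `StrictUndefined` is a standard Jinja2 option, shown here as one possible way to surface typos, not as something our code currently does.

```python
from jinja2 import Environment, StrictUndefined, UndefinedError

# Assumption: only block tags are customized; variables keep the default {{ ... }}.
TAGS = dict(block_start_string="<%", block_end_string="%>")

# Hypothetical template with a typo: `age_gruop` instead of `age_group`.
template = "<% if sex == 'female' %>Women<% else %>Men<% endif %> aged {{ age_gruop }}"

# Default behaviour: the misspelled variable silently renders as an empty string.
lenient = Environment(**TAGS)
lenient_out = lenient.from_string(template).render(sex="female", age_group="15-49")

# With StrictUndefined, the same typo raises an error at render time.
strict = Environment(undefined=StrictUndefined, **TAGS)
try:
    strict.from_string(template).render(sex="female", age_group="15-49")
    strict_failed = False
except UndefinedError:
    strict_failed = True

print(repr(lenient_out))  # the age group is missing from the output
print(strict_failed)
```

In the lenient environment the rendered text simply ends after "aged", which is the kind of silent failure that is hard to spot in a YAML file.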

Possible solutions

Replace dynamic-yaml with jinja2

dynamic-yaml is a useful, yet outdated, way of doing substitution, which YAML does not support natively. Perhaps we could fully replace it with jinja2 and start using the default {% ... %} tags and a VSCode plugin for syntax highlighting?

UPDATE: I tried the Better Jinja VSCode plugin and it's not that helpful. It doesn't recognize tags in nested YAML fields and doesn't show syntax errors.

Double down on current jinja2 + dynamic-yaml implementation

Our jinja2 code can be improved and made less verbose and more resilient. See a couple of examples in this PR.

Use custom python functions

Instead of creating metadata strings in the jinja2 templating engine, define functions in pure Python and then pass those functions to the jinja2 environment. The question is where to store those functions. It'd have to be in a grapher step, since this is where we apply the formatting. Another disadvantage is that these functions wouldn't be in the metadata file.
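
A rough sketch of what this option could look like, using Jinja2's `Environment.globals` to expose a Python function to templates. The helper `describe_age_group`, the `age` dimension, and the template text are all hypothetical; only the <% ... %> block tags come from our current setup.

```python
from jinja2 import Environment


def describe_age_group(age: str) -> str:
    """Hypothetical helper: turn an age dimension value into a readable phrase."""
    if age == "all":
        return "all ages"
    if age.endswith("+"):
        return f"people aged {age[:-1]} and older"
    return f"people aged {age}"


# Assumption: block tags are <% ... %>; variable delimiters stay at the default.
env = Environment(block_start_string="<%", block_end_string="%>")

# Register the function so templates can call it like any other global.
env.globals["describe_age_group"] = describe_age_group

template = env.from_string("Death rate among {{ describe_age_group(age) }}.")
rendered = template.render(age="70+")
print(rendered)
```

The branching logic lives in plain Python, where it can be linted and unit-tested, and the template is reduced to a single call. The trade-off mentioned above still applies: the wording is no longer visible in the metadata file itself.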

cc @lucasrodes


stale bot commented Nov 5, 2024

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale bot added the wontfix label Nov 5, 2024

@Marigold
Collaborator Author

Marigold commented Nov 5, 2024

Keep it open, I might tackle it in this cooldown.

@Marigold
Collaborator Author

Marigold commented Dec 2, 2024

Adding ideas from #3657

Note that this is still not perfect and doesn't solve the \n. problem from the issue (we still need to use <%- elif to get rid of it). Ideally, we should never have to use - and the Jinja templates should be as intuitive as possible. To do this properly, we should:

  1. Replace dynamic-yaml with something simpler and unify saving & loading of metadata files (while keeping it fast; there were tons of performance optimizations)
  2. Move jinja functionality to owid-catalog and add method VariableMeta.render_jinja(dim_dict={"..."})
  3. Add validation for double whitespaces, newlines, \n., etc.
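
The validation in step 3 could be sketched roughly as follows, using only the standard library. The function name `validate_rendered`, the check names, and the exact patterns are assumptions for illustration; in particular, treating "newlines" as leading or trailing newlines is one interpretation, not a settled rule.

```python
import re

# Hypothetical checks applied to an already rendered metadata string.
CHECKS = {
    "double whitespace": re.compile(r"[ \t]{2,}"),
    "newline before period": re.compile(r"\n\s*\."),
    "leading or trailing newline": re.compile(r"\A\n|\n\Z"),
}


def validate_rendered(text: str) -> list[str]:
    """Return the names of all checks that the rendered text fails."""
    return [name for name, pattern in CHECKS.items() if pattern.search(text)]


print(validate_rendered("Share of women\n."))
print(validate_rendered("Share  of women."))
print(validate_rendered("Share of women."))
```

Running such checks right after rendering would flag the whitespace problems at build time instead of relying on - in every template tag.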

@larsyencken
Collaborator

We discussed this again in triage -- it's mainly about refactoring the rendering flow for dynamic metadata to make it easier to work with and more standard. However, there are a range of different ways this is currently being done in the team. Without having one clear standard, it's hard to settle this right now.

@lucasrodes
Member

lucasrodes commented Dec 5, 2024

> However, there are a range of different ways this is currently being done in the team.

@larsyencken, could you briefly summarize what the main differences within the team are (high-level)? My impression is that there is an overall trend in the team to use Jinja and work with long-formatted tables, while fewer of us prefer to work explicitly with wide-formatted tables and set the metadata programmatically. So there are mainly two big "standards" at the moment, I'd say.

Just so I have an intuition on what the differences are here. Thanks!

@Marigold
Collaborator Author

Marigold commented Dec 6, 2024

@lucasrodes that's about right. Sharing Pablo's workflow below to have everything in one place.

> I haven't figured out a nice generic method for this. I usually use etl.helpers.print_tables_metadata_template, often tweaking it a bit. It's not a particularly sophisticated method... But once the yaml exists, I have total freedom to tweak things individually.

I'm working on a way that would let you use both in a YAML file, i.e. using Jinja templating to create rough metadata for all variables and then tweaking specific cases if needed by using their full paths with dimensions.

4 participants