Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Need a more sustainable way to manage known syntax mappings #2747

Closed
cyqsimon opened this issue Oct 26, 2023 · 7 comments · Fixed by #2755
Closed

Need a more sustainable way to manage known syntax mappings #2747

cyqsimon opened this issue Oct 26, 2023 · 7 comments · Fixed by #2755

Comments

@cyqsimon
Copy link
Contributor

cyqsimon commented Oct 26, 2023

Currently everything is done in src/syntax_mapping.rs using bespoke code for each syntax mapping we want to add, which is starting to become a problem.

The SyntaxMapping::builtin function is already 160 lines long, and is nowhere near complete. As a daily user of bat, on average I run into this situation several times a week when I'm thinking, "man, this well-known file path really should have syntax mapping built in". But then I remember how it's implemented and think, "I really don't want to make the maintainability any worse than it already is".

What I'm saying is, there has to be a better way to code this to make it more maintainable and extensible (specifically, easier to add more mappings and harder to accidentally add duplicates). So I would like to ask the community for your opinion (how to organise the mappings, what patterns/libraries to use, etc.). I'm happy to help implement the changes if we come to some technical consensus.

@keith-hall
Copy link
Collaborator

keith-hall commented Oct 27, 2023

I imagine some kind of TOML file for example could work, where it would be possible to have a list of glob to target mappings, where environment variables would be replaced and the list deduplicated before being processed sequentially.
I'm not sure how much it would help with preventing from accidentally adding duplicates, unless by some form of convention the globs would generally be ordered like longest prefix first, then lexicographically or something. Or maybe grouping by mapping target could also somehow help?
But the env var replacement and deduplication could help with replacing the current gitconfig logic.

@keith-hall
Copy link
Collaborator

I guess we'd be okay with losing the command line argument support for manipulating mappings if we have them in a separate config file. But presumably we'd want to merge the default mapping with the user one or something...

@cyqsimon
Copy link
Contributor Author

cyqsimon commented Oct 27, 2023

I imagine some kind of TOML file for example could work, where it would be possible to have a list of glob to target mappings, where environment variables would be replaced and the list deduplicated before being processed sequentially. I'm not sure how much it would help with preventing from accidentally adding duplicates, unless by some form of convention the globs would generally be ordered like longest prefix first, then lexicographically or something. Or maybe grouping by mapping target could also somehow help? But the env var replacement and deduplication could help with replacing the current gitconfig logic.

Yeah my mind went to a similar place. I'm thinking, maybe instead of having a single file, we can have a list of files, where each one specifies mapping for a particular application and/or use case? This will help to reduce the possibility of introducing duplicates I think. So for example syntax_mapping_defs/linux/systemd.toml would specify all the various unit file paths.

Then we can use phf_codegen to build this info into the binary at compile time. We can even do fancier things like switch on the target OS to determine which mappings should be included. Deduplication checks can happen here too.

Then finally at runtime we apply your env replacement logic, and all (hopefully) should be nice and fast.

I guess we'd be okay with losing the command line argument support for manipulating mappings if we have them in a separate config file. But presumably we'd want to merge the default mapping with the user one or something...

Yeah I don't think merging the mapping is going to be much of a challenge. No need to lose functionality here.

@cyqsimon
Copy link
Contributor Author

Also, is TOML the most appropriate format choice for this? Not that I have any preference for or against any markup format (well, unless we're considering XML I guess); this is just to have all bases covered.

@keith-hall
Copy link
Collaborator

I guess TOML came to my mind first because it's already used by Cargo, so Rust developers would be familiar with it. I personally wouldn't want JSON because it is too strict with no support for comments or trailing commas, but I could always be overruled here 😉 I'd personally be okay with YAML 1.2 though, but I know it can be hard for newcomers.

Thinking about it may also make sense to continue to support the current config file/command line arg support for a while to give users a chance to migrate without suddenly breaking their workflows.

As a daily user of bat, on average I run into this situation several times a week when I'm thinking, "man, this well-known file path really should have syntax mapping built in". But then I remember how it's implemented and think, "I really don't want to make the maintainability any worse than it already is".

Btw I hope you are keeping track of these so we can get them added once the new solution is in place 😉

@liamdiprose
Copy link

Would https://github.com/tree-sitter/tree-sitter be suitable?

@cyqsimon
Copy link
Contributor Author

cyqsimon commented Dec 6, 2023

Not sure how tree sitter would help. This syntax mapping we are addressing here is not related to content-based syntax detection, rather path-based syntax override.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants