Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

More extensible syntax mapping mechanism #2755

Merged
merged 49 commits into from
Jan 21, 2024

Conversation

cyqsimon
Copy link
Contributor

@cyqsimon cyqsimon commented Oct 31, 2023

Will close #2747.

Checklist

  • Ratify specification
  • Implementation
  • Add tests

@cyqsimon
Copy link
Contributor Author

@keith-hall would you please take a look at README.md and see if there are any obvious problems with this architecture? Thanks.

@keith-hall
Copy link
Collaborator

I don't see any problems with this approach 👍

@cyqsimon cyqsimon mentioned this pull request Oct 31, 2023
@cyqsimon
Copy link
Contributor Author

I'll leave it here for a week or two for people to take a look. If there are no major objections I will go ahead with the implementation.

@Enselic
Copy link
Collaborator

Enselic commented Nov 2, 2023

I like the idea in general. My main concern is startup time, but in the linked issues you are talking about doing clever stuff at compile time, which has the potential to even improve startup time.

I don't think we can at this point "we promise to include your solution in bat", but I am very interested in looking at a prototype and evaluating that, then we can decide on the next steps.

I hope am not sounding discouraging! Would be great if we could solve syntax mapping more elegantly.

@cyqsimon
Copy link
Contributor Author

cyqsimon commented Nov 2, 2023

I was investigating the possibility of doing this with proc macros, because I suspected that would make the code somewhat more elegant. In conclusion I decided that using a build script is probably still the way to go (at least for the moment). This comment is just to document my research; I will add more info if they are worthy of documenting.

  1. I initially thought I would be using phf_codegen, but I soon realised that a hashmap would be poorly suited for this job because glob patterns are not appropriate keys. Therefore probably the best way forward is to generate a static array, something like this: static STATIC_RULES: [(&'static str, &'static str); N].
  2. Seems like someone has tried to do similar (unsurprisingly) and has written an article about it. It is a bit old (2020-09) but most things in it are still valid.
  3. I tried to look for a crate that does similar to phf_codegen but for arrays. Unfortunately I was not able to find any, but honestly I suspect it's because it's just too simple.
  4. Proc macros are (still) not appropriate for this in 2023, because it lacks the build script's capability to watch/track a specific file/directory. See Tracking Issue for proc_macro::{tracked_env, tracked_path} rust-lang/rust#99515.

@cyqsimon cyqsimon force-pushed the syntax-mapping-refactor branch from 5c863c1 to f8349a8 Compare November 2, 2023 08:23
@cyqsimon
Copy link
Contributor Author

cyqsimon commented Nov 2, 2023

Rebased onto #2756.

@cyqsimon cyqsimon force-pushed the syntax-mapping-refactor branch 2 times, most recently from 34567fe to e856d7f Compare November 2, 2023 08:29
@cyqsimon
Copy link
Contributor Author

cyqsimon commented Nov 2, 2023

Okay, I have an initial prototype of the codegen part working. Right now it makes something like this:

// I formatted it manually to make it easier for the eye
static STATIC_RULES: [(&str, MappingTarget); 6] = [
    (
        r#"${XDG_CONFIG_HOME}/git/config"#,
        MappingTarget::MapTo(r#"Git Config"#),
    ),
    (
        r#"${HOME}/.config/git/config"#,
        MappingTarget::MapTo(r#"Git Config"#),
    ),
    (
        r#"${XDG_CONFIG_HOME}/git/ignore"#,
        MappingTarget::MapTo(r#"Git Ignore"#),
    ),
    (
        r#"${HOME}/.config/git/ignore"#,
        MappingTarget::MapTo(r#"Git Ignore"#),
    ),
    (
        r#"${XDG_CONFIG_HOME}/git/attributes"#,
        MappingTarget::MapTo(r#"Git Attributes"#),
    ),
    (
        r#"${HOME}/.config/git/attributes"#,
        MappingTarget::MapTo(r#"Git Attributes"#),
    ),
];

I'm thinking, I can actually take this one step further by moving the string searches to compile time. So instead of storing a string as the matcher, we store something like this:

enum CompileTimeMatcher {
    Static(&'static str),
    Dynamic(Vec<MatcherSegment>),
}

// ... where
enum MatcherSegment {
    Text(&'static str),
    Env(&'static str),
}

I assume this is faster, but the benefit might be minimal.

@cyqsimon
Copy link
Contributor Author

cyqsimon commented Nov 2, 2023

Okay, I think I have a decent implementation of the aforementioned idea. This is what the generated code currently looks like:

// formatted
static STATIC_RULES: [(BuiltinMatcher, MappingTarget); 7] = [
    (
        BuiltinMatcher::Dynamic(Lazy::new(|| {
            join_segments(&[
                MatcherSegment::Env(r#"XDG_CONFIG_HOME"#),
                MatcherSegment::Text(r#"/git/config"#),
            ])
        })),
        MappingTarget::MapTo(r#"Git Config"#),
    ),
    (
        BuiltinMatcher::Dynamic(Lazy::new(|| {
            join_segments(&[
                MatcherSegment::Env(r#"HOME"#),
                MatcherSegment::Text(r#"/.config/git/config"#),
            ])
        })),
        MappingTarget::MapTo(r#"Git Config"#),
    ),
    (
        BuiltinMatcher::Dynamic(Lazy::new(|| {
            join_segments(&[
                MatcherSegment::Env(r#"XDG_CONFIG_HOME"#),
                MatcherSegment::Text(r#"/git/ignore"#),
            ])
        })),
        MappingTarget::MapTo(r#"Git Ignore"#),
    ),
    (
        BuiltinMatcher::Dynamic(Lazy::new(|| {
            join_segments(&[
                MatcherSegment::Env(r#"HOME"#),
                MatcherSegment::Text(r#"/.config/git/ignore"#),
            ])
        })),
        MappingTarget::MapTo(r#"Git Ignore"#),
    ),
    (
        BuiltinMatcher::Dynamic(Lazy::new(|| {
            join_segments(&[
                MatcherSegment::Env(r#"XDG_CONFIG_HOME"#),
                MatcherSegment::Text(r#"/git/attributes"#),
            ])
        })),
        MappingTarget::MapTo(r#"Git Attributes"#),
    ),
    (
        BuiltinMatcher::Dynamic(Lazy::new(|| {
            join_segments(&[
                MatcherSegment::Env(r#"HOME"#),
                MatcherSegment::Text(r#"/.config/git/attributes"#),
            ])
        })),
        MappingTarget::MapTo(r#"Git Attributes"#),
    ),
    (
        BuiltinMatcher::Fixed(r#"*.conf"#),
        MappingTarget::MapExtensionToUnknown,
    ),
];

The downside is that this uses a couple of new types and functions, which doesn't feel quite right wherever I put them:

  • They are only used by the generated code, so I put them in src/syntax_mapping.rs, future maintainers will probably find it odd/confusing.
  • They are static code blocks, so they don't really belong in the generated code.

Ended up going with option 1. I guess it's just a minor nuisance at the end of the day. It's worth it if there are tangible performance improvements.

@cyqsimon cyqsimon closed this Nov 2, 2023
@cyqsimon
Copy link
Contributor Author

cyqsimon commented Nov 2, 2023

So I fat-fingered tab in the middle of typing, right before I hit space... And that closed the PR 🤦. What a UI design disaster.

@cyqsimon cyqsimon reopened this Nov 2, 2023
@cyqsimon
Copy link
Contributor Author

cyqsimon commented Nov 2, 2023

I previously said I'll leave it for a week or two, but I had time today so I thought I'd just do it.

Maintainers, please don't hesitate to ask for big changes if you think they are warranted.

@cyqsimon cyqsimon force-pushed the syntax-mapping-refactor branch from 3f0a772 to cbe755f Compare November 2, 2023 17:44
@cyqsimon
Copy link
Contributor Author

cyqsimon commented Nov 2, 2023

Rebased onto master.

@Enselic
Copy link
Collaborator

Enselic commented Nov 4, 2023

I skimmed over the code and it looks like you know what you are doing. Nothing jumped out at me as a red flag. I didn't look very close at the code though.

I don't want to risk wasting your time, but how much work would it be to port all of the existing syntax mappings to the new system? So that we can see how it impacts performance. Or maybe you can partly port it? So that half of syntax mappings use new and half use old. Maybe that is enough to git a sense for how this impacts performance.

To check performance, do this (approximate instructions):

cargo install --locked hyperfine
cargo build --release
tests/benchmarks/run-benchmarks.sh --release

@cyqsimon
Copy link
Contributor Author

cyqsimon commented Nov 4, 2023

Thanks for the compliment.

I'm not ready to start porting just yet - currently the codegen is working, but the include!-ed mappings are not yet being used. I need to add that bit of code first; shouldn't be too difficult though.

As of the porting, there's not actually that much to port. Maybe 20 files (which can be done in like 30 minutes). I'll run the benchmark after it's all done.

@cyqsimon
Copy link
Contributor Author

cyqsimon commented Nov 4, 2023

Impl is all done. Tests are failing because mappings haven't been ported. Will do said porting tomorrow.

@cyqsimon cyqsimon force-pushed the syntax-mapping-refactor branch from 7127f0e to fc8eb15 Compare November 4, 2023 18:49
src/syntax_mapping.rs Outdated Show resolved Hide resolved
@cyqsimon
Copy link
Contributor Author

cyqsimon commented Nov 5, 2023

I ran the benchmark like so (using patch in #2768):

cd "$(git rev-parse --show-toplevel)/tests/benchmarks"
rm -rf results
mkdir -p results/{pre,post}

git checkout master
cargo build -r
RUN_COUNT=50 ./run-benchmarks.sh --release
cp benchmark-results/report.md results/pre/

git checkout syntax-mapping-refactor
cargo build -r
RUN_COUNT=50 ./run-benchmarks.sh --release
cp benchmark-results/report.md results/post/

bat results/*/*.md

Current master

Command Mean [ms] Min [ms] Max [ms] Relative
bat 3.9 ± 0.9 2.4 7.8 1.00
bat … small-CpuInfo-file.cpuinfo 20.4 ± 1.7 18.6 28.4 1.00
bat … small-Markdown-file.md 25.4 ± 1.8 23.5 34.1 1.00
bat … --language=txt numpy_test_multiarray.py 5.4 ± 2.4 3.7 43.3 1.00
bat … grep-output-ansi-sequences.txt 38.6 ± 1.8 36.3 44.0 1.00
bat … jquery.js 520.4 ± 5.8 514.5 533.6 1.00
bat … miniz.c 50.1 ± 1.7 48.1 56.0 1.00
bat … numpy_test_multiarray.py 688.9 ± 5.0 683.3 697.1 1.00
bat … grep-output-ansi-sequences.txt 36.1 ± 2.3 33.3 45.2 1.00
bat … jquery.js 515.6 ± 5.0 511.1 527.1 1.00
bat … miniz.c 49.6 ± 2.3 47.6 59.6 1.00
bat … numpy_test_multiarray.py 688.0 ± 6.1 680.7 697.3 1.00
bat … --language=txt *.txt 4.6 ± 1.0 3.3 8.8 1.00

syntax-mapping-refactor

Command Mean [ms] Min [ms] Max [ms] Relative
bat 2.2 ± 0.7 1.1 5.6 1.00
bat … small-CpuInfo-file.cpuinfo 20.3 ± 1.6 18.6 28.2 1.00
bat … small-Markdown-file.md 25.9 ± 1.6 24.3 31.7 1.00
bat … --language=txt numpy_test_multiarray.py 3.6 ± 1.1 2.2 8.3 1.00
bat … grep-output-ansi-sequences.txt 38.3 ± 1.4 36.6 45.6 1.00
bat … jquery.js 522.3 ± 4.7 517.0 532.5 1.00
bat … miniz.c 48.9 ± 2.5 46.4 59.8 1.00
bat … numpy_test_multiarray.py 690.9 ± 4.2 685.4 700.7 1.00
bat … grep-output-ansi-sequences.txt 36.4 ± 1.8 34.1 42.3 1.00
bat … jquery.js 517.1 ± 5.6 513.0 532.4 1.00
bat … miniz.c 48.8 ± 1.8 46.1 53.6 1.00
bat … numpy_test_multiarray.py 689.0 ± 3.7 683.8 694.9 1.00
bat … --language=txt *.txt 3.1 ± 1.1 1.7 7.8 1.00

Unsurprisingly, the simpler the task, the bigger the speedup. For heavier tasks there is barely a difference.

@cyqsimon
Copy link
Contributor Author

cyqsimon commented Nov 5, 2023

One thing I was somewhat concerned about was a possible increase of the binary size due to the pervasive use of once_cell::sync::Lazy. Fortunately it seems like the increase is not too alarming.

cd "$(git rev-parse --show-toplevel)"
rm -rf res
mkdir -p res/{pre,post}

git checkout master
cargo build -r
cp target/release/bat res/pre/

git checkout syntax-mapping-refactor
cargo build -r
cp target/release/bat res/post/

du --summarize --bytes res/*/bat
5379928	res/post/bat
5359128	res/pre/bat

Eh. 20KB difference seems fine.

@cyqsimon
Copy link
Contributor Author

cyqsimon commented Nov 5, 2023

This is what the codegen looks like right now (Linux). I think this is pretty much what we will end up with:

/// Generated by build script from /src/syntax_mapping/builtins/.
pub(crate) static BUILTIN_MAPPINGS: [(Lazy<Option<GlobMatcher>>, MappingTarget); 56] = [
(Lazy::new(|| Some(build_matcher_fixed(r#"httpd.conf"#))), MappingTarget::MapTo(r#"Apache Conf"#)),
(Lazy::new(|| Some(build_matcher_fixed(r#"/etc/apache2/**/*.conf"#))), MappingTarget::MapTo(r#"Apache Conf"#)),
(Lazy::new(|| Some(build_matcher_fixed(r#"/etc/apache2/sites-*/**/*"#))), MappingTarget::MapTo(r#"Apache Conf"#)),
(Lazy::new(|| Some(build_matcher_fixed(r#"**/bat/config"#))), MappingTarget::MapTo(r#"Bourne Again Shell (bash)"#)),
(Lazy::new(|| Some(build_matcher_fixed(r#"Containerfile"#))), MappingTarget::MapTo(r#"Dockerfile"#)),
(Lazy::new(|| Some(build_matcher_fixed(r#"*.h"#))), MappingTarget::MapTo(r#"C++"#)),
(Lazy::new(|| Some(build_matcher_fixed(r#".clang-format"#))), MappingTarget::MapTo(r#"YAML"#)),
(Lazy::new(|| Some(build_matcher_fixed(r#"*.fs"#))), MappingTarget::MapTo(r#"F#"#)),
(Lazy::new(|| Some(build_matcher_fixed(r#"fish_history"#))), MappingTarget::MapTo(r#"YAML"#)),
(Lazy::new(|| build_matcher_dynamic(&[MatcherSegment::Env(r#"XDG_CONFIG_HOME"#), MatcherSegment::Text(r#"/git/config"#)])), MappingTarget::MapTo(r#"Git Config"#)),
(Lazy::new(|| build_matcher_dynamic(&[MatcherSegment::Env(r#"HOME"#), MatcherSegment::Text(r#"/.config/git/config"#)])), MappingTarget::MapTo(r#"Git Config"#)),
(Lazy::new(|| build_matcher_dynamic(&[MatcherSegment::Env(r#"XDG_CONFIG_HOME"#), MatcherSegment::Text(r#"/git/ignore"#)])), MappingTarget::MapTo(r#"Git Ignore"#)),
(Lazy::new(|| build_matcher_dynamic(&[MatcherSegment::Env(r#"HOME"#), MatcherSegment::Text(r#"/.config/git/ignore"#)])), MappingTarget::MapTo(r#"Git Ignore"#)),
(Lazy::new(|| build_matcher_dynamic(&[MatcherSegment::Env(r#"XDG_CONFIG_HOME"#), MatcherSegment::Text(r#"/git/attributes"#)])), MappingTarget::MapTo(r#"Git Attributes"#)),
(Lazy::new(|| build_matcher_dynamic(&[MatcherSegment::Env(r#"HOME"#), MatcherSegment::Text(r#"/.config/git/attributes"#)])), MappingTarget::MapTo(r#"Git Attributes"#)),
(Lazy::new(|| Some(build_matcher_fixed(r#"*.jsonl"#))), MappingTarget::MapTo(r#"JSON"#)),
(Lazy::new(|| Some(build_matcher_fixed(r#"*.ksh"#))), MappingTarget::MapTo(r#"Bourne Again Shell (bash)"#)),
(Lazy::new(|| Some(build_matcher_fixed(r#"/var/spool/mail/*"#))), MappingTarget::MapTo(r#"Email"#)),
(Lazy::new(|| Some(build_matcher_fixed(r#"/var/mail/*"#))), MappingTarget::MapTo(r#"Email"#)),
(Lazy::new(|| Some(build_matcher_fixed(r#"nginx.conf"#))), MappingTarget::MapTo(r#"nginx"#)),
(Lazy::new(|| Some(build_matcher_fixed(r#"mime.types"#))), MappingTarget::MapTo(r#"nginx"#)),
(Lazy::new(|| Some(build_matcher_fixed(r#"/etc/nginx/**/*.conf"#))), MappingTarget::MapTo(r#"nginx"#)),
(Lazy::new(|| Some(build_matcher_fixed(r#"/etc/nginx/sites-*/**/*"#))), MappingTarget::MapTo(r#"nginx"#)),
(Lazy::new(|| Some(build_matcher_fixed(r#"*.nse"#))), MappingTarget::MapTo(r#"Lua"#)),
(Lazy::new(|| Some(build_matcher_fixed(r#"/etc/os-release"#))), MappingTarget::MapTo(r#"Bourne Again Shell (bash)"#)),
(Lazy::new(|| Some(build_matcher_fixed(r#"/usr/lib/os-release"#))), MappingTarget::MapTo(r#"Bourne Again Shell (bash)"#)),
(Lazy::new(|| Some(build_matcher_fixed(r#"/etc/initrd-release"#))), MappingTarget::MapTo(r#"Bourne Again Shell (bash)"#)),
(Lazy::new(|| Some(build_matcher_fixed(r#"/usr/lib/extension-release.d/extension-release.*"#))), MappingTarget::MapTo(r#"Bourne Again Shell (bash)"#)),
(Lazy::new(|| Some(build_matcher_fixed(r#"/usr/share/libalpm/hooks/*.hook"#))), MappingTarget::MapTo(r#"INI"#)),
(Lazy::new(|| Some(build_matcher_fixed(r#"/etc/pacman.d/hooks/*.hook"#))), MappingTarget::MapTo(r#"INI"#)),
(Lazy::new(|| Some(build_matcher_fixed(r#"*.pac"#))), MappingTarget::MapTo(r#"JavaScript (Babel)"#)),
(Lazy::new(|| Some(build_matcher_fixed(r#"*.ron"#))), MappingTarget::MapTo(r#"Rust"#)),
(Lazy::new(|| Some(build_matcher_fixed(r#"*.sarif"#))), MappingTarget::MapTo(r#"JSON"#)),
(Lazy::new(|| Some(build_matcher_fixed(r#"/etc/profile"#))), MappingTarget::MapTo(r#"Bourne Again Shell (bash)"#)),
(Lazy::new(|| Some(build_matcher_fixed(r#"**/.ssh/config"#))), MappingTarget::MapTo(r#"SSH Config"#)),
(Lazy::new(|| Some(build_matcher_fixed(r#"**/systemd/**/*.conf"#))), MappingTarget::MapTo(r#"INI"#)),
(Lazy::new(|| Some(build_matcher_fixed(r#"**/systemd/**/*.example"#))), MappingTarget::MapTo(r#"INI"#)),
(Lazy::new(|| Some(build_matcher_fixed(r#"*.automount"#))), MappingTarget::MapTo(r#"INI"#)),
(Lazy::new(|| Some(build_matcher_fixed(r#"*.device"#))), MappingTarget::MapTo(r#"INI"#)),
(Lazy::new(|| Some(build_matcher_fixed(r#"*.dnssd"#))), MappingTarget::MapTo(r#"INI"#)),
(Lazy::new(|| Some(build_matcher_fixed(r#"*.link"#))), MappingTarget::MapTo(r#"INI"#)),
(Lazy::new(|| Some(build_matcher_fixed(r#"*.mount"#))), MappingTarget::MapTo(r#"INI"#)),
(Lazy::new(|| Some(build_matcher_fixed(r#"*.netdev"#))), MappingTarget::MapTo(r#"INI"#)),
(Lazy::new(|| Some(build_matcher_fixed(r#"*.network"#))), MappingTarget::MapTo(r#"INI"#)),
(Lazy::new(|| Some(build_matcher_fixed(r#"*.nspawn"#))), MappingTarget::MapTo(r#"INI"#)),
(Lazy::new(|| Some(build_matcher_fixed(r#"*.path"#))), MappingTarget::MapTo(r#"INI"#)),
(Lazy::new(|| Some(build_matcher_fixed(r#"*.service"#))), MappingTarget::MapTo(r#"INI"#)),
(Lazy::new(|| Some(build_matcher_fixed(r#"*.scope"#))), MappingTarget::MapTo(r#"INI"#)),
(Lazy::new(|| Some(build_matcher_fixed(r#"*.slice"#))), MappingTarget::MapTo(r#"INI"#)),
(Lazy::new(|| Some(build_matcher_fixed(r#"*.socket"#))), MappingTarget::MapTo(r#"INI"#)),
(Lazy::new(|| Some(build_matcher_fixed(r#"*.swap"#))), MappingTarget::MapTo(r#"INI"#)),
(Lazy::new(|| Some(build_matcher_fixed(r#"*.target"#))), MappingTarget::MapTo(r#"INI"#)),
(Lazy::new(|| Some(build_matcher_fixed(r#"*.timer"#))), MappingTarget::MapTo(r#"INI"#)),
(Lazy::new(|| Some(build_matcher_fixed(r#"*.conf"#))), MappingTarget::MapExtensionToUnknown),
(Lazy::new(|| Some(build_matcher_fixed(r#"build"#))), MappingTarget::MapToUnknown),
(Lazy::new(|| Some(build_matcher_fixed(r#"rails"#))), MappingTarget::MapToUnknown)
];

@cyqsimon
Copy link
Contributor Author

cyqsimon commented Nov 5, 2023

Maintainers, can you please take a look at my porting? I've classified the rules according to their applicable platforms; please help check if all mappings are in their correct subdirs. Thanks.

@cyqsimon cyqsimon force-pushed the syntax-mapping-refactor branch from d90cdb2 to 04ce25c Compare November 5, 2023 15:32
@cyqsimon
Copy link
Contributor Author

Here's the benchmark prototype, but I am unable to figure out how to make the results predictable/reliable.

The problem is that the results swing about very erratically when run consecutively, to the point where any of the three tests could be the fastest or the slowest. I suspect there's something to do with CPU caching. Maybe hyperfine has some way of mitigating the influence of this, but I doubt it's going to be reliable.

Can you please try running these benchmarks several times to see whether this problem is reproducible? Thanks all.

Cargo.lock Outdated Show resolved Hide resolved
@cyqsimon cyqsimon force-pushed the syntax-mapping-refactor branch from 61b4e05 to 15ab447 Compare December 22, 2023 11:55
@cyqsimon
Copy link
Contributor Author

Force pushed to trigger CI rerun.

@cyqsimon
Copy link
Contributor Author

Okay something weird is going on with the CI. I don't think this is my fault?

@Enselic
Copy link
Collaborator

Enselic commented Dec 25, 2023

I think the CI failure will be fixed by #2811.

Cargo.toml Show resolved Hide resolved
CHANGELOG.md Outdated Show resolved Hide resolved
Copy link
Owner

@sharkdp sharkdp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for this excellent contribution. This was a pleasure to review.

I would be in favor of this PR, even if it would result in a (small) performance penalty. The fact that we can even save time in some cases makes it even better.

In the future, if we want to speed up the code path where we need to evaluate the syntax mappings, we could think about parallelization. This is not completely straightforward, but might work.

The naive thing would be to match against all possible patterns in parallel. And then do a simple loop afterwards, returning the first result that yielded a match. But this does not make use of the short-circuiting behavior.

A more advanced approach might even make use of that (the short circuiting) and evaluate batches of patterns at once (in parallel), but return early once a match is found. Worst case, this would evaluate num_threads-1 patterns that would not have been needed.

But I think this should be left for another day 😄

@cyqsimon cyqsimon requested a review from sharkdp January 21, 2024 16:26
@sharkdp
Copy link
Owner

sharkdp commented Jan 21, 2024

Thank you for the update

@sharkdp sharkdp merged commit db66e44 into sharkdp:master Jan 21, 2024
22 checks passed
@cyqsimon
Copy link
Contributor Author

A more advanced approach might even make use of that (the short circuiting) and evaluate batches of patterns at once (in parallel), but return early once a match is found. Worst case, this would evaluate num_threads-1 patterns that would not have been needed.

I was thinking about this proposal, and I actually don't think it's as intimidating as it sounds. All this requires is a custom iterator adapter to manage the thread pool and the message passing; the caller would require very few changes.

I'll try to write a prototype today or in the next few days.

@cyqsimon cyqsimon deleted the syntax-mapping-refactor branch January 22, 2024 06:45
@sharkdp
Copy link
Owner

sharkdp commented Jan 22, 2024

Maybe the thread spawning overhead is actually so high that it's not worth it. Starting a thread pool typically also takes a few milliseconds.

tmeijn pushed a commit to tmeijn/dotfiles that referenced this pull request Jan 8, 2025
This MR contains the following updates:

| Package | Update | Change |
|---|---|---|
| [sharkdp/bat](https://github.com/sharkdp/bat) | minor | `v0.24.0` -> `v0.25.0` |

MR created with the help of [el-capitano/tools/renovate-bot](https://gitlab.com/el-capitano/tools/renovate-bot).

**Proposed changes to behavior should be submitted there as MRs.**

---

### Release Notes

<details>
<summary>sharkdp/bat (sharkdp/bat)</summary>

### [`v0.25.0`](https://github.com/sharkdp/bat/blob/HEAD/CHANGELOG.md#v0250)

[Compare Source](sharkdp/bat@v0.24.0...v0.25.0)

#### Features

-   Set terminal title to file names when Paging is not Paging::Never [#&#8203;2807](sharkdp/bat#2807) ([@&#8203;Oliver-Looney](https://github.com/Oliver-Looney))
-   `bat --squeeze-blank`/`bat -s` will now squeeze consecutive empty lines, see [#&#8203;1441](sharkdp/bat#1441) ([@&#8203;eth-p](https://github.com/eth-p)) and [#&#8203;2665](sharkdp/bat#2665) ([@&#8203;einfachIrgendwer0815](https://github.com/einfachIrgendwer0815))
-   `bat --squeeze-limit` to set the maximum number of empty consecutive when using `--squeeze-blank`, see [#&#8203;1441](sharkdp/bat#1441) ([@&#8203;eth-p](https://github.com/eth-p)) and [#&#8203;2665](sharkdp/bat#2665) ([@&#8203;einfachIrgendwer0815](https://github.com/einfachIrgendwer0815))
-   `PrettyPrinter::squeeze_empty_lines` to support line squeezing for bat as a library, see [#&#8203;1441](sharkdp/bat#1441) ([@&#8203;eth-p](https://github.com/eth-p)) and [#&#8203;2665](sharkdp/bat#2665) ([@&#8203;einfachIrgendwer0815](https://github.com/einfachIrgendwer0815))
-   Syntax highlighting for JavaScript files that start with `#!/usr/bin/env bun` [#&#8203;2913](sharkdp/bat#2913) ([@&#8203;sharunkumar](https://github.com/sharunkumar))
-   `bat --strip-ansi={never,always,auto}` to remove ANSI escape sequences from bat's input, see [#&#8203;2999](sharkdp/bat#2999) ([@&#8203;eth-p](https://github.com/eth-p))
-   Add or remove individual style components without replacing all styles [#&#8203;2929](sharkdp/bat#2929) ([@&#8203;eth-p](https://github.com/eth-p))
-   Automatically choose theme based on the terminal's color scheme, see [#&#8203;2896](sharkdp/bat#2896) ([@&#8203;bash](https://github.com/bash))
-   Add option `--binary=as-text` for printing binary content, see issue [#&#8203;2974](sharkdp/bat#2974) and MR [#&#8203;2976](sharkdp/bat#2976) ([@&#8203;einfachIrgendwer0815](https://github.com/einfachIrgendwer0815))
-   Make shell completions available via `--completion <shell>`, see issue [#&#8203;2057](sharkdp/bat#2057) and MR [#&#8203;3126](sharkdp/bat#3126) ([@&#8203;einfachIrgendwer0815](https://github.com/einfachIrgendwer0815))
-   Syntax highlighting for puppet code blocks within Markdown files, see [#&#8203;3152](sharkdp/bat#3152) ([@&#8203;liliwilson](https://github.com/liliwilson))

#### Bugfixes

-   Fix long file name wrapping in header, see [#&#8203;2835](sharkdp/bat#2835) ([@&#8203;FilipRazek](https://github.com/FilipRazek))
-   Fix `NO_COLOR` support, see [#&#8203;2767](sharkdp/bat#2767) ([@&#8203;acuteenvy](https://github.com/acuteenvy))
-   Fix handling of inputs with OSC ANSI escape sequences, see [#&#8203;2541](sharkdp/bat#2541) and [#&#8203;2544](sharkdp/bat#2544) ([@&#8203;eth-p](https://github.com/eth-p))
-   Fix handling of inputs with combined ANSI color and attribute sequences, see [#&#8203;2185](sharkdp/bat#2185) and [#&#8203;2856](sharkdp/bat#2856) ([@&#8203;eth-p](https://github.com/eth-p))
-   Fix panel width when line 10000 wraps, see [#&#8203;2854](sharkdp/bat#2854) ([@&#8203;eth-p](https://github.com/eth-p))
-   Fix compile issue of `time` dependency caused by standard library regression [#&#8203;3045](sharkdp/bat#3045) ([@&#8203;cyqsimon](https://github.com/cyqsimon))
-   Fix override behavior of --plain and --paging, see issue [#&#8203;2731](sharkdp/bat#2731) and MR [#&#8203;3108](sharkdp/bat#3108) ([@&#8203;einfachIrgendwer0815](https://github.com/einfachIrgendwer0815))
-   Fix bugs in `$LESSOPEN` support, see [#&#8203;2805](sharkdp/bat#2805) ([@&#8203;Anomalocaridid](https://github.com/Anomalocaridid))

#### Other

-   Upgrade to Rust 2021 edition [#&#8203;2748](sharkdp/bat#2748) ([@&#8203;cyqsimon](https://github.com/cyqsimon))
-   Refactor and cleanup build script [#&#8203;2756](sharkdp/bat#2756) ([@&#8203;cyqsimon](https://github.com/cyqsimon))
-   Checks changelog has been written to for MRs in CI [#&#8203;2766](sharkdp/bat#2766) ([@&#8203;cyqsimon](https://github.com/cyqsimon))
    -   Use GitHub API to get correct MR submitter [#&#8203;2791](sharkdp/bat#2791) ([@&#8203;cyqsimon](https://github.com/cyqsimon))
-   Minor benchmark script improvements [#&#8203;2768](sharkdp/bat#2768) ([@&#8203;cyqsimon](https://github.com/cyqsimon))
-   Update Arch Linux package URL in README files [#&#8203;2779](sharkdp/bat#2779) ([@&#8203;brunobell](https://github.com/brunobell))
-   Update and improve `zsh` completion, see [#&#8203;2772](sharkdp/bat#2772) ([@&#8203;okapia](https://github.com/okapia))
-   More extensible syntax mapping mechanism [#&#8203;2755](sharkdp/bat#2755) ([@&#8203;cyqsimon](https://github.com/cyqsimon))
-   Use proper Architecture for Debian packages built for musl, see [#&#8203;2811](sharkdp/bat#2811) ([@&#8203;Enselic](https://github.com/Enselic))
-   Pull in fix for unsafe-libyaml security advisory, see [#&#8203;2812](sharkdp/bat#2812) ([@&#8203;dtolnay](https://github.com/dtolnay))
-   Update git-version dependency to use Syn v2, see [#&#8203;2816](sharkdp/bat#2816) ([@&#8203;dtolnay](https://github.com/dtolnay))
-   Update git2 dependency to v0.18.2, see [#&#8203;2852](sharkdp/bat#2852) ([@&#8203;eth-p](https://github.com/eth-p))
-   Improve performance when color output disabled, see [#&#8203;2397](sharkdp/bat#2397) and [#&#8203;2857](sharkdp/bat#2857) ([@&#8203;eth-p](https://github.com/eth-p))
-   Relax syntax mapping rule restrictions to allow brace expansion [#&#8203;2865](sharkdp/bat#2865) ([@&#8203;cyqsimon](https://github.com/cyqsimon))
-   Apply clippy fixes [#&#8203;2864](sharkdp/bat#2864) ([@&#8203;cyqsimon](https://github.com/cyqsimon))
-   Faster startup by offloading glob matcher building to a worker thread [#&#8203;2868](sharkdp/bat#2868) ([@&#8203;cyqsimon](https://github.com/cyqsimon))
-   Display which theme is the default one in basic output (no colors), see [#&#8203;2937](sharkdp/bat#2937) ([@&#8203;sblondon](https://github.com/sblondon))
-   Display which theme is the default one in colored output, see [#&#8203;2838](sharkdp/bat#2838) ([@&#8203;sblondon](https://github.com/sblondon))
-   Add aarch64-apple-darwin ("Apple Silicon") binary tarballs to releases, see [#&#8203;2967](sharkdp/bat#2967) ([@&#8203;someposer](https://github.com/someposer))
-   Update the Lisp syntax, see [#&#8203;2970](sharkdp/bat#2970) ([@&#8203;ccqpein](https://github.com/ccqpein))
-   Use bat's ANSI iterator during tab expansion, see [#&#8203;2998](sharkdp/bat#2998) ([@&#8203;eth-p](https://github.com/eth-p))
-   Support 'statically linked binary' for aarch64 in 'Release' page, see [#&#8203;2992](sharkdp/bat#2992) ([@&#8203;tzq0301](https://github.com/tzq0301))
-   Update options in shell completions and the man page of `bat`, see [#&#8203;2995](sharkdp/bat#2995) ([@&#8203;akinomyoga](https://github.com/akinomyoga))
-   Update nix dev-dependency to v0.29.0, see [#&#8203;3112](sharkdp/bat#3112) ([@&#8203;decathorpe](https://github.com/decathorpe))
-   Bump MSRV to [1.74](https://blog.rust-lang.org/2023/11/16/Rust-1.74.0.html), see [#&#8203;3154](sharkdp/bat#3154) ([@&#8203;keith-hall](https://github.com/keith-hall))
-   Update clircle dependency to remove winapi transitive dependency, see [#&#8203;3113](sharkdp/bat#3113) ([@&#8203;niklasmohrin](https://github.com/niklasmohrin))

#### Syntaxes

-   `cmd-help`: scope subcommands followed by other terms, and other misc improvements, see [#&#8203;2819](sharkdp/bat#2819) ([@&#8203;victor-gp](https://github.com/victor-gp))
-   Upgrade JQ syntax, see [#&#8203;2820](sharkdp/bat#2820) ([@&#8203;dependabot](https://github.com/dependabot)\[bot])
-   Add syntax mapping for quadman quadlets [#&#8203;2866](sharkdp/bat#2866) ([@&#8203;cyqsimon](https://github.com/cyqsimon))
-   Map containers .conf files to TOML syntax [#&#8203;2867](sharkdp/bat#2867) ([@&#8203;cyqsimon](https://github.com/cyqsimon))
-   Associate `.xsh` files with `xonsh` syntax that is Python, see [#&#8203;2840](sharkdp/bat#2840) ([@&#8203;anki-code](https://github.com/anki-code))
-   Associate JSON with Comments `.jsonc` with `json` syntax, see [#&#8203;2795](sharkdp/bat#2795) ([@&#8203;mxaddict](https://github.com/mxaddict))
-   Associate JSON-LD `.jsonld` files with `json` syntax, see [#&#8203;3037](sharkdp/bat#3037) ([@&#8203;vorburger](https://github.com/vorburger))
-   Associate `.textproto` files with `ProtoBuf` syntax, see [#&#8203;3038](sharkdp/bat#3038) ([@&#8203;vorburger](https://github.com/vorburger))
-   Associate GeoJSON `.geojson` files with `json` syntax, see [#&#8203;3084](sharkdp/bat#3084) ([@&#8203;mvaaltola](https://github.com/mvaaltola))
-   Associate `.aws/{config,credentials}`, see [#&#8203;2795](sharkdp/bat#2795) ([@&#8203;mxaddict](https://github.com/mxaddict))
-   Associate Wireguard config `/etc/wireguard/*.conf`, see [#&#8203;2874](sharkdp/bat#2874) ([@&#8203;cyqsimon](https://github.com/cyqsimon))
-   Add support for [CFML](https://www.adobe.com/products/coldfusion-family.html), see [#&#8203;3031](sharkdp/bat#3031) ([@&#8203;brenton-at-pieces](https://github.com/brenton-at-pieces))
-   Map `*.mkd` files to `Markdown` syntax, see issue [#&#8203;3060](sharkdp/bat#3060) and MR [#&#8203;3061](sharkdp/bat#3061) ([@&#8203;einfachIrgendwer0815](https://github.com/einfachIrgendwer0815))
-   Add syntax mapping for CITATION.cff, see [#&#8203;3103](sharkdp/bat#3103) ([@&#8203;Ugzuzg](https://github.com/Ugzuzg))
-   Add syntax mapping for kubernetes config files [#&#8203;3049](sharkdp/bat#3049) ([@&#8203;cyqsimon](https://github.com/cyqsimon))
-   Adds support for pipe delimiter for CSV [#&#8203;3115](sharkdp/bat#3115) ([@&#8203;pratik-m](https://github.com/pratik-m))
-   Add syntax mapping for `/etc/pacman.conf` [#&#8203;2961](sharkdp/bat#2961) ([@&#8203;cyqsimon](https://github.com/cyqsimon))
-   Associate `uv.lock` with `TOML` syntax, see [#&#8203;3132](sharkdp/bat#3132) ([@&#8203;fepegar](https://github.com/fepegar))

#### Themes

-   Patched/improved themes for better Manpage syntax highlighting support, see [#&#8203;2994](sharkdp/bat#2994) ([@&#8203;keith-hall](https://github.com/keith-hall)).

#### `bat` as a library

-   Changes to `syntax_mapping::SyntaxMapping` [#&#8203;2755](sharkdp/bat#2755) ([@&#8203;cyqsimon](https://github.com/cyqsimon))
    -   `SyntaxMapping::get_syntax_for` is now correctly public
    -   \[BREAKING] `SyntaxMapping::{empty,builtin}` are removed; use `SyntaxMapping::new` instead
    -   \[BREAKING] `SyntaxMapping::mappings` is replaced by `SyntaxMapping::{builtin,custom,all}_mappings`
-   Make `Controller::run_with_error_handler`'s error handler `FnMut`, see [#&#8203;2831](sharkdp/bat#2831) ([@&#8203;rhysd](https://github.com/rhysd))
-   Improve compile time by 20%, see [#&#8203;2815](sharkdp/bat#2815) ([@&#8203;dtolnay](https://github.com/dtolnay))
-   Add `theme::theme` for choosing an appropriate theme based on the
    terminal's color scheme, see [#&#8203;2896](sharkdp/bat#2896) ([@&#8203;bash](https://github.com/bash))
    -   \[BREAKING] Remove `HighlightingAssets::default_theme`. Use `theme::default_theme` instead.
-   Add `PrettyPrinter::print_with_writer` for custom output destinations, see [#&#8203;3070](sharkdp/bat#3070) ([@&#8203;kojix2](https://github.com/kojix2))

</details>

---

### Configuration

📅 **Schedule**: Branch creation - At any time (no schedule defined), Automerge - At any time (no schedule defined).

🚦 **Automerge**: Disabled by config. Please merge this manually once you are satisfied.

♻ **Rebasing**: Whenever MR becomes conflicted, or you tick the rebase/retry checkbox.

🔕 **Ignore**: Close this MR and you won't be reminded about this update again.

---

 - [ ] <!-- rebase-check -->If you want to rebase/retry this MR, check this box

---

This MR has been generated by [Renovate Bot](https://github.com/renovatebot/renovate).
<!--renovate-debug:eyJjcmVhdGVkSW5WZXIiOiIzOS45MS40IiwidXBkYXRlZEluVmVyIjoiMzkuOTEuNCIsInRhcmdldEJyYW5jaCI6Im1haW4iLCJsYWJlbHMiOlsiUmVub3ZhdGUgQm90Il19-->
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Need a more sustainable way to manage known syntax mappings
4 participants