Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Treat OSC ANSI Sequences as Invisible Text & Add OSC 8 Support #2544

Merged
merged 9 commits into from
Feb 10, 2024

Conversation

eth-p
Copy link
Collaborator

@eth-p eth-p commented Apr 17, 2023

Fixes #2541 by removing OSC sequences from lines before starting to print them. no longer considering ANSI OSC sequences as visible text.

image
(Left: Fixed, Right: Current version)

Prior to this change, we don't interpret OSC sequences in any special way. They aren't caught by console::AnsiCodeIterator, so the OSC sequences were being treated as visible text.

  • When the OSC is printed in full, the terminal would see a valid sequence and hide it. As a consequence, lines would b e wrapped too early.

  • When the OSC is broken for line wrapping, the terminal wouldn't see a valid sequence. In this case, it prints both halves of the text on different lines and appears to be junk.


Edit: I had some extra time. A couple of new commits gave special handling for the OSC 8 sequence, allowing bat to disable the hyperlink at the end of the line and then re-emit it at the start of the next line.

image

@eth-p eth-p requested review from sharkdp and Enselic April 17, 2023 05:15
@eth-p eth-p force-pushed the fix-2541 branch 3 times, most recently from a34f6f4 to bca3a3e Compare April 17, 2023 23:12
@eth-p eth-p changed the title Strip OSC Ansi Sequences Before Printing Treat OSC ANSI Sequences as Invisible Text Apr 18, 2023
@eth-p
Copy link
Collaborator Author

eth-p commented Apr 18, 2023

Made some changes so that we don't need to strip the sequences anymore. Instead, they'll be emitted as-is and just ignored for line wrapping purposes.

@eth-p
Copy link
Collaborator Author

eth-p commented Apr 18, 2023

I also found some possible optimizations while doing this:

  • The syntax highlighting style is being emitted for every single text chunk (rather than highlight region).

    Essentially, for a string like Text^[33mMore Text^[1mEven More Text what we do is:

    1. For Text
      • Emit highlight style: ^[38;2;100;100;100m
      • Emit text: Text
    2. For ^[33m
      • Passthrough detected ANSI sequence: ^[33m
    3. For More Text
      • Emit highlight style: ^[38;2;100;100;100m
      • Re-emit detected ANSI attributes as sequence: ^[33m
      • Emit text: More Text
    4. For ^[1m
      • Passthrough detected ANSI sequence: ^[1m
    5. For Even More Text
      • Emit highlight style: ^[38;2;100;100;100m
      • Re-emit detected ANSI attributes as sequence: ^[33m^[1m
      • Emit text: Even More Text

    This gives us: ^[38;2;100;100;100mText^[33m^[38;2;100;100;100m^[33mMore Text^[1m^[38;2;100;100;100m^[33m^[1mEven More Text
    Which could be simplified to: ^[38;2;100;100;100mText^[33mMore Text^[1mEven More Text

    For the plain text syntax we don't actually have a style to emit (i.e. no ^[38;2;100;100;100m everywhere), but it still acts as though there is one and double-prints the passed through styles.

  • When wrapping is enabled, we could probably be reusing a single String as a line_buf.
    As it is right now, we call String::with_capacity multiple times per line. A better approach would be to store it on the InteractivePrinter struct and reuse it across lines so we don't have to allocate as much.

    I tested this, and it reduced performance on release builds. I guess reusing the variable prevents LLVM from performing a couple of optimizations.

@eth-p eth-p changed the title Treat OSC ANSI Sequences as Invisible Text Treat OSC ANSI Sequences as Invisible Text & Add OSC 8 Support Apr 18, 2023
@eth-p
Copy link
Collaborator Author

eth-p commented Apr 18, 2023

I figured with such a significant change to a hot loop, it might be worth benchmarking bat before and after my changes.

I no longer have an x86_64 system running MacOS to test with, but I can happily say that these changes actually result in a small performance gain under aarch64 :)

image

Notes

  • Both executables were compiled with cargo build --release --locked using rustc version 1.65.0.
  • bat-master is commit e828d78.
  • bat-with-fix is commit d899f82.
  • The lots-of-ansi.txt file was created using bat src/**/*.rs --decorations=always --wrap=never --paging=never --color=always > lots-of-ansi.txt, so it basically feeds bat's output back to itself.
  • Both versions output the same text, verified by running sha1sum against the the executables' output with the flags --terminal-width=80 --wrap=character --decorations=always --color=always lots-of-ansi.txt --paging=never

@eth-p
Copy link
Collaborator Author

eth-p commented Feb 8, 2024

Rebased to latest master and re-ran benchmarks.

Benchmarks: aarch64

Apple ARM M1 architecture, running Darwin 23.0.0 kernel.

With ANSI content in the input, these changes result in a minor performance increase.

image

With non-ANSI inputs, there were no performance regressions:

image

Benchmarks: amd64

Intel i7-4790k Haswell architecture, running Linux 6.1 kernel with glibc 2.38.

With ANSI content in the input, these changes result in a minor performance increase.

image

With non-ANSI inputs, there were no obvious performance regressions. Differences between the two builds was within margin of error.

image

If you would like benchmarks for a more modern architecture (AMD Zen 4) or for Windows, I can install some development tools on my other PC.

Copy link
Owner

@sharkdp sharkdp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you very much and sorry for leaving this hanging for so long.

Admittedly, I have not reviewed the very core of this (the escape sequence parser/iterator) in detail, but it looks very reasonable and is well covered by tests.

Also, thank you for performing the benchmarks!

@eth-p
Copy link
Collaborator Author

eth-p commented Feb 10, 2024

The failed audit check is related to libgit2, which isn't changed by this pull request. Merging this PR.

@eth-p eth-p enabled auto-merge February 10, 2024 05:06
@eth-p eth-p disabled auto-merge February 10, 2024 05:06
@eth-p eth-p enabled auto-merge February 10, 2024 05:08
eth-p added 5 commits February 9, 2024 22:09
This can be used to extract a subset of ANSI escape sequences from a
string of text. I have big plans for this eventually, but for now, it'll
be used to strip OSC before printing.
This commit strips OSC (Operating System Command) sequences before
printing lines. Eventually when time permits, I want to add back
support for printing OSC sequences (and improve it to treat hyperlinks
like an attribute).

Until then, this should help prevent garbled output :)
More specifically, the test ensures that OSC sequences don't end up
wrapping the line.
This is an iterator for escape sequences, using
EscapeSequenceOffsetsIterator for the underlying parsing of individual
escape sequences.
@eth-p eth-p merged commit 5a2a20a into sharkdp:master Feb 10, 2024
22 checks passed
@eth-p eth-p deleted the fix-2541 branch February 10, 2024 06:20
tmeijn pushed a commit to tmeijn/dotfiles that referenced this pull request Jan 8, 2025
This MR contains the following updates:

| Package | Update | Change |
|---|---|---|
| [sharkdp/bat](https://github.com/sharkdp/bat) | minor | `v0.24.0` -> `v0.25.0` |

MR created with the help of [el-capitano/tools/renovate-bot](https://gitlab.com/el-capitano/tools/renovate-bot).

**Proposed changes to behavior should be submitted there as MRs.**

---

### Release Notes

<details>
<summary>sharkdp/bat (sharkdp/bat)</summary>

### [`v0.25.0`](https://github.com/sharkdp/bat/blob/HEAD/CHANGELOG.md#v0250)

[Compare Source](sharkdp/bat@v0.24.0...v0.25.0)

#### Features

-   Set terminal title to file names when Paging is not Paging::Never [#&#8203;2807](sharkdp/bat#2807) ([@&#8203;Oliver-Looney](https://github.com/Oliver-Looney))
-   `bat --squeeze-blank`/`bat -s` will now squeeze consecutive empty lines, see [#&#8203;1441](sharkdp/bat#1441) ([@&#8203;eth-p](https://github.com/eth-p)) and [#&#8203;2665](sharkdp/bat#2665) ([@&#8203;einfachIrgendwer0815](https://github.com/einfachIrgendwer0815))
-   `bat --squeeze-limit` to set the maximum number of empty consecutive when using `--squeeze-blank`, see [#&#8203;1441](sharkdp/bat#1441) ([@&#8203;eth-p](https://github.com/eth-p)) and [#&#8203;2665](sharkdp/bat#2665) ([@&#8203;einfachIrgendwer0815](https://github.com/einfachIrgendwer0815))
-   `PrettyPrinter::squeeze_empty_lines` to support line squeezing for bat as a library, see [#&#8203;1441](sharkdp/bat#1441) ([@&#8203;eth-p](https://github.com/eth-p)) and [#&#8203;2665](sharkdp/bat#2665) ([@&#8203;einfachIrgendwer0815](https://github.com/einfachIrgendwer0815))
-   Syntax highlighting for JavaScript files that start with `#!/usr/bin/env bun` [#&#8203;2913](sharkdp/bat#2913) ([@&#8203;sharunkumar](https://github.com/sharunkumar))
-   `bat --strip-ansi={never,always,auto}` to remove ANSI escape sequences from bat's input, see [#&#8203;2999](sharkdp/bat#2999) ([@&#8203;eth-p](https://github.com/eth-p))
-   Add or remove individual style components without replacing all styles [#&#8203;2929](sharkdp/bat#2929) ([@&#8203;eth-p](https://github.com/eth-p))
-   Automatically choose theme based on the terminal's color scheme, see [#&#8203;2896](sharkdp/bat#2896) ([@&#8203;bash](https://github.com/bash))
-   Add option `--binary=as-text` for printing binary content, see issue [#&#8203;2974](sharkdp/bat#2974) and MR [#&#8203;2976](sharkdp/bat#2976) ([@&#8203;einfachIrgendwer0815](https://github.com/einfachIrgendwer0815))
-   Make shell completions available via `--completion <shell>`, see issue [#&#8203;2057](sharkdp/bat#2057) and MR [#&#8203;3126](sharkdp/bat#3126) ([@&#8203;einfachIrgendwer0815](https://github.com/einfachIrgendwer0815))
-   Syntax highlighting for puppet code blocks within Markdown files, see [#&#8203;3152](sharkdp/bat#3152) ([@&#8203;liliwilson](https://github.com/liliwilson))

#### Bugfixes

-   Fix long file name wrapping in header, see [#&#8203;2835](sharkdp/bat#2835) ([@&#8203;FilipRazek](https://github.com/FilipRazek))
-   Fix `NO_COLOR` support, see [#&#8203;2767](sharkdp/bat#2767) ([@&#8203;acuteenvy](https://github.com/acuteenvy))
-   Fix handling of inputs with OSC ANSI escape sequences, see [#&#8203;2541](sharkdp/bat#2541) and [#&#8203;2544](sharkdp/bat#2544) ([@&#8203;eth-p](https://github.com/eth-p))
-   Fix handling of inputs with combined ANSI color and attribute sequences, see [#&#8203;2185](sharkdp/bat#2185) and [#&#8203;2856](sharkdp/bat#2856) ([@&#8203;eth-p](https://github.com/eth-p))
-   Fix panel width when line 10000 wraps, see [#&#8203;2854](sharkdp/bat#2854) ([@&#8203;eth-p](https://github.com/eth-p))
-   Fix compile issue of `time` dependency caused by standard library regression [#&#8203;3045](sharkdp/bat#3045) ([@&#8203;cyqsimon](https://github.com/cyqsimon))
-   Fix override behavior of --plain and --paging, see issue [#&#8203;2731](sharkdp/bat#2731) and MR [#&#8203;3108](sharkdp/bat#3108) ([@&#8203;einfachIrgendwer0815](https://github.com/einfachIrgendwer0815))
-   Fix bugs in `$LESSOPEN` support, see [#&#8203;2805](sharkdp/bat#2805) ([@&#8203;Anomalocaridid](https://github.com/Anomalocaridid))

#### Other

-   Upgrade to Rust 2021 edition [#&#8203;2748](sharkdp/bat#2748) ([@&#8203;cyqsimon](https://github.com/cyqsimon))
-   Refactor and cleanup build script [#&#8203;2756](sharkdp/bat#2756) ([@&#8203;cyqsimon](https://github.com/cyqsimon))
-   Checks changelog has been written to for MRs in CI [#&#8203;2766](sharkdp/bat#2766) ([@&#8203;cyqsimon](https://github.com/cyqsimon))
    -   Use GitHub API to get correct MR submitter [#&#8203;2791](sharkdp/bat#2791) ([@&#8203;cyqsimon](https://github.com/cyqsimon))
-   Minor benchmark script improvements [#&#8203;2768](sharkdp/bat#2768) ([@&#8203;cyqsimon](https://github.com/cyqsimon))
-   Update Arch Linux package URL in README files [#&#8203;2779](sharkdp/bat#2779) ([@&#8203;brunobell](https://github.com/brunobell))
-   Update and improve `zsh` completion, see [#&#8203;2772](sharkdp/bat#2772) ([@&#8203;okapia](https://github.com/okapia))
-   More extensible syntax mapping mechanism [#&#8203;2755](sharkdp/bat#2755) ([@&#8203;cyqsimon](https://github.com/cyqsimon))
-   Use proper Architecture for Debian packages built for musl, see [#&#8203;2811](sharkdp/bat#2811) ([@&#8203;Enselic](https://github.com/Enselic))
-   Pull in fix for unsafe-libyaml security advisory, see [#&#8203;2812](sharkdp/bat#2812) ([@&#8203;dtolnay](https://github.com/dtolnay))
-   Update git-version dependency to use Syn v2, see [#&#8203;2816](sharkdp/bat#2816) ([@&#8203;dtolnay](https://github.com/dtolnay))
-   Update git2 dependency to v0.18.2, see [#&#8203;2852](sharkdp/bat#2852) ([@&#8203;eth-p](https://github.com/eth-p))
-   Improve performance when color output disabled, see [#&#8203;2397](sharkdp/bat#2397) and [#&#8203;2857](sharkdp/bat#2857) ([@&#8203;eth-p](https://github.com/eth-p))
-   Relax syntax mapping rule restrictions to allow brace expansion [#&#8203;2865](sharkdp/bat#2865) ([@&#8203;cyqsimon](https://github.com/cyqsimon))
-   Apply clippy fixes [#&#8203;2864](sharkdp/bat#2864) ([@&#8203;cyqsimon](https://github.com/cyqsimon))
-   Faster startup by offloading glob matcher building to a worker thread [#&#8203;2868](sharkdp/bat#2868) ([@&#8203;cyqsimon](https://github.com/cyqsimon))
-   Display which theme is the default one in basic output (no colors), see [#&#8203;2937](sharkdp/bat#2937) ([@&#8203;sblondon](https://github.com/sblondon))
-   Display which theme is the default one in colored output, see [#&#8203;2838](sharkdp/bat#2838) ([@&#8203;sblondon](https://github.com/sblondon))
-   Add aarch64-apple-darwin ("Apple Silicon") binary tarballs to releases, see [#&#8203;2967](sharkdp/bat#2967) ([@&#8203;someposer](https://github.com/someposer))
-   Update the Lisp syntax, see [#&#8203;2970](sharkdp/bat#2970) ([@&#8203;ccqpein](https://github.com/ccqpein))
-   Use bat's ANSI iterator during tab expansion, see [#&#8203;2998](sharkdp/bat#2998) ([@&#8203;eth-p](https://github.com/eth-p))
-   Support 'statically linked binary' for aarch64 in 'Release' page, see [#&#8203;2992](sharkdp/bat#2992) ([@&#8203;tzq0301](https://github.com/tzq0301))
-   Update options in shell completions and the man page of `bat`, see [#&#8203;2995](sharkdp/bat#2995) ([@&#8203;akinomyoga](https://github.com/akinomyoga))
-   Update nix dev-dependency to v0.29.0, see [#&#8203;3112](sharkdp/bat#3112) ([@&#8203;decathorpe](https://github.com/decathorpe))
-   Bump MSRV to [1.74](https://blog.rust-lang.org/2023/11/16/Rust-1.74.0.html), see [#&#8203;3154](sharkdp/bat#3154) ([@&#8203;keith-hall](https://github.com/keith-hall))
-   Update clircle dependency to remove winapi transitive dependency, see [#&#8203;3113](sharkdp/bat#3113) ([@&#8203;niklasmohrin](https://github.com/niklasmohrin))

#### Syntaxes

-   `cmd-help`: scope subcommands followed by other terms, and other misc improvements, see [#&#8203;2819](sharkdp/bat#2819) ([@&#8203;victor-gp](https://github.com/victor-gp))
-   Upgrade JQ syntax, see [#&#8203;2820](sharkdp/bat#2820) ([@&#8203;dependabot](https://github.com/dependabot)\[bot])
-   Add syntax mapping for quadman quadlets [#&#8203;2866](sharkdp/bat#2866) ([@&#8203;cyqsimon](https://github.com/cyqsimon))
-   Map containers .conf files to TOML syntax [#&#8203;2867](sharkdp/bat#2867) ([@&#8203;cyqsimon](https://github.com/cyqsimon))
-   Associate `.xsh` files with `xonsh` syntax that is Python, see [#&#8203;2840](sharkdp/bat#2840) ([@&#8203;anki-code](https://github.com/anki-code))
-   Associate JSON with Comments `.jsonc` with `json` syntax, see [#&#8203;2795](sharkdp/bat#2795) ([@&#8203;mxaddict](https://github.com/mxaddict))
-   Associate JSON-LD `.jsonld` files with `json` syntax, see [#&#8203;3037](sharkdp/bat#3037) ([@&#8203;vorburger](https://github.com/vorburger))
-   Associate `.textproto` files with `ProtoBuf` syntax, see [#&#8203;3038](sharkdp/bat#3038) ([@&#8203;vorburger](https://github.com/vorburger))
-   Associate GeoJSON `.geojson` files with `json` syntax, see [#&#8203;3084](sharkdp/bat#3084) ([@&#8203;mvaaltola](https://github.com/mvaaltola))
-   Associate `.aws/{config,credentials}`, see [#&#8203;2795](sharkdp/bat#2795) ([@&#8203;mxaddict](https://github.com/mxaddict))
-   Associate Wireguard config `/etc/wireguard/*.conf`, see [#&#8203;2874](sharkdp/bat#2874) ([@&#8203;cyqsimon](https://github.com/cyqsimon))
-   Add support for [CFML](https://www.adobe.com/products/coldfusion-family.html), see [#&#8203;3031](sharkdp/bat#3031) ([@&#8203;brenton-at-pieces](https://github.com/brenton-at-pieces))
-   Map `*.mkd` files to `Markdown` syntax, see issue [#&#8203;3060](sharkdp/bat#3060) and MR [#&#8203;3061](sharkdp/bat#3061) ([@&#8203;einfachIrgendwer0815](https://github.com/einfachIrgendwer0815))
-   Add syntax mapping for CITATION.cff, see [#&#8203;3103](sharkdp/bat#3103) ([@&#8203;Ugzuzg](https://github.com/Ugzuzg))
-   Add syntax mapping for kubernetes config files [#&#8203;3049](sharkdp/bat#3049) ([@&#8203;cyqsimon](https://github.com/cyqsimon))
-   Adds support for pipe delimiter for CSV [#&#8203;3115](sharkdp/bat#3115) ([@&#8203;pratik-m](https://github.com/pratik-m))
-   Add syntax mapping for `/etc/pacman.conf` [#&#8203;2961](sharkdp/bat#2961) ([@&#8203;cyqsimon](https://github.com/cyqsimon))
-   Associate `uv.lock` with `TOML` syntax, see [#&#8203;3132](sharkdp/bat#3132) ([@&#8203;fepegar](https://github.com/fepegar))

#### Themes

-   Patched/improved themes for better Manpage syntax highlighting support, see [#&#8203;2994](sharkdp/bat#2994) ([@&#8203;keith-hall](https://github.com/keith-hall)).

#### `bat` as a library

-   Changes to `syntax_mapping::SyntaxMapping` [#&#8203;2755](sharkdp/bat#2755) ([@&#8203;cyqsimon](https://github.com/cyqsimon))
    -   `SyntaxMapping::get_syntax_for` is now correctly public
    -   \[BREAKING] `SyntaxMapping::{empty,builtin}` are removed; use `SyntaxMapping::new` instead
    -   \[BREAKING] `SyntaxMapping::mappings` is replaced by `SyntaxMapping::{builtin,custom,all}_mappings`
-   Make `Controller::run_with_error_handler`'s error handler `FnMut`, see [#&#8203;2831](sharkdp/bat#2831) ([@&#8203;rhysd](https://github.com/rhysd))
-   Improve compile time by 20%, see [#&#8203;2815](sharkdp/bat#2815) ([@&#8203;dtolnay](https://github.com/dtolnay))
-   Add `theme::theme` for choosing an appropriate theme based on the
    terminal's color scheme, see [#&#8203;2896](sharkdp/bat#2896) ([@&#8203;bash](https://github.com/bash))
    -   \[BREAKING] Remove `HighlightingAssets::default_theme`. Use `theme::default_theme` instead.
-   Add `PrettyPrinter::print_with_writer` for custom output destinations, see [#&#8203;3070](sharkdp/bat#3070) ([@&#8203;kojix2](https://github.com/kojix2))

</details>

---

### Configuration

📅 **Schedule**: Branch creation - At any time (no schedule defined), Automerge - At any time (no schedule defined).

🚦 **Automerge**: Disabled by config. Please merge this manually once you are satisfied.

♻ **Rebasing**: Whenever MR becomes conflicted, or you tick the rebase/retry checkbox.

🔕 **Ignore**: Close this MR and you won't be reminded about this update again.

---

 - [ ] <!-- rebase-check -->If you want to rebase/retry this MR, check this box

---

This MR has been generated by [Renovate Bot](https://github.com/renovatebot/renovate).
<!--renovate-debug:eyJjcmVhdGVkSW5WZXIiOiIzOS45MS40IiwidXBkYXRlZEluVmVyIjoiMzkuOTEuNCIsInRhcmdldEJyYW5jaCI6Im1haW4iLCJsYWJlbHMiOlsiUmVub3ZhdGUgQm90Il19-->
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Erratic output when OSC8 sequences are present
3 participants