Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

perf(rust): update string replacement codepaths following new benchmarking #6777

Merged

Conversation

alexander-beedie
Copy link
Collaborator

@alexander-beedie alexander-beedie commented Feb 10, 2023

Prepare for graphs :)

TLDR

  • Currently we leave the small-string regime for str::replacen, and use the regex engine for all other literal replacements (where len(s) > 32).
  • Benchmarking the latest code shows that the regex-based replace (in conjunction with NoExpand) now outperforms in essentially all string regimes (from small to large), allowing us to both simplify the code and get a bit of extra performance in the small-string regime.

Build

Compiled with:

maturin develop --release -- -C codegen-units=16 -C lto=thin -C target-cpu=native

Compared

(regex)       =>  reg.replace(s, NoExpand(val))
(str.replace) =>  Cow::Owned(s.replacen(pat, val, 1))

Results

Tested on strings between 8 - 4096 chars in length, replacing between 0% and 100% of the target string, averaging a large number of repeat runs to reduce timing noise...

str_replace_bench

polars_str_replace_bench.xlsx

… shows regex engine is now faster in all regimes
@github-actions github-actions bot added performance Performance issues or improvements rust Related to Rust Polars labels Feb 10, 2023
@alexander-beedie
Copy link
Collaborator Author

Can replicate the benchmark locally using the following script:
https://gist.github.com/alexander-beedie/a732e4213709696c8d603294d2e1c25c

@alexander-beedie
Copy link
Collaborator Author

alexander-beedie commented Feb 10, 2023

In light of the new results, I'll revisit contains next, as it uses a similar optimisation.
Update: done #6811.

@ritchie46
Copy link
Member

Nice! We can thank this PR I believe: rust-lang/regex#930

@ritchie46 ritchie46 merged commit 7d16a7d into pola-rs:master Feb 10, 2023
@alexander-beedie alexander-beedie deleted the string-replace-performance branch February 10, 2023 14:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
performance Performance issues or improvements rust Related to Rust Polars
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants