Support scientific notation in write_csv()
#11929
Comments
It would be ideal if we could specify a
@stinodego I definitely think this particular request is useful, but I would be wary of adding an overly comprehensive Float formatting/parsing is actually a surprisingly large bottleneck in reading/writing CSV files, and if you add options to the formatting you either:
Remember that in Rust
Maybe it should be a separate parameter
import polars as pl
df = pl.DataFrame({"float": [1.0e10, 1.0e15, 1.0e20]})
df.write_csv("float.csv")
@stinodego would you accept a pull request for an efficient implementation of
@Wainberg I'm not a huge fan of accepting a format string. What features would you be looking for besides precision and scientific-ness?
I guess most generally it would be nice to control:
Ideally, you want to be able to do this separately for different columns. The nice thing about C/Python/Rust-style format specifiers is that support for them is already being planned for Of course, that wouldn't give you fine-grained customization over the magnitude threshold for switching from regular to scientific notation, but usually 'g' vs 'f' vs 'e' is enough choice, and in cases where it's not, it's straightforward to use a custom when/then/otherwise statement and convert the float columns to string before writing.
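The idea of picking the notation by magnitude and converting to strings before writing can be sketched in plain Python. This is only an illustration, not Polars code: `format_float`, its `low`/`high` thresholds, and the sample rows are hypothetical names and values made up for the example, and the standard-library `csv` module stands in for `write_csv()`.

```python
import csv
import io
import math


def format_float(value, precision=6, low=1e-4, high=1e15):
    """Hypothetical helper: fixed notation inside [low, high), scientific outside."""
    # Zero and non-finite values have no meaningful magnitude; fall back to repr().
    if value == 0 or not math.isfinite(value):
        return repr(value)
    if low <= abs(value) < high:
        return f"{value:.{precision}f}"
    return f"{value:.{precision}e}"


# Made-up sample data: two float columns, each formatted with its own precision.
rows = [(1.0e10, 2.5e-7), (3.0, 1.0e20)]

buffer = io.StringIO()
writer = csv.writer(buffer, lineterminator="\n")
writer.writerow(["a", "b"])
for a, b in rows:
    writer.writerow([format_float(a, precision=2), format_float(b, precision=3)])

csv_text = buffer.getvalue()
print(csv_text)
```

Because the floats are turned into strings before they reach the writer, each column can use its own precision and threshold without any support from the CSV writer itself.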
@Wainberg Similarly, having the ability to specify NOT to use scientific notation would be helpful as well. Some tooling understands 5e-6, but other tools can only parse 0.000005.
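For the opposite case, where a consumer cannot parse an exponent, one possible approach in plain Python is to go through `decimal.Decimal`. This is a sketch under assumptions: `to_plain` is a hypothetical helper name, and it is not something `write_csv()` provides.

```python
from decimal import Decimal


def to_plain(value: float) -> str:
    """Hypothetical helper: render a finite float without an exponent."""
    # repr() gives the shortest string that round-trips the float;
    # Decimal's "f" format then expands it into positional notation.
    # Non-finite values (inf, nan) are not handled specially here.
    return format(Decimal(repr(value)), "f")


for sample in (5e-6, 1e-7, 1.5, 1e20):
    print(repr(sample), "->", to_plain(sample))
```

As with any custom formatting, the values would have to be converted to strings before the CSV is written.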
Description
Currently you can do e.g. write_csv(..., float_precision=6), but this is like f'{value:.6f}', not f'{value:.6g}'. This means that small floating-point numbers will tend to get rounded to 0 when using float_precision. Ideally it would be possible to specify float_format='.6g' like in pandas's to_csv(), but any way of supporting scientific notation would help. I don't think this is addressed by #7475 but could be wrong.
The current work-around I've been using is to add .with_columns(pl.selectors.float().map_elements('{:.12g}'.format)) before the write_csv().
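The difference between the two format specifiers can be seen with plain Python string formatting. This is a small illustrative sketch: the sample values are made up, and it does not call Polars.

```python
# Sample values chosen for illustration only (not taken from the issue).
values = [1.5e-9, 0.000005, 123.456]

# Fixed-point with 6 decimal places: very small values collapse to zero.
fixed = [f"{v:.6f}" for v in values]

# General format with 6 significant digits: switches to scientific
# notation when the exponent is small (or large) enough.
general = [f"{v:.6g}" for v in values]

for v, f, g in zip(values, fixed, general):
    print(f"{v!r:>10}  .6f -> {f:<12} .6g -> {g}")
```

With `.6f`, the value 1.5e-9 is written as 0.000000, while `.6g` keeps it as 1.5e-09.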