Support for Column-Specific Float Precision #14766

Open
d-reynol opened this issue Feb 29, 2024 · 3 comments
Labels
enhancement New feature or an improvement of an existing feature

Comments

@d-reynol

Description

Proposal:
write_csv() and sink_csv() accept a single float_precision value that applies to all float columns.

Extend this to also accept a dictionary of {'col': precision}, so that each column can be formatted with its own precision.
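To make the proposed {'col': precision} behaviour concrete, here is a minimal sketch in plain Python using only the standard library csv module. It is not the polars API: write_csv_with_precision and its arguments are hypothetical, and the data is made up for illustration. Float values in the listed columns are written with a fixed number of decimal places; every other value falls back to str().

```python
import csv
import io


def write_csv_with_precision(columns, precision):
    """Hypothetical helper (not part of polars): write a dict of equal-length
    columns to CSV text. `precision` maps a column name to the number of
    decimal places used for that column's float values."""
    names = list(columns)
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(names)
    for row in zip(*(columns[n] for n in names)):
        cells = []
        for name, value in zip(names, row):
            places = precision.get(name)
            if value is None:
                cells.append("")  # missing values become empty fields
            elif places is not None and isinstance(value, float):
                cells.append(f"{value:.{places}f}")  # fixed decimals, zero-padded
            else:
                cells.append(str(value))  # unlisted columns and non-floats
        writer.writerow(cells)
    return buf.getvalue()


data = {"id": [1, 2], "x": [0.123, 0.2], "y": [0.1, 12.3456]}
print(write_csv_with_precision(data, {"x": 6, "y": 3}))
# id,x,y
# 1,0.123000,0.100
# 2,0.200000,12.346
```

Fixed-width formatting pads trailing zeros, so 0.1 becomes 0.100 in a 3-decimal column. Rounding the values before writing would not achieve that, since round(0.1, 3) is still 0.1 and is written as such.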

Background & Use Case:
I'm currently using polars for ETL in a legacy environment, and the downstream tooling expects each column to have the same number of decimal places as the corresponding database column: NUMERIC(19,6) expects 6 decimal places, NUMERIC(5,3) expects 3, and so on.
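A NUMERIC(p, s) column fixes two things: s, the number of digits after the decimal point, and p, the total number of significant digits. Below is a minimal sketch of the formatting the downstream tooling expects, using Python's decimal module. to_numeric_text is a hypothetical helper, and half-away-from-zero rounding (ROUND_HALF_UP) is an assumption worth checking against your database.

```python
from decimal import ROUND_HALF_UP, Decimal


def to_numeric_text(value, precision, scale):
    """Hypothetical helper: render `value` as a NUMERIC(precision, scale)
    column would hold it, with exactly `scale` digits after the point."""
    quantum = Decimal(1).scaleb(-scale)  # e.g. scale=3 -> Decimal('1E-3')
    # Go through str() so the float's short repr is used, not its binary expansion.
    d = Decimal(str(value)).quantize(quantum, rounding=ROUND_HALF_UP)
    # After quantizing, the coefficient holds every significant digit;
    # NUMERIC(p, s) allows at most p of them in total.
    if len(d.as_tuple().digits) > precision:
        raise ValueError(f"{value!r} does not fit NUMERIC({precision},{scale})")
    return format(d, "f")  # fixed-point text, never scientific notation


print(to_numeric_text(0.123, 19, 6))   # 0.123000  (NUMERIC(19,6))
print(to_numeric_text(12.3456, 5, 3))  # 12.346    (NUMERIC(5,3))
```

Values with too many digits, such as 123.456 for NUMERIC(5,3), raise ValueError here rather than being silently truncated.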

This is related to #11929 and #7133, but is a narrower request.

@d-reynol d-reynol added the enhancement New feature or an improvement of an existing feature label Feb 29, 2024
@Julian-J-S
Contributor

Julian-J-S commented Feb 29, 2024

I see your point and it makes sense!

Ideally you would use the Decimal type, which maps exactly to those database types with a specific precision and scale!
Unfortunately it is still experimental and has lots of limitations, and I just checked: it's not possible to write a Decimal to CSV 😆

Example using the Decimal type:

```python
import polars as pl

(
    pl.DataFrame(
        {
            "x": ["0.123", "0.234", "0.345"],
            "y": ["0.123", "0.234", "0.345"],
        }
    )
    .with_columns(
        # cast the string columns to fixed-scale decimals, mirroring NUMERIC(19,6) and NUMERIC(5,3)
        pl.col("x").cast(pl.Decimal(precision=19, scale=6)),
        pl.col("y").cast(pl.Decimal(precision=5, scale=3)),
    )
    # .write_csv("decimal.csv")  # not working at the time of writing
)

# shape: (3, 2)
# ┌───────────────┬──────────────┐
# │ x             ┆ y            │
# │ ---           ┆ ---          │
# │ decimal[19,6] ┆ decimal[5,3] │
# ╞═══════════════╪══════════════╡
# │ 0.123000      ┆ 0.123        │
# │ 0.234000      ┆ 0.234        │
# │ 0.345000      ┆ 0.345        │
# └───────────────┴──────────────┘
```

@d-reynol
Author

d-reynol commented Mar 1, 2024

Yes @JulianCologne, I agree that would be the ideal solution.

@cmdlineluser
Contributor

@Julian-J-S I just stumbled upon this issue: it seems your example now works (and sink_csv does too).
