Document Comparison to pandas? #812

Closed
WillAyd opened this issue Jun 16, 2023 · 11 comments · Fixed by #1940

@WillAyd (Contributor) commented Jun 16, 2023

I was experimenting with the ADBC PostgreSQL driver in comparison to the equivalent pandas SQL read/write functions. I put a rough draft of that up on my blog:

https://willayd.com/leveraging-the-adbc-driver-in-analytics-workflows.html

Do you think any of that is worth integrating into the documentation here? I'm not sure how much we want to highlight differences against other tools in the space.
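
For context, here's a minimal sketch of the kind of read comparison the post makes, assuming a local PostgreSQL instance and an existing example table (the URI and table name below are made up):

import pandas as pd
import sqlalchemy
import adbc_driver_postgresql.dbapi

uri = "postgresql://postgres:password@localhost:5432/postgres"  # hypothetical connection string

# pandas via SQLAlchemy: rows are materialized as Python objects before
# being assembled into a DataFrame
engine = sqlalchemy.create_engine(uri)
df_pandas = pd.read_sql("SELECT * FROM example", con=engine)

# ADBC: the result set is fetched as Arrow data and only converted to pandas at the end
with adbc_driver_postgresql.dbapi.connect(uri) as conn:
    with conn.cursor() as cur:
        cur.execute("SELECT * FROM example")
        table = cur.fetch_arrow_table()  # pyarrow.Table
df_adbc = table.to_pandas()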

@lidavidm (Member)

This is quite cool, thanks for sharing!

I wonder what the best way might be to integrate this. Much of this is really about how Arrow as a whole compares to Pandas (e.g. the data types).

Maybe we could consider explicit "if you did this in Pandas, do this with ADBC" examples? There are some examples going into the next release: https://arrow.apache.org/adbc/main/python/recipe/postgresql.html

The other thing could be highlighting your post somehow (retweeting it?)

Medium-to-long term, I was actually hoping we could integrate ADBC directly into the Pandas read/write SQL functions.
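
For instance, the write side could be shown with ADBC's bulk ingestion; a rough sketch (URI and table name are again made up):

import pyarrow as pa
import adbc_driver_postgresql.dbapi

uri = "postgresql://postgres:password@localhost:5432/postgres"  # hypothetical
data = pa.table({"id": [1, 2, 3], "name": ["a", "b", "c"]})

with adbc_driver_postgresql.dbapi.connect(uri) as conn:
    with conn.cursor() as cur:
        # Create the target table from Arrow data in a single bulk operation
        cur.adbc_ingest("example_copy", data, mode="create")
    conn.commit()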

@WillAyd (Contributor, Author) commented Jun 16, 2023

That recipes page looks nice - I'll see what I can add there.

@datapythonista @MarcoGorelli do you think this is worth tweeting from the pandas account? I have a Mastodon account so I can post there, but this might be good for Twitter users. Though it's arguably also strange for pandas to share without more context on what this could mean for pandas itself; I'm pretty indifferent.

As far as your medium-to-long-term goal, I don't want to speak for the entire pandas team just yet, but I agree it would be good to integrate directly. The SQL part of the pandas codebase has a lot of legacy cruft and isn't as actively maintained as other parts, so pandas stands to gain a lot from using ADBC internally.

@MarcoGorelli

nice!

Regarding posting - access to the Twitter account is in the 1Password vault (see Joris' email); if you join, you should be able to access it and post.

@WillAyd (Contributor, Author) commented Jun 26, 2023

FYI, I started the pandas integration in pandas-dev/pandas#53869. It looks like we aren't too far off from meeting the pandas requirements; we just need int8 support for the postgres driver and datetime support for the postgres/sqlite drivers.

@lidavidm (Member)

Oh that's great!

For SQLite: is there a standard date/time/datetime encoding? That seems to be the main issue with stuffing those values in SQLite.

@lidavidm (Member)

Actually, I'll just take a look at what pandas does currently when I get a chance, and then think about how to mimic that.

@WillAyd (Contributor, Author) commented Jun 26, 2023

pandas will just defer to SQLAlchemy or sqlite3. I think both store those values as ISO strings. Here are the relevant docs:

https://docs.python.org/3/library/sqlite3.html#default-adapters-and-converters
https://docs.sqlalchemy.org/en/20/dialects/sqlite.html#date-and-time-types
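
For illustration, the stdlib behavior boils down to something like this (output shown as a comment):

import datetime
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (dt TIMESTAMP)")
# sqlite3's default adapter serializes datetime.datetime to an ISO 8601 string
con.execute("INSERT INTO t VALUES (?)", (datetime.datetime(2023, 1, 1),))
print(con.execute("SELECT dt, typeof(dt) FROM t").fetchone())
# ('2023-01-01 00:00:00', 'text')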

@lidavidm (Member)

Cool, thanks.

It looks like read_sql has you explicitly specify which columns to read as datetimes, so we can probably add an option for that to the SQLite driver. Though it might be easier/more consistent to do it as a post-processing step instead of in-driver...? But given the layers in between, it may be valuable to support it directly anyway.
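
A rough sketch of the post-processing approach, assuming the value was stored as an ISO 8601 string (the file, table, and column names here are illustrative):

import adbc_driver_sqlite.dbapi
import pandas as pd

with adbc_driver_sqlite.dbapi.connect("test.db") as conn:
    with conn.cursor() as cur:
        cur.execute("SELECT * FROM test")
        df = cur.fetch_df()  # the "dt" column comes back as strings

# Parse the ISO strings into datetime64 outside the driver
df["dt"] = pd.to_datetime(df["dt"])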

@WillAyd (Contributor, Author) commented Jun 26, 2023

Yeah, the parse_dates argument is there in case the driver itself cannot infer the date, which lets pandas apply its own inference logic. But it isn't always required, and it's usually preferable to let the driver handle it. With sqlite you can see it maintains this on a roundtrip:

>>> import pandas as pd
>>> from sqlalchemy import create_engine
>>> df = pd.DataFrame([[pd.Timestamp("2023-01-01")]], columns=["dt"]) 
>>> engine = create_engine('sqlite://', echo=False)
>>> df.to_sql("test", con=engine, index=False)
>>> pd.read_sql("test", con=engine).dtypes
dt    datetime64[ns]
dtype: object

@lidavidm (Member)

Ah, I see, thanks. In that case, maybe the right option to provide is some way to map the SQLite column type to a date/time/datetime Arrow type and format string, and then Pandas can configure it to mimic the standard library sqlite3 module. (Though it sounds like SQLAlchemy can do this itself as well from that reference.)

@lidavidm (Member)

To circle back here almost exactly a year later, I've opened a PR with brief examples of using ADBC with Pandas: #1940
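
For reference, a minimal sketch of using an ADBC connection from pandas, assuming pandas 2.2 or newer (which accepts ADBC connections in read_sql/to_sql); the URI is hypothetical:

import pandas as pd
import adbc_driver_postgresql.dbapi

uri = "postgresql://postgres:password@localhost:5432/postgres"  # hypothetical

df = pd.DataFrame({"a": [1, 2, 3]})
with adbc_driver_postgresql.dbapi.connect(uri) as conn:
    # pandas can hand the ADBC connection to to_sql/read_sql directly
    df.to_sql("example", con=conn, index=False, if_exists="replace")
    round_tripped = pd.read_sql("SELECT * FROM example", con=conn)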

lidavidm added a commit that referenced this issue Jun 25, 2024