Document Comparison to pandas? #812
Comments
This is quite cool, thanks for sharing! I wonder what the best way might be to integrate this. Much of this is really about how Arrow as a whole compares to pandas (e.g. the data types). Maybe we could consider explicit "if you did this in pandas, do this with ADBC" examples? There are some examples going into the next release: https://arrow.apache.org/adbc/main/python/recipe/postgresql.html The other option could be highlighting your post somehow (retweeting it?). Medium-to-long term, I was actually hoping we could integrate ADBC directly into the pandas read_sql/to_sql functions.
That recipes page looks nice - I'll see what I can add there. @datapythonista @MarcoGorelli think this is worth tweeting from the pandas account? I have a Mastodon account so I can post there, but this might be good for Twitter users. Though arguably it would also be strange for pandas to post this without more context on what it could mean for pandas itself; I'm pretty indifferent. As for your medium-to-long-term goal, I don't want to speak for the entire pandas team just yet, but I agree it would be good to integrate directly. The sql part of the pandas codebase has a lot of legacy cruft and isn't as actively maintained as other parts, so pandas should stand to gain a lot from using ADBC internally.
Nice! Regarding posting: the access to Twitter is in the 1password (see Joris' email). If you join, you should be able to access it and post.
FYI I started integration with pandas in pandas-dev/pandas#53869. Looks like we aren't too far off from meeting the pandas requirements; we just need int8 support for postgres and datetime support for the postgres/sqlite drivers.
Oh that's great! For SQLite: is there a standard date/time/datetime encoding? That seems to be the main issue with stuffing those values in SQLite.
Actually I'll just take a look at what pandas does currently when I get a chance, and then think about how to mimic that.
pandas will just defer to sqlalchemy or sqlite3. I think both just store those values as ISO strings. Here are relevant docs: https://docs.python.org/3/library/sqlite3.html#default-adapters-and-converters
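To make the "ISO strings" point concrete, here is a minimal sketch using only the standard library `sqlite3` module. The table name `events` and the sample value are made up for illustration, and the datetime is converted to an ISO string explicitly rather than through the module's default adapter, which is deprecated as of Python 3.12.

```python
import sqlite3
from datetime import datetime

conn = sqlite3.connect(":memory:")
# SQLite has no native datetime storage class; TIMESTAMP is only a declared type name.
conn.execute("CREATE TABLE events (dt TIMESTAMP)")

moment = datetime(2023, 1, 1, 12, 30)
# Store the value as an ISO 8601 string (explicit conversion, no default adapter).
conn.execute("INSERT INTO events VALUES (?)", (moment.isoformat(sep=" "),))

# typeof() reports the storage class SQLite actually used for the value.
stored_value, storage_class = conn.execute(
    "SELECT dt, typeof(dt) FROM events"
).fetchone()

# Reading the value back as a datetime means parsing the string again.
roundtrip = datetime.fromisoformat(stored_value)
conn.close()
```

The value comes back as plain text, so any reader (pandas, SQLAlchemy, or a driver) has to be told, or has to guess, that the column holds datetimes.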
Cool, thanks. It looks like read_sql has you explicitly specify which columns to read as datetimes, so we can probably reasonably add an option for that to the SQLite driver. Though it might be easier/more consistent to just do it as a post-processing step instead of in-driver...? But given the layers in between, it may be valuable to just support it directly anyways.
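A rough sketch of the two options mentioned here, using pandas with a plain `sqlite3` connection rather than ADBC: the `parse_dates` argument of `read_sql_query` names the datetime columns up front, while `pd.to_datetime` does the same conversion as a post-processing step. The table and column names are invented for the example.

```python
import sqlite3

import pandas as pd

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test (dt TEXT, label TEXT)")
conn.execute("INSERT INTO test VALUES ('2023-01-01 00:00:00', 'a')")

# Without any hint, the ISO strings come back as ordinary strings.
raw = pd.read_sql_query("SELECT * FROM test", conn)

# Option 1: tell read_sql_query which columns hold datetimes.
parsed = pd.read_sql_query("SELECT * FROM test", conn, parse_dates=["dt"])

# Option 2: convert after the fact, as a post-processing step.
post = raw.assign(dt=pd.to_datetime(raw["dt"]))

conn.close()
```

Both options end up with a datetime column; the difference is only whether the caller or the reading layer is responsible for the conversion.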
Yea the
>>> import pandas as pd
>>> from sqlalchemy import create_engine
>>> df = pd.DataFrame([[pd.Timestamp("2023-01-01")]], columns=["dt"])
>>> engine = create_engine('sqlite://', echo=False)
>>> df.to_sql("test", con=engine, index=False)
>>> pd.read_sql("test", con=engine).dtypes
dt    datetime64[ns]
dtype: object
Ah, I see, thanks. In that case, maybe the right option to provide is some way to map the SQLite column type to a date/time/datetime Arrow type and format string, and then pandas can configure it to mimic the standard library sqlite3 module. (Though it sounds like SQLAlchemy can do this itself as well from that reference.)
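For reference, this is roughly the standard library behaviour such an option would mimic: `sqlite3` can map a column's declared type to a converter via `register_converter` and `detect_types=sqlite3.PARSE_DECLTYPES`. The sketch below is not ADBC code and does not show any existing ADBC driver option; the `CONVERTERS` mapping and the table layout are hypothetical, and the converters are registered explicitly instead of relying on the default ones (deprecated as of Python 3.12).

```python
import sqlite3
from datetime import date, datetime

# Hypothetical mapping: declared SQLite column type -> parser for the stored bytes.
CONVERTERS = {
    "date": lambda raw: date.fromisoformat(raw.decode()),
    "timestamp": lambda raw: datetime.fromisoformat(raw.decode()),
}
for type_name, converter in CONVERTERS.items():
    sqlite3.register_converter(type_name, converter)

# PARSE_DECLTYPES makes sqlite3 look up a converter by each column's declared type.
conn = sqlite3.connect(":memory:", detect_types=sqlite3.PARSE_DECLTYPES)
conn.execute("CREATE TABLE test (d DATE, ts TIMESTAMP, note TEXT)")
conn.execute(
    "INSERT INTO test VALUES (?, ?, ?)",
    ("2023-01-01", "2023-01-01 08:15:00", "plain"),
)

# Columns declared DATE/TIMESTAMP are converted; the TEXT column is left alone.
row = conn.execute("SELECT d, ts, note FROM test").fetchone()
conn.close()
```

An Arrow-based equivalent would presumably produce date/timestamp Arrow types instead of Python objects, but the declared-type-to-parser lookup is the same idea.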
To circle back here after about exactly a year, I've opened a PR with brief examples of using ADBC with Pandas: #1940 |
Fixes #812. Co-authored-by: William Ayd
I was experimenting with the ADBC postgres driver in comparison to equivalent pandas read/write sql functions. I put a rough draft of that up on my blog:
https://willayd.com/leveraging-the-adbc-driver-in-analytics-workflows.html
Do you think any of that is worth integrating into the documentation here? Not sure how much we care to highlight differences here against other tools in the space