Add SQL Support for ADBC Drivers #53869

Merged
merged 78 commits into from
Nov 22, 2023
Merged
Show file tree
Hide file tree
Changes from 12 commits
Commits
Show all changes
78 commits
Select commit Hold shift + click to select a range
- 4f2b760 close to complete implementation (WillAyd, Jun 26, 2023)
- a4ebbb5 working implementation for postgres (WillAyd, Jun 26, 2023)
- b2cd149 sqlite implementation (WillAyd, Jun 26, 2023)
- 512bd00 Added ADBC to CI (WillAyd, Jun 26, 2023)
- f49115c Doc updates (WillAyd, Jun 26, 2023)
- a8512b5 Whatsnew update (WillAyd, Jun 26, 2023)
- c1c68ef Better optional dependency import (WillAyd, Jun 26, 2023)
- 3d7fb15 min versions fix (WillAyd, Jun 26, 2023)
- 1093bc8 import updates (WillAyd, Jun 27, 2023)
- 926e567 docstring fix (WillAyd, Jun 27, 2023)
- 093dd86 Merge remote-tracking branch 'upstream/main' into adbc-integration (WillAyd, Jun 27, 2023)
- fcc21a8 doc fixup (WillAyd, Jun 27, 2023)
- 88642f7 Merge remote-tracking branch 'upstream/main' into adbc-integration (WillAyd, Jul 14, 2023)
- 156096d Updates for 0.6.0 (WillAyd, Jul 14, 2023)
- dd26edb fix sqlite name escaping (WillAyd, Jul 20, 2023)
- 4d8a233 more cleanups (WillAyd, Jul 20, 2023)
- 5238e69 more 0.6.0 updates (WillAyd, Aug 2, 2023)
- 51c6c98 typo (WillAyd, Aug 2, 2023)
- 39b462b Merge remote-tracking branch 'upstream/main' into adbc-integration (WillAyd, Aug 28, 2023)
- 428c4f7 remove warning (WillAyd, Aug 28, 2023)
- 84d95bb test_sql expectations (WillAyd, Aug 28, 2023)
- a4d5b31 revert whatsnew issues (WillAyd, Aug 28, 2023)
- 21b35f6 pip deps (WillAyd, Aug 28, 2023)
- e709d52 Suppress pyarrow warning (WillAyd, Aug 28, 2023)
- 6077fa9 Updated docs (WillAyd, Aug 28, 2023)
- 5bba566 mypy fixes (WillAyd, Aug 28, 2023)
- 236e12b Remove stacklevel check from test (WillAyd, Aug 29, 2023)
- b35374c typo fix (WillAyd, Aug 29, 2023)
- 8d814e1 compat (WillAyd, Aug 30, 2023)
- cfac2c7 Joris feedback (WillAyd, Aug 31, 2023)
- 47caaf1 Merge remote-tracking branch 'upstream/main' into adbc-integration (WillAyd, Aug 31, 2023)
- a22e5d1 Better test coverage with ADBC (WillAyd, Aug 31, 2023)
- c51b7f4 cleanups (WillAyd, Aug 31, 2023)
- 7f5e6ac feedback (WillAyd, Sep 1, 2023)
- 9ee6255 Merge remote-tracking branch 'upstream/main' into adbc-integration (WillAyd, Sep 19, 2023)
- a8b645f checkpoint (WillAyd, Sep 19, 2023)
- 902df4f more checkpoint (WillAyd, Sep 19, 2023)
- 90ca2cb more skips (WillAyd, Sep 20, 2023)
- d753c3c updates (WillAyd, Sep 20, 2023)
- d469e24 implement more (WillAyd, Sep 21, 2023)
- 2bc11a1 bump to 0.7.0 (WillAyd, Sep 24, 2023)
- f205f90 fixups (WillAyd, Oct 2, 2023)
- 2755100 Merge remote-tracking branch 'upstream/main' into adbc-integration (WillAyd, Oct 2, 2023)
- 3577a59 cleanups (WillAyd, Oct 2, 2023)
- c5bf7f8 sqlite fixups (WillAyd, Oct 2, 2023)
- 98d22ce pyarrow compat (WillAyd, Oct 2, 2023)
- 4f72010 revert to using pip instead of conda (WillAyd, Oct 2, 2023)
- 7223e63 documentation cleanups (WillAyd, Oct 2, 2023)
- c2cd90a compat fixups (WillAyd, Oct 3, 2023)
- de65ec0 Fix stacklevel (WillAyd, Oct 3, 2023)
- 7645727 remove unneeded code (WillAyd, Oct 3, 2023)
- 3dc914c Merge remote-tracking branch 'upstream/main' into adbc-integration (WillAyd, Oct 16, 2023)
- 6dbaae5 commit after drop in fixtures (WillAyd, Oct 16, 2023)
- 3bf550c close cursor (WillAyd, Oct 17, 2023)
- 492301f Merge branch 'main' into adbc-integration (WillAyd, Oct 23, 2023)
- fc463a4 Merge remote-tracking branch 'upstream/main' into adbc-integration (WillAyd, Oct 23, 2023)
- cc72ecd Merge branch 'main' into adbc-integration (WillAyd, Oct 25, 2023)
- f5fd529 Merge branch 'main' into adbc-integration (WillAyd, Oct 30, 2023)
- 1207bc4 fix table dropping (WillAyd, Oct 30, 2023)
- e8d93c7 Merge branch 'main' into adbc-integration (WillAyd, Nov 10, 2023)
- 3eed897 Bumped ADBC min to 0.8.0 (WillAyd, Nov 10, 2023)
- adef2f2 Merge remote-tracking branch 'upstream/main' into adbc-integration (WillAyd, Nov 10, 2023)
- 67101fd documentation (WillAyd, Nov 10, 2023)
- ea5dcb9 doc updates (WillAyd, Nov 10, 2023)
- fb38411 more fixups (WillAyd, Nov 10, 2023)
- a0bed67 documentation fixups (WillAyd, Nov 11, 2023)
- 150e267 Merge branch 'main' into adbc-integration (WillAyd, Nov 13, 2023)
- 1e77f2b fixes (WillAyd, Nov 13, 2023)
- 97ed24f more documentation (WillAyd, Nov 13, 2023)
- 7dc07da doc spacing (WillAyd, Nov 13, 2023)
- 52ee8d3 doc target fix (WillAyd, Nov 14, 2023)
- 1de8488 pyarrow warning compat (WillAyd, Nov 14, 2023)
- 21edaea Merge branch 'main' into adbc-integration (WillAyd, Nov 17, 2023)
- 2d077e9 feedback (WillAyd, Nov 17, 2023)
- accbd49 updated io documentation (WillAyd, Nov 17, 2023)
- 64b63bd Merge branch 'main' into adbc-integration (WillAyd, Nov 17, 2023)
- f84f63a install updates (WillAyd, Nov 18, 2023)
- 391d045 Merge remote-tracking branch 'upstream/main' into adbc-integration (WillAyd, Nov 21, 2023)
2 changes: 2 additions & 0 deletions ci/deps/actions-310.yaml
@@ -59,5 +59,7 @@ dependencies:
  - zstandard>=0.17.0

  - pip:
    - adbc_driver_postgresql>=0.5.1
    - adbc_driver_sqlite>=0.5.1
    - pyqt5>=5.15.6
    - tzdata>=2022.1
2 changes: 2 additions & 0 deletions ci/deps/actions-311-downstream_compat.yaml
@@ -73,5 +73,7 @@ dependencies:
  - pyyaml
  - py
  - pip:
    - adbc_driver_postgresql>=0.5.1
    - adbc_driver_sqlite>=0.5.1
    - pyqt5>=5.15.6
    - tzdata>=2022.1
2 changes: 2 additions & 0 deletions ci/deps/actions-311.yaml
@@ -59,5 +59,7 @@ dependencies:
  - zstandard>=0.17.0

  - pip:
    - adbc_driver_postgresql>=0.5.1
    - adbc_driver_sqlite>=0.5.1
    - pyqt5>=5.15.6
    - tzdata>=2022.1
2 changes: 2 additions & 0 deletions ci/deps/actions-39-minimum_versions.yaml
@@ -61,5 +61,7 @@ dependencies:
  - zstandard=0.17.0

  - pip:
    - adbc_driver_postgresql==0.5.1
    - adbc_driver_sqlite==0.5.1
    - pyqt5==5.15.6
    - tzdata==2022.1
2 changes: 2 additions & 0 deletions ci/deps/actions-39.yaml
@@ -59,5 +59,7 @@ dependencies:
  - zstandard>=0.17.0

  - pip:
    - adbc_driver_postgresql>=0.5.1
    - adbc_driver_sqlite>=0.5.1
    - pyqt5>=5.15.6
    - tzdata>=2022.1
4 changes: 4 additions & 0 deletions ci/deps/circle-310-arm64.yaml
@@ -58,3 +58,7 @@ dependencies:
  - xlrd>=2.0.1
  - xlsxwriter>=3.0.3
  - zstandard>=0.17.0

  - pip:
    - adbc_driver_postgresql>=0.5.1
    - adbc_driver_sqlite>=0.5.1
1 change: 1 addition & 0 deletions doc/source/whatsnew/v2.1.0.rst
@@ -105,6 +105,7 @@ Other enhancements
- Performance improvement in :func:`read_csv` (:issue:`52632`) with ``engine="c"``
- :meth:`Categorical.from_codes` has gotten a ``validate`` parameter (:issue:`50975`)
- :meth:`DataFrame.stack` gained the ``sort`` keyword to dictate whether the resulting :class:`MultiIndex` levels are sorted (:issue:`15105`)
- :meth:`DataFrame.to_sql` and :func:`read_sql` now support ADBC drivers (:issue:`53869`)
- :meth:`DataFrame.unstack` gained the ``sort`` keyword to dictate whether the resulting :class:`MultiIndex` levels are sorted (:issue:`15105`)
- :meth:`DataFrameGroupby.agg` and :meth:`DataFrameGroupby.transform` now support grouping by multiple keys when the index is not a :class:`MultiIndex` for ``engine="numba"`` (:issue:`53486`)
- :meth:`Series.explode` now supports pyarrow-backed list types (:issue:`53602`)
2 changes: 2 additions & 0 deletions environment.yml
@@ -119,6 +119,8 @@ dependencies:
  - pygments # Code highlighting

  - pip:
    - adbc_driver_postgresql
    - adbc_driver_sqlite
    - sphinx-toggleprompt
    - typing_extensions; python_version<"3.11"
    - tzdata>=2022.1
2 changes: 2 additions & 0 deletions pandas/compat/_optional.py
@@ -15,6 +15,8 @@
# Update install.rst & setup.cfg when updating versions!

VERSIONS = {
    "adbc_driver_postgresql": "0.5.1",
    "adbc_driver_sqlite": "0.5.1",
    "bs4": "4.11.1",
    "blosc": "1.21.0",
    "bottleneck": "1.3.4",
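The `VERSIONS` table pairs each optional package with a minimum version, and `import_optional_dependency(..., errors="ignore")` (used in `pandasSQL_builder` further down) returns `None` when a package is not installed. A minimal stdlib sketch of both ideas follows; `import_optional`, `parse_version`, `meets_minimum`, and `MINIMUMS` are hypothetical names, not pandas' implementation, and the naive parser assumes plain dotted integers such as `"0.5.1"` (a real implementation should use a full version parser).

```python
from __future__ import annotations

import importlib
from types import ModuleType


def parse_version(version: str) -> tuple[int, ...]:
    # Naive: assumes purely numeric dotted versions like "0.5.1".
    return tuple(int(part) for part in version.split("."))


def meets_minimum(installed: str, minimum: str) -> bool:
    # Tuple comparison orders (0, 10, 0) after (0, 9, 0), unlike string comparison.
    return parse_version(installed) >= parse_version(minimum)


def import_optional(name: str) -> ModuleType | None:
    """Return the imported module, or None if it is not installed."""
    try:
        return importlib.import_module(name)
    except ImportError:  # ModuleNotFoundError is a subclass of ImportError
        return None


# Hypothetical subset of a minimum-version table like VERSIONS.
MINIMUMS = {"adbc_driver_postgresql": "0.5.1", "adbc_driver_sqlite": "0.5.1"}

print(import_optional("a_module_that_is_not_installed"))  # None
print(meets_minimum("0.10.0", MINIMUMS["adbc_driver_sqlite"]))  # True
```

Returning `None` instead of raising is what lets a caller write a cheap guard such as `if module and isinstance(obj, module.SomeClass)`.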
197 changes: 197 additions & 0 deletions pandas/io/sql.py
@@ -629,6 +629,17 @@ def read_sql(
       int_column date_column
    0           0  2012-11-10
    1           1  2010-11-12

    .. versionadded:: 2.1.0

       pandas now supports reading via ADBC drivers.

    >>> from adbc_driver_postgresql import dbapi  # doctest:+SKIP
    >>> with dbapi.connect('postgres:///db_name') as conn:  # doctest:+SKIP
    ...     pd.read_sql('SELECT int_column FROM test_data', conn)
       int_column
    0           0
    1           1
    """

    check_dtype_backend(dtype_backend)
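ADBC's Python `dbapi` modules follow the DB-API 2.0 interface (PEP 249) and add Arrow-native methods such as `fetch_arrow_table`, which the `ADBCDatabase` class in this diff relies on. Since ADBC drivers may not be installed, the sketch below uses the stdlib `sqlite3` module as a stand-in to show the row-oriented PEP 249 path that an Arrow-native fetch is designed to avoid: execute, read column names from `cursor.description`, then transpose the rows into columns. `fetch_columns` is a hypothetical helper, and the table reuses the docstring's `test_data` example.

```python
from __future__ import annotations

import sqlite3


def fetch_columns(con: sqlite3.Connection, sql: str) -> dict[str, list]:
    """Run ``sql`` and return {column_name: [values...]} (hypothetical helper)."""
    cur = con.cursor()
    try:
        cur.execute(sql)
        # PEP 249: each description entry is a 7-item sequence; item 0 is the name.
        names = [desc[0] for desc in cur.description]
        rows = cur.fetchall()
    finally:
        cur.close()
    # Transpose row tuples into one list per column.
    return {name: [row[i] for row in rows] for i, name in enumerate(names)}


con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE test_data (int_column INTEGER, date_column TEXT)")
con.executemany(
    "INSERT INTO test_data VALUES (?, ?)",
    [(0, "2012-11-10"), (1, "2010-11-12")],
)
print(fetch_columns(con, "SELECT int_column FROM test_data ORDER BY int_column"))
# {'int_column': [0, 1]}
```

The `ORDER BY` keeps the output deterministic; SQL does not guarantee row order without it.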
@@ -837,6 +848,10 @@ def pandasSQL_builder(
    if sqlalchemy is not None and isinstance(con, (str, sqlalchemy.engine.Connectable)):
        return SQLDatabase(con, schema, need_transaction)

    adbc = import_optional_dependency("adbc_driver_manager.dbapi", errors="ignore")
    if adbc and isinstance(con, adbc.Connection):
        return ADBCDatabase(con)

    warnings.warn(
        "pandas only supports SQLAlchemy connectable (engine/connection) or "
        "database string URI or sqlite3 DBAPI2 connection. Other DBAPI2 "
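`pandasSQL_builder` picks a backend class by inspecting the connection object: SQLAlchemy connectables first, then ADBC connections (only if the optional module imported), and otherwise a warning. The stdlib-only sketch below mirrors the ADBC part of that order. The backend classes, `choose_backend`, the `adbc_dbapi` parameter, the leading `sqlite3` check, and the fallback after the warning are all hypothetical stand-ins; SQLAlchemy is left out so the sketch runs without extra packages.

```python
from __future__ import annotations

import sqlite3
import warnings


class SQLiteBackend:
    """Hypothetical stand-in for a sqlite3/DB-API backend."""

    def __init__(self, con) -> None:
        self.con = con


class ADBCBackend:
    """Hypothetical stand-in for ADBCDatabase."""

    def __init__(self, con) -> None:
        self.con = con


def choose_backend(con, adbc_dbapi=None):
    """Pick a backend from the connection's type (hypothetical helper)."""
    if isinstance(con, sqlite3.Connection):
        return SQLiteBackend(con)
    # adbc_dbapi is None when the optional ADBC module is missing, so the
    # isinstance check is skipped instead of raising AttributeError.
    if adbc_dbapi is not None and isinstance(con, adbc_dbapi.Connection):
        return ADBCBackend(con)
    warnings.warn("unsupported connection type; treating it as a DB-API connection")
    return SQLiteBackend(con)


print(type(choose_backend(sqlite3.connect(":memory:"))).__name__)  # SQLiteBackend
```

Passing the optional module in as a parameter keeps the sketch testable with a fake module object.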
@@ -2008,6 +2023,188 @@ def _create_sql_schema(


# ---- SQL without SQLAlchemy ---


class ADBCDatabase(PandasSQL):
    """
    This class enables conversion between DataFrame and SQL databases
    using ADBC to handle database abstraction.

    Parameters
    ----------
    con : adbc_driver_manager.dbapi.Connection
    """

    def __init__(self, con) -> None:
        self.con = con

    def execute(self, sql: str | Select | TextClause, params=None):
        # ADBC cursors accept SQL strings only, not SQLAlchemy constructs.
        if not isinstance(sql, str):
            raise TypeError("Query must be a string unless using sqlalchemy.")
        # Return the cursor unclosed so the caller can fetch results;
        # a ``with`` block here would close it on return.
        cur = self.con.cursor()
        cur.execute(sql, params)
        return cur

    def read_table(
        self,
        table_name: str,
        index_col: str | list[str] | None = None,
        coerce_float: bool = True,
        parse_dates=None,
        columns=None,
        schema: str | None = None,
        chunksize: int | None = None,
        dtype_backend: DtypeBackend | Literal["numpy"] = "numpy",
    ) -> DataFrame | Iterator[DataFrame]:
        """
        Read SQL database table into a DataFrame. The only keyword arguments
        used are table_name and schema; the rest are silently discarded.

Review comment (Member): They raise an error now, I think.

        Parameters
        ----------
        table_name : str
            Name of SQL table in database.
        schema : string, default None
            Name of SQL schema in database to read from.

        Returns
        -------
        DataFrame

        See Also
        --------
        pandas.read_sql_table
        SQLDatabase.read_query

        """
        if schema:
            stmt = f"SELECT * FROM {schema}.{table_name}"
        else:
            stmt = f"SELECT * FROM {table_name}"

        with self.con.cursor() as cur:
            # Cursors are not callable: execute first, then fetch the result.
            cur.execute(stmt)
            return cur.fetch_arrow_table().to_pandas()

Review thread:

Member: It would be nice to at minimum support dtype_backend and return arrow-backed types.

Member Author: I think this should always just return arrow-backed types. Related to the other conversation around kwargs, I am unsure of the best way to handle this. If we raise for non-default arguments this wouldn't work; alternatively we could exempt the dtype_backend argument from raising for non-default arguments, but it is arguably heavy-handed to require end users to specify that when they are already using the ADBC driver.

Member: You'd have to add types_mapper=pd.ArrowDtype for this to work. Not sure how I'd feel about arrow-backed only; this makes sense, but we went a different way for other readers...

Member Author: Ah, didn't realize that. Thanks for the heads up.

Member: xref #51846 for long-term
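The thread above weighs silently discarding unsupported keywords against raising when they differ from their defaults, possibly exempting `dtype_backend`. A minimal sketch of the raising variant follows. `UNSUPPORTED_DEFAULTS` and `check_unsupported` are hypothetical names, the defaults are copied from the `read_table` signature in this diff, and `dtype_backend` is deliberately left out of the table to model the suggested exemption.

```python
from __future__ import annotations

from typing import Any

# Defaults of the read_table keywords the ADBC path does not use.
# dtype_backend is intentionally omitted (the exemption discussed above).
UNSUPPORTED_DEFAULTS: dict[str, Any] = {
    "index_col": None,
    "coerce_float": True,
    "parse_dates": None,
    "columns": None,
    "chunksize": None,
}


def check_unsupported(**kwargs: Any) -> None:
    """Raise NotImplementedError if an unsupported keyword is not at its default."""
    for name, value in kwargs.items():
        # Every default here is the singleton None or True, so identity is exact.
        if name in UNSUPPORTED_DEFAULTS and value is not UNSUPPORTED_DEFAULTS[name]:
            raise NotImplementedError(f"'{name}' is not implemented for ADBC drivers")


check_unsupported(index_col=None, coerce_float=True, dtype_backend="pyarrow")  # passes
```

Raising keeps users from assuming a keyword took effect when it did not; the trade-off is the one raised in the thread, since callers must stop passing non-default values.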


    def read_query(
        self,
        sql: str,
        index_col: str | list[str] | None = None,
        coerce_float: bool = True,
        parse_dates=None,
        params=None,
        chunksize: int | None = None,
        dtype: DtypeArg | None = None,
        dtype_backend: DtypeBackend | Literal["numpy"] = "numpy",
    ) -> DataFrame | Iterator[DataFrame]:
        """
        Read SQL query into a DataFrame. Keyword arguments are discarded.

        Parameters
        ----------
        sql : str
            SQL query to be executed.

        Returns
        -------
        DataFrame

        See Also
        --------
        read_sql_table : Read SQL database table into a DataFrame.
        read_sql

        """
        with self.con.cursor() as cur:
            # Cursors are not callable: execute first, then fetch the result.
            cur.execute(sql)
            return cur.fetch_arrow_table().to_pandas()

    read_sql = read_query

    def to_sql(
        self,
        frame,
        name: str,
        if_exists: Literal["fail", "replace", "append"] = "fail",
        index: bool = True,
        index_label=None,
        schema: str | None = None,
        chunksize: int | None = None,
        dtype: DtypeArg | None = None,
        method: Literal["multi"] | Callable | None = None,
        engine: str = "auto",
        **engine_kwargs,
    ) -> int | None:
        """
        Write records stored in a DataFrame to a SQL database.
        Only frame, name, if_exists and schema are valid arguments.

Review comment (Member): index is now supported as well?

        Parameters
        ----------
        frame : DataFrame
        name : string
            Name of SQL table.
        if_exists : {'fail', 'replace', 'append'}, default 'fail'
            - fail: If table exists, raise a ValueError.
            - replace: If table exists, drop it, recreate it, and insert data.
            - append: If table exists, insert data. Create it if it does not exist.
        schema : string, default None
            Name of SQL schema in database to write to (if database flavor
            supports this). If specified, this overwrites the default
            schema of the SQLDatabase object.
        """
        if schema:
            table_name = f"{schema}.{name}"
        else:
            table_name = name

        # TODO: pandas if_exists="append" will still create the
        # table if it does not exist; ADBC has append/create
        # as applicable modes, so the semantics get blurred across
        # the libraries
        mode = "create"
        if self.has_table(name, schema):
            if if_exists == "fail":
                raise ValueError(f"Table '{table_name}' already exists.")
            elif if_exists == "replace":
                with self.con.cursor() as cur:
                    cur.execute(f"DROP TABLE {table_name}")
            elif if_exists == "append":
                mode = "append"

        import pyarrow as pa

        tbl = pa.Table.from_pandas(frame)
        with self.con.cursor() as cur:
            total_inserted = cur.adbc_ingest(table_name, tbl, mode=mode)

        self.con.commit()
        return total_inserted

    def has_table(self, name: str, schema: str | None = None) -> bool:
        meta = self.con.adbc_get_objects(
            db_schema_filter=schema, table_name_filter=name
        ).read_all()

        for catalog_schema in meta["catalog_db_schemas"].to_pylist():
            if not catalog_schema:
                continue
            for schema_record in catalog_schema:
                if not schema_record:
                    continue

                for table_record in schema_record["db_schema_tables"]:
                    if table_record["table_name"] == name:
                        return True

        return False

    def _create_sql_schema(
        self,
        frame: DataFrame,
        table_name: str,
        keys: list[str] | None = None,
        dtype: DtypeArg | None = None,
        schema: str | None = None,
    ):
        raise NotImplementedError("not implemented for adbc")


# sqlite-specific sql strings and handler class
# dictionary used for readability purposes
_SQL_TYPES = {
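`has_table` walks the nested result of `adbc_get_objects`: one row per catalog, each holding a list of schemas (`catalog_db_schemas`), each holding a list of tables (`db_schema_tables`) with a `table_name` field. The pure-Python sketch below replays that walk on hand-built data shaped like the `to_pylist()` output. The sample data and `table_in_objects` are hypothetical, and treating a null `db_schema_tables` as empty is an extra guard that the diff does not include.

```python
from __future__ import annotations

from typing import Any


def table_in_objects(catalog_db_schemas: list[Any], name: str) -> bool:
    """Return True if any schema in any catalog lists a table called ``name``."""
    for schemas in catalog_db_schemas:  # one entry per catalog, possibly None
        if not schemas:
            continue
        for schema_record in schemas:
            if not schema_record:
                continue
            # Guard against a null table list as well as an empty one.
            for table_record in schema_record.get("db_schema_tables") or []:
                if table_record["table_name"] == name:
                    return True
    return False


# Invented sample shaped like meta["catalog_db_schemas"].to_pylist().
sample = [
    [
        {"db_schema_name": "public", "db_schema_tables": [{"table_name": "test_data"}]},
        {"db_schema_name": "empty", "db_schema_tables": None},
    ],
    None,
]
print(table_in_objects(sample, "test_data"))  # True
```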
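Before calling `adbc_ingest`, `to_sql` builds a qualified table name and maps pandas' `if_exists` onto ADBC's `create`/`append` modes (the TODO in `to_sql` notes that the semantics differ between the libraries). The sketch below restates that mapping as a pure function. As an assumption beyond the diff, it also quotes identifiers SQL-standard style (double quotes, embedded quotes doubled) instead of interpolating raw names into statements such as `DROP TABLE`; the commit list mentions a sqlite name-escaping fix, but this sketch is not taken from it. `quote_identifier`, `qualified_name`, and `resolve_ingest_mode` are hypothetical.

```python
from __future__ import annotations

import sqlite3


def quote_identifier(name: str) -> str:
    # SQL-standard quoting: wrap in double quotes, double any embedded quote.
    return '"' + name.replace('"', '""') + '"'


def qualified_name(name: str, schema: str | None = None) -> str:
    table = quote_identifier(name)
    return f"{quote_identifier(schema)}.{table}" if schema else table


def resolve_ingest_mode(if_exists: str, table_exists: bool) -> tuple[str, bool]:
    """Map pandas' if_exists to (ADBC ingest mode, drop table first?)."""
    if if_exists not in ("fail", "replace", "append"):
        raise ValueError(f"'{if_exists}' is not valid for if_exists")
    if not table_exists:
        # pandas creates a missing table even when if_exists="append".
        return "create", False
    if if_exists == "fail":
        raise ValueError("Table already exists.")
    if if_exists == "replace":
        return "create", True  # drop, then recreate
    return "append", False


# Usage: an awkward table name survives a round trip through sqlite3.
con = sqlite3.connect(":memory:")
table = qualified_name('odd"name')
con.execute(f"CREATE TABLE {table} (x INTEGER)")
con.execute(f"INSERT INTO {table} VALUES (1)")
print(con.execute(f"SELECT x FROM {table}").fetchall())  # [(1,)]
```

Keeping the mode decision in a pure function makes each `if_exists` branch testable without a database connection.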