-
-
Notifications
You must be signed in to change notification settings - Fork 18.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
BUG: Fixed ADBC to_sql creation of table when using public schema #57974
Conversation
1e206b0
to
9b7c3e5
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the PR!
pandas/tests/io/test_sql.py
Outdated
@pytest.mark.parametrize("conn", postgresql_connectable) | ||
def test_to_sql_on_public_schema_postgres(conn, request): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not super familiar with this, but is this specific to Postgres? (do we have other tested database flavors that support schema's?)
It seems that the existing test for the schema
keyword, test_psycopg2_schema_support
is indeed postgres specific, but is also only run for the psycopg2 driver. Could we extend that existing test to run with all postgresql_connectable
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, it seems that postgresql_connectable
doesn't include an ADBC-based connection, so to test ADBC you will have to test it with postgresql_adbc_conn
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(do we have other tested database flavors that support schema's?)
AFAIK sqlite is the only database that doesn't have schemas. So actually can use the all_connectable
fixture and imperatively xfail when "sqlite" in conn_name
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, it seems the issue was indeed postgres specific. MySQL seems to have a different concept of schemas
, as seen here.
You are right about postgresql_connectable
not having adbc based connections. What I did instead, just used all_connectable
and tested for postgresql specific connectables.
Very nice PR. Can you also add a whatsnew note for version 2.2.2? Outside of that and @jorisvandenbossche comment this lgtm |
8daa913
to
8ddfd68
Compare
pandas/tests/io/test_sql.py
Outdated
df_out = sql.read_sql_table("test_public_schema", conn, schema="public") | ||
assert test_data.equals(df_out) | ||
|
||
# Cleanup |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The fixture is supposed to take care of the cleanup for you - are you not seeing that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No I wasn't seeing that, unfortunately I have no experience with pytest. Took this issue as a learning experience and did get to learn a lot. Found out how its working. Removed this cleanup part but do look at my other reply please.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cool no worries. I just wanted to double check as a reviewer - its hard to see the issues that lead developers down certains paths at times.
Took this issue as a learning experience and did get to learn a lot.
That's great and we appreciate the contribution
doc/source/whatsnew/v2.2.2.rst
Outdated
@@ -24,6 +24,7 @@ Bug fixes | |||
~~~~~~~~~ | |||
- :meth:`DataFrame.__dataframe__` was showing bytemask instead of bitmask for ``'string[pyarrow]'`` validity buffer (:issue:`57762`) | |||
- :meth:`DataFrame.__dataframe__` was showing non-null validity buffer (instead of ``None``) ``'string[pyarrow]'`` without missing values (:issue:`57761`) | |||
- :meth:`DataFrame.to_sql` was failing to find the right table on PostgreSQL when using the ``public`` schema (:issue:`57539`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this technically would have affected any database and any schema, so we don't need to be overly specific about it being Postgres with the public schema
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. Done
pandas/tests/io/test_sql.py
Outdated
|
||
conn = request.getfixturevalue(conn) | ||
|
||
from sqlalchemy.engine import Engine |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you sure you need to explicitly create the schema / table this way? I am under the impression that the to_sql
call should take care of that anyway.
The reason I am somewhat against doing it via sqlalchemy
is that ADBC connections do not use / required sqlalchemy at all. So this starts to intertwine the dependencies of those libraries in ways that is unnecessary. For example, if you look at the ADBC fixtures they do not use sqlalchemy to create / drop tables
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Didn't know to_sql
also creates the table but apparently it does. However, I do see an issue now in the failing pipeline: data = b"\xff\x19\x04#42000Unknown database 'public'"
and this happens for mysql_pymysql_engine
. Maybe I can add a statement CREATE DATABASE some_db;
and execute it with sqlalchemy and clean up at the end? wdyt?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You shouldn't need to manually construct the CREATE DATABASE
statement. It looks like I was wrong before about which databases have schemas versus those that do not, and apparently MySQL does not have the same concept of what a schema is as postgresql.
https://stackoverflow.com/a/11618350/621736
So I think you can go back to @jorisvandenbossche original suggestion and just use the postgresql_adbc_conn
fixture instead of the all fixture
6bb2eb8
to
b45a340
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very minor comment otherwise lgtm. @jorisvandenbossche anything else from your end?
pandas/tests/io/test_sql.py
Outdated
@@ -1373,6 +1373,30 @@ def insert_on_conflict(table, conn, keys, data_iter): | |||
pandasSQL.drop_table("test_insert_conflict") | |||
|
|||
|
|||
@pytest.mark.parametrize("conn", all_connectable) | |||
def test_to_sql_on_public_schema(conn, request): | |||
if "postgresql" not in conn: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit but I think this can be if "sqlite" in conn or "mysql" in conn:
- that way we exclude those two databases but in theory any new one beyond this that gets tested (which more than likely has a schema) will run here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, makes more sense and safe. Done
pandas/tests/io/test_sql.py
Outdated
) | ||
|
||
df_out = sql.read_sql_table("test_public_schema", conn, schema="public") | ||
assert test_data.equals(df_out) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
assert test_data.equals(df_out) | |
tm.assert_frame.equal(test_data, df_out) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks ! Done
b45a340
to
0fdf1a7
Compare
e88a9cb
to
9efb646
Compare
Problem: - Table on public schema being lost when tried to be created Solution: - Used db_schema_name argument to specify schema name of adbc_ingest
9efb646
to
6afa1b2
Compare
Thanks @shabab477 |
…e when using public schema
…f table when using public schema) (#58050) Backport PR #57974: BUG: Fixed ADBC to_sql creation of table when using public schema Co-authored-by: Shabab Karim <[email protected]>
…ndas-dev#57974) 57539: Fixed creation unnamed table when using public schema Problem: - Table on public schema being lost when tried to be created Solution: - Used db_schema_name argument to specify schema name of adbc_ingest
Problem: table on public schema being lost when tried to be created
Solution: used db_schema_name argument to specify schema name of adbc_ingest
doc/source/whatsnew/vX.X.X.rst
file if fixing a bug or adding a new feature.