Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUG: Fixed ADBC to_sql creation of table when using public schema #57974

Merged
merged 1 commit into from
Mar 28, 2024

Conversation

shabab477
Copy link
Contributor

@shabab477 shabab477 commented Mar 23, 2024

Problem: table on public schema being lost when tried to be created

Solution: used db_schema_name argument to specify schema name of adbc_ingest

@shabab477 shabab477 force-pushed the bugfix/57539-fix-to-sql branch 3 times, most recently from 1e206b0 to 9b7c3e5 Compare March 24, 2024 11:21
@jorisvandenbossche jorisvandenbossche added Bug IO SQL to_sql, read_sql, read_sql_query Arrow pyarrow functionality Sprints Sprint Pull Requests labels Mar 25, 2024
@jorisvandenbossche jorisvandenbossche changed the title 57539: Fixed creation unnamed table when using public schema BUG: Fixed ADBC to_sql creation of table when using public schema Mar 25, 2024
Copy link
Member

@jorisvandenbossche jorisvandenbossche left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR!

Comment on lines 1376 to 1377
@pytest.mark.parametrize("conn", postgresql_connectable)
def test_to_sql_on_public_schema_postgres(conn, request):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not super familiar with this, but is this specific to Postgres? (do we have other tested database flavors that support schema's?)

It seems that the existing test for the schema keyword, test_psycopg2_schema_support is indeed postgres specific, but is also only run for the psycopg2 driver. Could we extend that existing test to run with all postgresql_connectable ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, it seems that postgresql_connectable doesn't include an ADBC-based connection, so to test ADBC you will have to test it with postgresql_adbc_conn

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(do we have other tested database flavors that support schema's?)

AFAIK sqlite is the only database that doesn't have schemas. So actually can use the all_connectable fixture and imperatively xfail when "sqlite" in conn_name

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, it seems the issue was indeed postgres specific. MySQL seems to have a different concept of schemas, as seen here.

You are right about postgresql_connectable not having adbc based connections. What I did instead, just used all_connectable and tested for postgresql specific connectables.

@WillAyd
Copy link
Member

WillAyd commented Mar 25, 2024

Very nice PR. Can you also add a whatsnew note for version 2.2.2? Outside of that and @jorisvandenbossche comment this lgtm

@shabab477 shabab477 force-pushed the bugfix/57539-fix-to-sql branch 3 times, most recently from 8daa913 to 8ddfd68 Compare March 25, 2024 21:44
df_out = sql.read_sql_table("test_public_schema", conn, schema="public")
assert test_data.equals(df_out)

# Cleanup
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The fixture is supposed to take care of the cleanup for you - are you not seeing that?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No I wasn't seeing that, unfortunately I have no experience with pytest. Took this issue as a learning experience and did get to learn a lot. Found out how its working. Removed this cleanup part but do look at my other reply please.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool no worries. I just wanted to double check as a reviewer - its hard to see the issues that lead developers down certains paths at times.

Took this issue as a learning experience and did get to learn a lot.

That's great and we appreciate the contribution

@@ -24,6 +24,7 @@ Bug fixes
~~~~~~~~~
- :meth:`DataFrame.__dataframe__` was showing bytemask instead of bitmask for ``'string[pyarrow]'`` validity buffer (:issue:`57762`)
- :meth:`DataFrame.__dataframe__` was showing non-null validity buffer (instead of ``None``) ``'string[pyarrow]'`` without missing values (:issue:`57761`)
- :meth:`DataFrame.to_sql` was failing to find the right table on PostgreSQL when using the ``public`` schema (:issue:`57539`)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this technically would have affected any database and any schema, so we don't need to be overly specific about it being Postgres with the public schema

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. Done


conn = request.getfixturevalue(conn)

from sqlalchemy.engine import Engine
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you sure you need to explicitly create the schema / table this way? I am under the impression that the to_sql call should take care of that anyway.

The reason I am somewhat against doing it via sqlalchemy is that ADBC connections do not use / required sqlalchemy at all. So this starts to intertwine the dependencies of those libraries in ways that is unnecessary. For example, if you look at the ADBC fixtures they do not use sqlalchemy to create / drop tables

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Didn't know to_sql also creates the table but apparently it does. However, I do see an issue now in the failing pipeline: data = b"\xff\x19\x04#42000Unknown database 'public'" and this happens for mysql_pymysql_engine . Maybe I can add a statement CREATE DATABASE some_db; and execute it with sqlalchemy and clean up at the end? wdyt?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You shouldn't need to manually construct the CREATE DATABASE statement. It looks like I was wrong before about which databases have schemas versus those that do not, and apparently MySQL does not have the same concept of what a schema is as postgresql.

https://stackoverflow.com/a/11618350/621736

So I think you can go back to @jorisvandenbossche original suggestion and just use the postgresql_adbc_conn fixture instead of the all fixture

@shabab477 shabab477 force-pushed the bugfix/57539-fix-to-sql branch 5 times, most recently from 6bb2eb8 to b45a340 Compare March 27, 2024 08:17
Copy link
Member

@WillAyd WillAyd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very minor comment otherwise lgtm. @jorisvandenbossche anything else from your end?

@@ -1373,6 +1373,30 @@ def insert_on_conflict(table, conn, keys, data_iter):
pandasSQL.drop_table("test_insert_conflict")


@pytest.mark.parametrize("conn", all_connectable)
def test_to_sql_on_public_schema(conn, request):
if "postgresql" not in conn:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit but I think this can be if "sqlite" in conn or "mysql" in conn: - that way we exclude those two databases but in theory any new one beyond this that gets tested (which more than likely has a schema) will run here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, makes more sense and safe. Done

@WillAyd WillAyd added this to the 2.2.2 milestone Mar 27, 2024
)

df_out = sql.read_sql_table("test_public_schema", conn, schema="public")
assert test_data.equals(df_out)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
assert test_data.equals(df_out)
tm.assert_frame.equal(test_data, df_out)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks ! Done

@shabab477 shabab477 force-pushed the bugfix/57539-fix-to-sql branch from e88a9cb to 9efb646 Compare March 27, 2024 23:19
Problem:
- Table on public schema being lost when tried to be created

Solution:
- Used db_schema_name argument to specify schema name of adbc_ingest
@shabab477 shabab477 force-pushed the bugfix/57539-fix-to-sql branch from 9efb646 to 6afa1b2 Compare March 27, 2024 23:22
@mroeschke mroeschke merged commit aba759d into pandas-dev:main Mar 28, 2024
46 checks passed
@mroeschke
Copy link
Member

Thanks @shabab477

meeseeksmachine pushed a commit to meeseeksmachine/pandas that referenced this pull request Mar 28, 2024
lithomas1 pushed a commit that referenced this pull request Mar 28, 2024
…f table when using public schema) (#58050)

Backport PR #57974: BUG: Fixed ADBC to_sql creation of table when using public schema

Co-authored-by: Shabab Karim <[email protected]>
pmhatre1 pushed a commit to pmhatre1/pandas-pmhatre1 that referenced this pull request May 7, 2024
…ndas-dev#57974)

57539: Fixed creation unnamed table when using public schema

Problem:
- Table on public schema being lost when tried to be created

Solution:
- Used db_schema_name argument to specify schema name of adbc_ingest
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Arrow pyarrow functionality Bug IO SQL to_sql, read_sql, read_sql_query Sprints Sprint Pull Requests
Projects
None yet
Development

Successfully merging this pull request may close these issues.

BUG: ADBC Postgres writer incorrectly names the table
4 participants