fix: load_table_from_dataframe method for issue 1692 #1698

Gaurang033 · 2023-10-23T16:17:50Z

Fixes #1692

dandhlee · 2023-10-23T16:40:00Z

Please keep titles less than 50 characters.

google/cloud/bigquery/_pandas_helpers.py

tests/unit/test_client.py

Gaurang033 · 2023-10-26T13:47:21Z

@dandhlee could you please review the changes.


Linchin · 2023-11-06T21:39:03Z

google/cloud/bigquery/_pandas_helpers.py

@@ -302,6 +302,20 @@ def bq_to_arrow_array(series, bq_field):
    return pyarrow.Array.from_pandas(series, type=arrow_type)


+def _check_nullability(arrow_fields, dataframe):
+    """Throws error if dataframe has null values and column doesn't allow nullable"""
+    if dataframe.index.name:


Could you please help me understand what lines 307-308 are for?

It's hard to point you to the exact code since I am on vacation. But when a dataframe with an index is used, the index name is turned into an Arrow column name. There were two ways to fix it: add an exception for this case, or create another column with the index name. I chose the second option because it's easier. Without this, the dataframe unit tests that use index names will fail.

Thank you, I hope you are having a good time on vacation! I played with the DataFrame index a little, and I think there are several corner cases (likely not an exhaustive list) that we need to cover:

The index doesn't have a name: does it still get converted into an Arrow column?

Multiple indexes (a MultiIndex)

An index with the same name as a column, which is possible with DataFrames

Index levels that share the same name (also possible, in a MultiIndex)

@Gaurang033 Thanks for offering up this PR.
@Linchin I appreciate this summary of additional edge cases that may not be covered by this solution.

I too worry about the edge cases, but more importantly, I worry about spending too much time and energy creating a workaround for what we all agree is a problem in pyarrow. This feels like it adds complexity to our code, increases fragility, and raises the maintenance burden in the long run. Am I missing something?

I am also uncertain whether we should add logic in our repo to correct an issue with pyarrow. I have been thinking of this PR as more of a temporary patch that may be reverted later but, for now, does help our customers. However, if the logic covering the corner cases becomes too tangled up with pyarrow's behavior, I agree that it's probably better to open an issue with pyarrow instead.

tswast · 2023-11-22T15:34:08Z

Let's just let the server-side determine if we aren't matching the correct schema. I propose #1735 instead.

Gaurang033 requested review from a team as code owners October 23, 2023 16:17

Gaurang033 requested a review from mrfaizal October 23, 2023 16:17

product-auto-label bot added size: s Pull request size is small. api: bigquery Issues related to the googleapis/python-bigquery API. labels Oct 23, 2023

dandhlee requested changes Oct 23, 2023

google/cloud/bigquery/_pandas_helpers.py (outdated)

google/cloud/bigquery/_pandas_helpers.py (outdated)

tests/unit/test_client.py (outdated)

Gaurang033 force-pushed the feature/fix_1692_load_table_from_dataframe branch 2 times, most recently from ed452fb to 6dd6a79 October 23, 2023 17:49

Gaurang033 changed the title ~~fix: load_table_from_dataframe does not error out when nan in a requi…~~ fix: load_table_from_dataframe method for issue 1692 Oct 23, 2023

Gaurang033 requested a review from dandhlee October 23, 2023 17:51

dandhlee reviewed Oct 23, 2023

tests/unit/test_client.py (outdated)

Gaurang033 force-pushed the feature/fix_1692_load_table_from_dataframe branch from 6dd6a79 to 98e568f October 23, 2023 18:14

Gaurang033 requested a review from dandhlee October 23, 2023 18:16

Gaurang033 force-pushed the feature/fix_1692_load_table_from_dataframe branch 2 times, most recently from f123095 to d06a2ac October 31, 2023 23:50

Linchin added the kokoro:force-run Add this label to force Kokoro to re-run the tests. label Nov 2, 2023

yoshi-kokoro removed the kokoro:force-run Add this label to force Kokoro to re-run the tests. label Nov 2, 2023

Linchin added the kokoro:force-run Add this label to force Kokoro to re-run the tests. label Nov 2, 2023

yoshi-kokoro removed the kokoro:force-run Add this label to force Kokoro to re-run the tests. label Nov 2, 2023

Gaurang033 force-pushed the feature/fix_1692_load_table_from_dataframe branch from afe6a01 to a731061 November 3, 2023 21:12

Gaurang033 added 2 commits November 3, 2023 17:12

fix: load_table_from_dataframe does not error out when nan in a required column

8eb8e47

fix: test_dataframe_to_arrow_with_unknown_type testcase

3a57815

Gaurang033 force-pushed the feature/fix_1692_load_table_from_dataframe branch from a731061 to 3a57815 November 3, 2023 21:12

Linchin added the kokoro:force-run Add this label to force Kokoro to re-run the tests. label Nov 6, 2023

yoshi-kokoro removed the kokoro:force-run Add this label to force Kokoro to re-run the tests. label Nov 6, 2023

Linchin reviewed Nov 6, 2023


Linchin mentioned this pull request Nov 6, 2023

Revisit the method load_table_from_json() #1646

Open

Merge branch 'main' into feature/fix_1692_load_table_from_dataframe

df75728

Linchin added the kokoro:force-run Add this label to force Kokoro to re-run the tests. label Nov 14, 2023

yoshi-kokoro removed the kokoro:force-run Add this label to force Kokoro to re-run the tests. label Nov 14, 2023

Linchin removed the request for review from mrfaizal November 14, 2023 19:57

chalmerlowe mentioned this pull request Nov 19, 2023

load_table_from_dataframe does not error out when nan in a required column - Million dollar bug #1692

Closed

tswast closed this Nov 22, 2023


Gaurang033 commented Oct 23, 2023

dandhlee commented Oct 23, 2023

Gaurang033 commented Oct 26, 2023

Linchin Nov 6, 2023 (edited)

Gaurang033 Nov 11, 2023

Linchin Nov 14, 2023

chalmerlowe Nov 18, 2023

Linchin Nov 20, 2023

tswast commented Nov 22, 2023
