GH-38341: [Python] Remove usage of pandas internals DatetimeTZBlock #38321

Merged (7 commits) on Jan 8, 2024

@jorisvandenbossche (Member) commented Oct 18, 2023

Rationale for this change

This usage probably dates from a time when it was required to specify the Block type. Nowadays it is enough to just specify the dtype, which cuts down on our usage of internal pandas objects.

We only need to ensure that the data passed is 2D: it is always for a Block in a DataFrame (not a Series), and otherwise pandas complains that the placement doesn't match.
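As a rough illustration of the 2D requirement (not the code from this PR), the sketch below reshapes a 1D column into the `(n_columns, n_rows)` layout that, as far as I understand, a DataFrame block expects. The names `values`, `block_values`, and `placement` are hypothetical, and only NumPy calls are used.

```python
import numpy as np

# One column of values, stored as a 1D datetime64 array with second resolution.
values = np.array([1, 2, 3], dtype="datetime64[s]")

# Assumption: a DataFrame block is laid out as (n_columns, n_rows),
# so a single column has to be reshaped to shape (1, n_rows).
block_values = values.reshape(1, -1)

# Hypothetical placement: the block occupies column position 0 only.
placement = [0]

# The first dimension must agree with the number of placed columns;
# a 1D array would fail this check, which mirrors the placement mismatch.
assert block_values.ndim == 2
assert block_values.shape[0] == len(placement)

# The datetime unit can be read from the NumPy dtype and reused for the pandas dtype.
unit, _ = np.datetime_data(values.dtype)
print(block_values.shape, unit)
```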

Part of #35081

@github-actions

⚠️ GitHub issue #35081 has been automatically assigned in GitHub to PR creator.

@apache apache deleted a comment from github-actions bot Oct 18, 2023
@jorisvandenbossche
Member Author

So it seems that pandas doesn't preserve the unit in the DatetimeArray constructor:

In [4]: arr = np.array([1, 2, 3], dtype="datetime64[s]")

In [5]: dtype = pd.DatetimeTZDtype("s", tz="Europe/Brussels")

In [6]: pd.arrays.DatetimeArray(arr, dtype)
Out[6]: 
<DatetimeArray>
['1970-01-01 01:00:01+01:00', '1970-01-01 01:00:02+01:00',
 '1970-01-01 01:00:03+01:00']
Length: 3, dtype: datetime64[ns, Europe/Brussels]

This seems to be fixed in the latest pandas release (2.1), so we will have to keep the older code that uses internals for older pandas versions.
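A minimal sketch of such a version gate, assuming the 2.1 threshold mentioned above. The helper `has_unit_preserving_constructor` is hypothetical and is not the check used in pyarrow; in practice the argument would be `pd.__version__`.

```python
import re


def has_unit_preserving_constructor(pandas_version: str) -> bool:
    """Return True for pandas >= 2.1, where the unit is assumed to be preserved."""
    # Only the leading "major.minor" part is compared, so suffixes such as
    # "rc0" or ".dev0+..." do not matter.
    match = re.match(r"(\d+)\.(\d+)", pandas_version)
    if match is None:
        raise ValueError(f"unrecognized version string: {pandas_version!r}")
    major, minor = int(match.group(1)), int(match.group(2))
    return (major, minor) >= (2, 1)


for version in ("1.5.3", "2.0.3", "2.1.0", "2.2.0rc0"):
    if has_unit_preserving_constructor(version):
        path = "specify only the dtype"
    else:
        path = "fall back to the older internals-based code"
    print(version, "->", path)
```

Comparing only the major and minor components keeps the gate independent of pre-release suffixes.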

@jorisvandenbossche
Member Author

@github-actions crossbow submit -g python

@github-actions

Revision: a133014

Submitted crossbow builds: ursacomputing/crossbow @ actions-eda4ccd5da

Task Status
test-conda-python-3.10 Github Actions
test-conda-python-3.10-cython2 Github Actions
test-conda-python-3.10-hdfs-2.9.2 Github Actions
test-conda-python-3.10-hdfs-3.2.1 Github Actions
test-conda-python-3.10-pandas-latest Github Actions
test-conda-python-3.10-pandas-nightly Github Actions
test-conda-python-3.10-spark-v3.5.0 Github Actions
test-conda-python-3.10-substrait Github Actions
test-conda-python-3.11 Github Actions
test-conda-python-3.11-dask-latest Github Actions
test-conda-python-3.11-dask-upstream_devel Github Actions
test-conda-python-3.11-hypothesis Github Actions
test-conda-python-3.11-pandas-upstream_devel Github Actions
test-conda-python-3.11-spark-master Github Actions
test-conda-python-3.12 Github Actions
test-conda-python-3.8 Github Actions
test-conda-python-3.8-pandas-1.0 Github Actions
test-conda-python-3.8-spark-v3.5.0 Github Actions
test-conda-python-3.9 Github Actions
test-conda-python-3.9-pandas-latest Github Actions
test-cuda-python Github Actions
test-debian-11-python-3 Azure
test-fedora-35-python-3 Azure
test-ubuntu-20.04-python-3 Azure
test-ubuntu-22.04-python-3 Github Actions

@jorisvandenbossche changed the title from "GH-35081: [Python] Remove usage of pandas internals DatetimeTZBlock" to "GH-38341: [Python] Remove usage of pandas internals DatetimeTZBlock" Oct 19, 2023
@github-actions

⚠️ GitHub issue #38341 has been automatically assigned in GitHub to PR creator.

@jorisvandenbossche
Member Author

@github-actions crossbow submit -g python


github-actions bot commented Dec 1, 2023

Revision: dfcfa22

Submitted crossbow builds: ursacomputing/crossbow @ actions-6fbf28b83d

Task Status
test-conda-python-3.10 Github Actions
test-conda-python-3.10-cython2 Github Actions
test-conda-python-3.10-hdfs-2.9.2 Github Actions
test-conda-python-3.10-hdfs-3.2.1 Github Actions
test-conda-python-3.10-pandas-latest Github Actions
test-conda-python-3.10-pandas-nightly Github Actions
test-conda-python-3.10-spark-v3.5.0 Github Actions
test-conda-python-3.10-substrait Github Actions
test-conda-python-3.11 Github Actions
test-conda-python-3.11-dask-latest Github Actions
test-conda-python-3.11-dask-upstream_devel Github Actions
test-conda-python-3.11-hypothesis Github Actions
test-conda-python-3.11-pandas-upstream_devel Github Actions
test-conda-python-3.11-spark-master Github Actions
test-conda-python-3.12 Github Actions
test-conda-python-3.8 Github Actions
test-conda-python-3.8-pandas-1.0 Github Actions
test-conda-python-3.8-spark-v3.5.0 Github Actions
test-conda-python-3.9 Github Actions
test-conda-python-3.9-pandas-latest Github Actions
test-cuda-python Github Actions
test-debian-11-python-3 Azure
test-fedora-38-python-3 Azure
test-ubuntu-20.04-python-3 Azure
test-ubuntu-22.04-python-3 Github Actions

@jorisvandenbossche jorisvandenbossche merged commit 6b93c4a into apache:main Jan 8, 2024
11 checks passed
@jorisvandenbossche removed the "awaiting committer review" label Jan 8, 2024
@jorisvandenbossche jorisvandenbossche deleted the pandas-internals branch January 8, 2024 13:21

After merging your PR, Conbench analyzed the 6 benchmarking runs that have been run so far on merge-commit 6b93c4a.

There were no benchmark performance regressions. 🎉

The full Conbench report has more details. It also includes information about 1 possible false positive for unstable benchmarks that are known to sometimes produce them.

clayburn pushed a commit to clayburn/arrow that referenced this pull request Jan 23, 2024
…lock (apache#38321)

### Rationale for this change

This usage probably dates from a time when it was required to specify the Block type. Nowadays it is enough to just specify the dtype, which cuts down on our usage of internal pandas objects.

Part of apache#35081

* Closes: apache#38341

Authored-by: Joris Van den Bossche <[email protected]>
Signed-off-by: Joris Van den Bossche <[email protected]>
dgreiss pushed a commit to dgreiss/arrow that referenced this pull request Feb 19, 2024
…lock (apache#38321)

zanmato1984 pushed a commit to zanmato1984/arrow that referenced this pull request Feb 28, 2024
…lock (apache#38321)

Successfully merging this pull request may close these issues.

[Python] pandas internals: avoid using DatetimeTZBlock