Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DEPR: DTA/TDA.__init__ #55623

Closed
jbrockmendel opened this issue Oct 21, 2023 · 6 comments · Fixed by #56043
Closed

DEPR: DTA/TDA.__init__ #55623

jbrockmendel opened this issue Oct 21, 2023 · 6 comments · Fixed by #56043
Labels
Datetime Datetime data dtype Deprecate Functionality to remove in pandas Timedelta Timedelta data type

Comments

@jbrockmendel
Copy link
Member

jbrockmendel commented Oct 21, 2023

I'm finding myself having to fix bugs in these, but they don't really serve much purpose. We should just use _from_sequence instead. Would need to deprecate bc IIUC fastparquet uses these.

@jbrockmendel jbrockmendel added Bug Needs Triage Issue that has not been reviewed by a pandas team member labels Oct 21, 2023
@lithomas1 lithomas1 added Datetime Datetime data dtype Timedelta Timedelta data type Deprecate Functionality to remove in pandas and removed Bug Needs Triage Issue that has not been reviewed by a pandas team member labels Oct 25, 2023
@jorisvandenbossche
Copy link
Member

I was just planning to start using this as public alternative for DatetimeTZBlock: apache/arrow#38321

What would be the alternative?

In [1]: arr = np.arange(0, 10_000_000, dtype="datetime64[us]")

In [2]: dtype = pd.DatetimeTZDtype("us", tz="Europe/Brussels")

In [3]: %timeit pd.arrays.DatetimeArray(arr, dtype)
5.19 µs ± 157 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)

In [4]: %timeit pd.array(arr).tz_localize("UTC").tz_convert(dtype.tz)
29.2 ms ± 798 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

@jbrockmendel
Copy link
Member Author

What would be the alternative?

_from_sequence with i8s

In [17]: %timeit pd.arrays.DatetimeArray(arr, dtype)
8.7 µs ± 127 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)

In [18]: %timeit pd.arrays.DatetimeArray._from_sequence(arr.view("i8"), dtype=dtype)
26.4 µs ± 385 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

@jorisvandenbossche
Copy link
Member

That's a worse alternative? Not in perf, I mean, but in what the user needs to do: need to pass integers instead of datetime64, and need to use a semi-private method (EA authors should implement that, but other downstream code shouldn't use that IMO)

@jbrockmendel
Copy link
Member Author

and need to use a semi-private method (EA authors should implement that, but other downstream code shouldn't use that IMO)

The semi-privateness is weird, but ATM it is the main constructor for general EAs.

but in what the user needs to do: need to pass integers instead of datetime64

That's the same behavior you get with DatetimeIndex. We decided (and went through a deprecation cycle) that passing dt64s and a pd.DatetimeTZDtype would interpret the values as wall-times, while i8s would be interpreted as utc times. If anything, having different (and AFAIK not documented) behavior in __init__ should be avoided.

Is your concern about end users or mostly about library authors? I expect basically no actual end users are using this directly.

@jorisvandenbossche
Copy link
Member

The semi-privateness is weird, but ATM it is the main constructor for general EAs.

I would say the main general constructor is pd.array(..), and that actually also works fine, but it seems that in this case it adds a lot of overhead (would need to check why):

In [18]: %timeit pd.array(arr.view("int64"), dtype=dtype)
17.8 ms ± 1.26 ms per loop (mean ± std. dev. of 7 runs, 100 loops each)

@jorisvandenbossche
Copy link
Member

Ah, because it copies by default. With disabling that it is fine:

In [26]: %timeit pd.array(arr.view("int64"), dtype=dtype, copy=False)
27.6 µs ± 557 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Datetime Datetime data dtype Deprecate Functionality to remove in pandas Timedelta Timedelta data type
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants