Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Encode millisecond precision #1024

Merged
merged 4 commits into from
Nov 22, 2024
Merged
Show file tree
Hide file tree
Changes from 3 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
82 changes: 60 additions & 22 deletions native/explorer/src/datatypes.rs
Original file line number Diff line number Diff line change
Expand Up @@ -321,14 +321,63 @@ fn convert_to_arrow_time_unit(time_unit: TimeUnit) -> ArrowTimeUnit {
}
}

// Limit the number of digits in the microsecond part of a timestamp to 6.
// This is necessary because the microsecond part of Elixir is only 6 digits.
#[inline]
pub fn microseconds_six_digits(microseconds: u32) -> u32 {
if microseconds > 999_999 {
999_999
} else {
microseconds
/// Trait for the fractional part of seconds for Elixir's time-related structs.
pub trait ExMicrosecondTuple {
/// Get the sub-second part.
fn subsec_micros(&self) -> u32;

/// Round the sub-second part to a max of 6 digits.
fn subsec_micros_limited(&self) -> u32 {
// In the event of a leap second, `subsec_micros()` may exceed 999,999.
self.subsec_micros().min(999_999_u32)
}

/// Default representation.
fn microsecond_tuple(&self) -> (u32, u32) {
(self.subsec_micros_limited(), 6)
}

/// Representation rounded for specific time-units.
fn microsecond_tuple_tu(&self, time_unit: TimeUnit) -> (u32, u32) {
let us = self.subsec_micros_limited();

match time_unit {
// If Polars implements `TimeUnit::Seconds`, we'd return `(0, 0)`.
TimeUnit::Microseconds | TimeUnit::Nanoseconds => (us, 6),
TimeUnit::Milliseconds => (((us / 1_000) * 1_000), 3),
}
}
}

impl ExMicrosecondTuple for NaiveTime {
fn subsec_micros(&self) -> u32 {
self.nanosecond() / 1_000
}
// fn microsecond_tuple(&self) -> (u32, u32) {
// let us = self.subsec_micros_limited();
// if us == 0 {
// (0, 0)
// } else {
// (self.subsec_micros_limited(), 6)
// }
// }
Copy link
Member Author

@billylanchantin billylanchantin Nov 21, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's currently a test that fails:

iex> s = Explorer.Series.from_list([~T[01:55:00], ~T[15:35:00], ~T[23:00:00]])
iex> Explorer.Series.quantile(s, 0.5)
~T[15:35:00]

with:

code:  Explorer.Series.quantile(s, 0.5) === ~T[15:35:00]
left:  ~T[15:35:00.000000]
right: ~T[15:35:00]

The commented code would fix this test. However, I think we should change the test instead.

The Rust-side can't tell the difference between a chrono::NaiveTime that was originally encoded from an Elixir %Time{} with :second-precision and one with higher precision that happens to have 0 microseconds. I think the best we can do is return what we know for sure.

Thoughts?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can change the test. All of the times given in the test have precision of 0, so ideally we would return precision 0, but if we have to give it higher precision, any of the precisions are fine.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ideally we would return precision 0

Right. Unfortunately there's no way to annotate this now because of the lack of :second precision in Polars: #885. Also we only support :time and not {:time, :millisecond | :microsecond | :nanosecond} (though we could if we wanted).

}

impl ExMicrosecondTuple for NaiveDateTime {
fn subsec_micros(&self) -> u32 {
self.and_utc().timestamp_subsec_micros()
}
}

impl ExMicrosecondTuple for DateTime<Utc> {
fn subsec_micros(&self) -> u32 {
self.timestamp_subsec_micros()
}
}

impl ExMicrosecondTuple for DateTime<Tz> {
fn subsec_micros(&self) -> u32 {
self.timestamp_subsec_micros()
}
}

Expand Down Expand Up @@ -374,10 +423,7 @@ impl From<NaiveDateTime> for ExNaiveDateTime {
hour: dt.hour(),
minute: dt.minute(),
second: dt.second(),
microsecond: (
microseconds_six_digits(dt.and_utc().timestamp_subsec_micros()),
6,
),
microsecond: dt.microsecond_tuple(),
}
}
}
Expand Down Expand Up @@ -445,7 +491,7 @@ impl<'a> From<&'a DateTime<Tz>> for ExDateTime<'a> {
calendar: atoms::calendar_iso_module(),
day: dt_tz.day(),
hour: dt_tz.hour(),
microsecond: (microseconds_six_digits(dt_tz.timestamp_subsec_micros()), 6),
microsecond: dt_tz.microsecond_tuple(),
minute: dt_tz.minute(),
month: dt_tz.month(),
second: dt_tz.second(),
Expand Down Expand Up @@ -544,20 +590,12 @@ impl From<ExTime> for NaiveTime {

impl From<NaiveTime> for ExTime {
fn from(t: NaiveTime) -> Self {
let microseconds = t.nanosecond() / 1_000;

let ex_microseconds = if microseconds > 0 {
(microseconds_six_digits(microseconds), 6)
} else {
(0, 0)
};

ExTime {
calendar: atoms::calendar_iso_module(),
hour: t.hour(),
minute: t.minute(),
second: t.second(),
microsecond: ex_microseconds,
microsecond: t.microsecond_tuple(),
}
}
}
Expand Down
14 changes: 7 additions & 7 deletions native/explorer/src/encoding.rs
Original file line number Diff line number Diff line change
Expand Up @@ -13,8 +13,8 @@ use crate::atoms::{
neg_infinity, precision, second, std_offset, time_zone, utc_offset, value, year, zone_abbr,
};
use crate::datatypes::{
days_to_date, microseconds_six_digits, time64ns_to_time, timestamp_to_datetime,
timestamp_to_naive_datetime, ExSeries, ExSeriesRef,
days_to_date, time64ns_to_time, timestamp_to_datetime, timestamp_to_naive_datetime,
ExMicrosecondTuple, ExSeries, ExSeriesRef,
};
use crate::ExplorerError;

Expand Down Expand Up @@ -118,8 +118,6 @@ macro_rules! unsafe_encode_naive_datetime {
$env: ident
) => {{
let ndt = timestamp_to_naive_datetime($timestamp, $time_unit);
let microseconds = ndt.and_utc().timestamp_subsec_micros();
let limited_microseconds = microseconds_six_digits(microseconds);

unsafe {
Term::new(
Expand All @@ -136,7 +134,7 @@ macro_rules! unsafe_encode_naive_datetime {
ndt.hour().encode($env).as_c_arg(),
ndt.minute().encode($env).as_c_arg(),
ndt.second().encode($env).as_c_arg(),
(limited_microseconds, 6).encode($env).as_c_arg(),
ndt.microsecond_tuple_tu($time_unit).encode($env).as_c_arg(),
],
)
.unwrap(),
Expand Down Expand Up @@ -221,7 +219,6 @@ macro_rules! unsafe_encode_datetime {
) => {{
let dt_tz = timestamp_to_datetime($timestamp, $time_unit, $time_zone);
let tz_offset = dt_tz.offset();
let limited_microseconds = microseconds_six_digits(dt_tz.timestamp_subsec_micros());

unsafe {
Term::new(
Expand All @@ -234,7 +231,10 @@ macro_rules! unsafe_encode_datetime {
$calendar_iso_module,
dt_tz.day().encode($env).as_c_arg(),
dt_tz.hour().encode($env).as_c_arg(),
(limited_microseconds, 6).encode($env).as_c_arg(),
dt_tz
.microsecond_tuple_tu($time_unit)
.encode($env)
.as_c_arg(),
dt_tz.minute().encode($env).as_c_arg(),
dt_tz.month().encode($env).as_c_arg(),
dt_tz.second().encode($env).as_c_arg(),
Expand Down
10 changes: 5 additions & 5 deletions test/explorer/series_test.exs
Original file line number Diff line number Diff line change
Expand Up @@ -7,7 +7,7 @@

import ExUnit.CaptureLog

doctest Explorer.Series

Check failure on line 10 in test/explorer/series_test.exs

View workflow job for this annotation

GitHub Actions / test (25.3, 1.14)

doctest Explorer.Series.quantile/2 (193) (Explorer.SeriesTest)

Check failure on line 10 in test/explorer/series_test.exs

View workflow job for this annotation

GitHub Actions / test (26.2.1, 1.16)

doctest Explorer.Series.quantile/2 (193) (Explorer.SeriesTest)

test "defines doc metadata" do
{:docs_v1, _, :elixir, "text/markdown", _docs, _metadata, entries} =
Expand Down Expand Up @@ -4246,7 +4246,7 @@
end

test "string series to naive datetime" do
s = Series.from_list(["2023-08-29T17:39:43", "2023-08-29T17:20:09"])
s = Series.from_list(["2023-08-29T17:39:43"])
ms = Series.cast(s, {:naive_datetime, :millisecond})
us = Series.cast(s, {:naive_datetime, :microsecond})
ns = Series.cast(s, {:naive_datetime, :nanosecond})
Expand All @@ -4255,10 +4255,10 @@
assert Series.dtype(us) == {:naive_datetime, :microsecond}
assert Series.dtype(ns) == {:naive_datetime, :nanosecond}

expected = [~N[2023-08-29T17:39:43.000000], ~N[2023-08-29T17:20:09.000000]]
assert Series.to_list(ms) == expected
assert Series.to_list(us) == expected
assert Series.to_list(ns) == expected
# Note the change in precision from ms -> us.
assert Series.to_list(ms) == [~N[2023-08-29T17:39:43.000]]
assert Series.to_list(us) == [~N[2023-08-29T17:39:43.000000]]
assert Series.to_list(ns) == [~N[2023-08-29T17:39:43.000000]]
end

test "no-op with the same dtype" do
Expand Down
Loading