Timezone aware timestamp parsing (#3794) #3795

Merged: 3 commits, Mar 4, 2023

Contributor

@tustvold tustvold commented Mar 2, 2023

Which issue does this PR close?

Closes #3794

Rationale for this change

What changes are included in this PR?

Are there any user-facing changes?

@github-actions github-actions bot added the arrow Changes to the arrow crate label Mar 2, 2023
@@ -76,12 +42,14 @@ use chrono::prelude::*;
/// "2023-01-01 040506 +07:30:00",
/// "2023-01-01 04:05:06.789 PST",
/// "2023-01-01 04:05:06.789 -08",
#[inline]
pub fn string_to_timestamp_nanos(s: &str) -> Result<i64, ArrowError> {
pub fn string_to_datetime<T: TimeZone>(
Contributor

I have a question here:
if I want to parse a string that carries a time zone, how should I use string_to_datetime? Do I have to use string_to_timestamp_nanos instead?
The precondition is that I do not know the format of the timestamp-like string in advance.

Contributor

I don't fully understand this question.

I would think you could call string_to_datetime<Utc>(...), which would give you a chrono DateTime<Utc>, and you can then do whatever you want with the result (e.g. convert it into a nanosecond timestamp).
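
To make that last step concrete, here is a minimal, standard-library-only sketch of turning a UTC wall-clock time into a nanosecond timestamp. It does not use chrono or arrow: `days_from_civil` and `utc_to_timestamp_nanos` are hypothetical names chosen for illustration, and with chrono you would normally rely on the methods of `DateTime` instead.

```rust
/// Days between 1970-01-01 and the given proleptic Gregorian date
/// (Howard Hinnant's `days_from_civil` algorithm).
fn days_from_civil(year: i64, month: i64, day: i64) -> i64 {
    // Treat January and February as the last months of the previous year.
    let y = if month <= 2 { year - 1 } else { year };
    let era = y.div_euclid(400);
    let yoe = y - era * 400; // year of era, in [0, 399]
    let mp = (month + 9) % 12; // March = 0, ..., February = 11
    let doy = (153 * mp + 2) / 5 + day - 1; // day of year, in [0, 365]
    let doe = yoe * 365 + yoe / 4 - yoe / 100 + doy; // day of era
    era * 146_097 + doe - 719_468
}

/// Nanoseconds since the Unix epoch for a UTC wall-clock time.
/// Assumes a plausible calendar date; returns `None` if the
/// nanosecond count does not fit in an `i64`.
fn utc_to_timestamp_nanos(
    year: i64,
    month: i64,
    day: i64,
    hour: i64,
    minute: i64,
    second: i64,
    nanos: u32,
) -> Option<i64> {
    let days = days_from_civil(year, month, day);
    let secs = days * 86_400 + hour * 3_600 + minute * 60 + second;
    secs.checked_mul(1_000_000_000)?
        .checked_add(i64::from(nanos))
}
```

For `2020-09-08T13:42:29.190855Z`, `utc_to_timestamp_nanos(2020, 9, 8, 13, 42, 29, 190_855_000)` yields `Some(1_599_572_549_190_855_000)`.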

let utc = date.naive_utc().to_string();
assert_eq!(utc, "2020-09-08 13:42:29");
let local = date.naive_local().to_string();
assert_eq!(local, "2020-09-08 15:42:29");
Contributor

Could you add some tests to check that the timestamp is correct?

Suggested change
-assert_eq!(local, "2020-09-08 15:42:29");
+assert_eq!(local, "2020-09-08 15:42:29");
+let &tz = Local::now().offset();
+let date = string_to_datetime(&tz, "2020-09-08T13:42:29Z").unwrap();
+let dt = NaiveDateTime::parse_from_str("2020-09-08T13:42:29Z", "%Y-%m-%dT%H:%M:%SZ").unwrap();
+let ts = date.timestamp();
+assert_eq!(dt.timestamp(), ts);
+let date = string_to_datetime(&tz, "2020-09-08 13:42:29").unwrap();
+assert_eq!(dt.timestamp() - (tz.local_minus_utc() as i64), date.timestamp());

Contributor Author

@tustvold tustvold Mar 3, 2023

I wanted to avoid using Local::now in the tests as this makes the tests not reproducible across machines. The tests above should cover this, no?

Contributor

I think the tests above should run fine across machines; you can try them.

Contributor Author

They do, but they don't consistently test the behaviour. For example my machine (and CI) is set to UTC, which means it won't test the behaviour at all...

Contributor

Maybe we can use two different time zones to test the behavior. For example:

let tz1: Tz = "+02:00".parse().unwrap();
let tz2: Tz = "+08:00".parse().unwrap();
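
The idea of testing with two fixed offsets can be sketched without chrono. The helper names below are hypothetical, and the offset parser handles only `Z`, `+HH` / `-HH` and `+HH:MM` / `-HH:MM`; that restriction is an assumption made for brevity, not a description of how `Tz` parses.

```rust
/// Parses exactly two ASCII digits into a number.
fn two_digits(s: &str) -> Option<i32> {
    if s.len() == 2 && s.bytes().all(|b| b.is_ascii_digit()) {
        s.parse().ok()
    } else {
        None
    }
}

/// Parses a fixed UTC offset into seconds east of UTC.
/// Hypothetical helper: accepts "Z", "+HH", "-HH", "+HH:MM", "-HH:MM".
fn parse_fixed_offset(s: &str) -> Option<i32> {
    if s == "Z" {
        return Some(0);
    }
    let (sign, rest) = if let Some(r) = s.strip_prefix('+') {
        (1, r)
    } else if let Some(r) = s.strip_prefix('-') {
        (-1, r)
    } else {
        return None;
    };
    let (hours, minutes) = match rest.split_once(':') {
        Some((h, m)) => (two_digits(h)?, two_digits(m)?),
        None => (two_digits(rest)?, 0),
    };
    if hours > 23 || minutes > 59 {
        return None;
    }
    Some(sign * (hours * 3_600 + minutes * 60))
}

/// Interprets `local_secs` (date and time fields counted as if they
/// were UTC) as local time in a zone `offset_secs` east of UTC and
/// returns the corresponding UTC epoch seconds.
fn local_to_utc_secs(local_secs: i64, offset_secs: i32) -> i64 {
    local_secs - i64::from(offset_secs)
}
```

Because both offsets are fixed, the same wall-clock value resolved with `+02:00` and with `+08:00` differs by exactly six hours (21,600 seconds) on every machine, so the test exercises the offset handling even when the host runs in UTC.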

Contributor Author

I added a further test; let me know if that isn't what you meant.

Contributor

That is OK, thx.

Contributor

@alamb alamb left a comment

This makes sense to me -- thank you @tustvold and thank you @MachaelLee for bringing this issue up.

I wonder if you could test that this change works for your project's use case somehow?

@@ -112,34 +80,44 @@ pub fn string_to_timestamp_nanos(s: &str) -> Result<i64, ArrowError> {
// without a timezone specifier as a local time, using T as a separator
// Example: 2020-09-08T13:42:29.190855
if let Ok(ts) = NaiveDateTime::parse_from_str(s, "%Y-%m-%dT%H:%M:%S%.f") {
return to_timestamp_nanos(ts);
if let Some(offset) = timezone.offset_from_local_datetime(&ts).single() {
Contributor

Comment on lines +162 to +164
/// For example, both `1997-01-31 09:26:56.123Z`, `1997-01-31T09:26:56.123`,
/// and `1997-01-31T14:26:56.123-05:00` will be parsed as the same value
///
Contributor

I think some context got lost -- I think this statement is only true if the system timezone is set to UTC-5:

Suggested change
-/// For example, both `1997-01-31 09:26:56.123Z`, `1997-01-31T09:26:56.123`,
-/// and `1997-01-31T14:26:56.123-05:00` will be parsed as the same value
-///
+/// For example, `1997-01-31 09:26:56.123Z`, `1997-01-31T09:26:56.123`,
+/// and `1997-01-31T14:26:56.123-05:00` will be parsed as the same value
+/// if the system timezone is set to America/New_York (UTC-5).

Contributor Author

@tustvold tustvold Mar 3, 2023

Nope, that context was actually incorrect. They will always parse to the same value regardless of the local timezone. If you want local-timezone-specific behaviour, you should pass Local to the function above.
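
The rule described here can be written down in a few lines: an offset found in the string always wins, and the caller-supplied zone is consulted only when the string has none. This is a sketch of the rule, not the arrow-cast code. `resolve_utc_secs` and its parameters are hypothetical, and a real time zone can map a local time to zero or two instants (which is why the diff above calls `.single()`), a case this sketch ignores.

```rust
/// Resolves a parsed wall-clock time to UTC epoch seconds.
///
/// * `wall_clock_secs`: the date and time fields counted as if they were UTC.
/// * `explicit_offset`: offset found in the string, in seconds east of UTC,
///   or `None` when the string carried no offset.
/// * `fallback_offset`: offset of the caller-supplied zone, in seconds east
///   of UTC (0 for UTC). Hypothetical stand-in for a full time zone.
fn resolve_utc_secs(
    wall_clock_secs: i64,
    explicit_offset: Option<i32>,
    fallback_offset: i32,
) -> i64 {
    // The fallback only matters when the string had no offset of its own.
    let offset = explicit_offset.unwrap_or(fallback_offset);
    wall_clock_secs - i64::from(offset)
}
```

With a fallback of 0 (UTC), a string without an offset and the same string with a trailing `Z` resolve to the same value, and no machine setting enters the calculation. Passing a non-zero fallback changes only the result for strings that lack an offset.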

arrow-cast/src/parse.rs (outdated)
let dt =
NaiveDateTime::parse_from_str("2020-09-08T13:42:29Z", "%Y-%m-%dT%H:%M:%SZ")
.unwrap();
let local: Tz = "+08:00".parse().unwrap();
Contributor

👍

@tustvold tustvold merged commit 6cd0917 into apache:master Mar 4, 2023

ursabot commented Mar 4, 2023

Benchmark runs are scheduled for baseline = 7fdd0d8 and contender = 6cd0917. 6cd0917 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Skipped ⚠️ Benchmarking of arrow-rs-commits is not supported on ec2-t3-xlarge-us-east-2] ec2-t3-xlarge-us-east-2
[Skipped ⚠️ Benchmarking of arrow-rs-commits is not supported on test-mac-arm] test-mac-arm
[Skipped ⚠️ Benchmarking of arrow-rs-commits is not supported on ursa-i9-9960x] ursa-i9-9960x
[Skipped ⚠️ Benchmarking of arrow-rs-commits is not supported on ursa-thinkcentre-m75q] ursa-thinkcentre-m75q
Buildkite builds:
Supported benchmarks:
ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
test-mac-arm: Supported benchmark langs: C++, Python, R
ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java
