-
Notifications
You must be signed in to change notification settings - Fork 547
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Issue 835 timezone offset allow MINUS SIGN (U+2212) (branch main) #1001
Conversation
9719fe6
to
0ed0140
Compare
src/format/parse.rs
Outdated
@@ -634,6 +634,9 @@ fn test_parse() { | |||
check!("", [lit!("a")]; TOO_SHORT); | |||
check!(" ", [lit!("a")]; INVALID); | |||
check!("a", [lit!("a")]; ); | |||
check!("+", [lit!("+")]; ); | |||
check!("-", [lit!("-")]; ); | |||
check!("−", [lit!("−")]; ); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is it worth using a unicode escape here to make it clear that this is the unicode minus sign "\u{2212}"
? I feel like otherwise this could be subtly deleted later
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried using \u{2212}
but I found code became difficult to grok.
I code commented all use of '−'
with some variation of // MINUS SIGN (U+2212)
.
If you'd strongly prefer \u{2212}
then I can change it. But I found the code comment and "native" −
easier to grok.
src/format/scan.rs
Outdated
@@ -289,12 +299,33 @@ where | |||
} | |||
} | |||
let negative = match s.as_bytes().first() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could we use .chars()
at the top here and match on the three allowed possibilities?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
or potentially even char_indicies()
(we might need .peekable()
as well)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and that would allow us to compress the allow_tz_minus_sign into the match also
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
chars()
worked best here. Good suggestion.
For consistency, I used len_utf8()
for each arm of this match
statement. It's a const
so will precompute the value at compile time (so no runtime cost).
Simplify timezone_offset_internal loop on negative. Issue chronotope#835 PR chronotope#1001
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very nice! Anything that makes date parsing accept more valid date strings is a big 👍 from me.
Needs a rebase though.
@@ -388,7 +418,7 @@ pub(super) fn timezone_offset_2822(s: &str) -> ParseResult<(&str, Option<i32>)> | |||
Ok((s, None)) // recommended by RFC 2822: consume but treat it as -0000 | |||
} | |||
} else { | |||
let (s_, offset) = timezone_offset(s, |s| Ok(s))?; | |||
let (s_, offset) = timezone_offset_internal(s, |s| Ok(s), false, false)?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
RFC 2822 doesn't say U+2212 is acceptable. But that is because it specifies an ASCII-only email format.
I think it would be helpful if chrono was a little less strict in interpreting the standard, and can also parse RFC 2822 date strings from Unicode sources instead of only bare email.
I'm not sure when Word and similar software automatically replace -
with U+2212 when used near/between a numbers. But it is not uncommon for typographical reasons.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a reasonable point. In this case, I lean toward remaining strict about it.
Simplify timezone_offset_internal loop on negative. Issue chronotope#835 PR chronotope#1001
7aa4210
to
e1d0d69
Compare
Had to update rust to 1.69.0. |
520a5fc
to
6617eb4
Compare
Any chance on this PR? I can submit another targeting branch |
The changes here look okay to me, would you mind cleaning up the commit history? The Also, please rebase on 0.4.x so that the MSRV CI issue gets solved. |
50192bc
to
a2ff786
Compare
PR for branch |
22b5290
to
46a61ab
Compare
Timezone signage also allows MINUS SIGN (U+2212) as specified by ISO 8601 and RFC 3339. Not for RFC 2822 format or RFC 8536 transition string. Issue chronotope#835
46a61ab
to
3ad4afe
Compare
Given that #1087 will get merged to main, do we still need this PR? |
No. Closed. |
Add parsing for MINUS SIGN (U+2212) timezone offset signage as specified in ISO 8601 and RFC 3339.
According to Wikipedia
Issue #835
Two extra commits I'd like to throw into this PR:
date
.When running tests, there are two very long running tests (tens of minutes) that require
/usr/bin/date
to run (return early if/usr/bin/date
is not present).I often have to manually edit the path to
date
in both places to force the tests to return early (effectively skip).That is, I manually changed
to
Consolidating two local
data_path
variables to one file-wideconst PATH_DATE = ...
makes this workaround editing a little easier.Additionally, consolidating the path follows the DRY principle.