Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[fix #5044] Support converting 'yyyymmdd' format to date #5078

Merged
merged 1 commit into from
Nov 27, 2023

Conversation

Tangruilin
Copy link
Contributor

Which issue does this PR close?

Closes #5044 .

Rationale for this change

format "yyyymmdd" and "yymmdd" cannot be converted to DateType, which do not comply with the ISO standard

What changes are included in this PR?

the way to parse "yyyymmdd" and "yymmdd" are added to parse_date

Are there any user-facing changes?

no

@github-actions github-actions bot added the arrow Changes to the arrow crate label Nov 15, 2023
Comment on lines 576 to 574
if string.len() != 6 && string.len() != 8 {
return None;
}
for ch in string.bytes() {
if ch < b'0' || ch > b'9' {
return None;
}
}
let (year, month, day) = match string.len() {
6 => (
parse_two_digit_year(digits[0] as i32 * 10 + digits[1] as i32) as u16,
digits[2] * 10 + digits[3],
digits[4] * 10 + digits[5],
),
8 => (
digits[0] as u16 * 1000
+ digits[1] as u16 * 100
+ digits[2] as u16 * 10
+ digits[3] as u16,
digits[4] * 10 + digits[5],
digits[6] * 10 + digits[7],
),
_ => return None,
};
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if string.len() != 6 && string.len() != 8 {
return None;
}
for ch in string.bytes() {
if ch < b'0' || ch > b'9' {
return None;
}
}
let (year, month, day) = match string.len() {
6 => (
parse_two_digit_year(digits[0] as i32 * 10 + digits[1] as i32) as u16,
digits[2] * 10 + digits[3],
digits[4] * 10 + digits[5],
),
8 => (
digits[0] as u16 * 1000
+ digits[1] as u16 * 100
+ digits[2] as u16 * 10
+ digits[3] as u16,
digits[4] * 10 + digits[5],
digits[6] * 10 + digits[7],
),
_ => return None,
};
let (year, month, day) = match (mask, string.len()) {
(0b111111, 6) => (
parse_two_digit_year(digits[0] as i32 * 10 + digits[1] as i32) as u16,
digits[2] * 10 + digits[3],
digits[4] * 10 + digits[5],
),
(0b11111111, 8) => (
digits[0] as u16 * 1000
+ digits[1] as u16 * 100
+ digits[2] as u16 * 10
+ digits[3] as u16,
digits[4] * 10 + digits[5],
digits[6] * 10 + digits[7],
),
_ => return None,
};

@@ -544,6 +544,18 @@ const EPOCH_DAYS_FROM_CE: i32 = 719_163;
/// Error message if nanosecond conversion request beyond supported interval
const ERR_NANOSECONDS_NOT_SUPPORTED: &str = "The dates that can be represented as nanoseconds have to be between 1677-09-21T00:12:44.0 and 2262-04-11T23:47:16.854775804";

fn parse_two_digit_year(input_year: i32) -> i32 {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we get a doc comment explaining what this is doing, and perhaps linking to the docs of the system it is based on?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW https://www.rfc-editor.org/rfc/rfc3339#section-3 explicitly disallows 2 digit years as they are ambiguous

Copy link
Contributor Author

@Tangruilin Tangruilin Nov 22, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW https://www.rfc-editor.org/rfc/rfc3339#section-3 explicitly disallows 2 digit years as they are ambiguous

It refer to https://www.postgresql.org/docs/current/datatype-datetime.html

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW https://www.rfc-editor.org/rfc/rfc3339#section-3 explicitly disallows 2 digit years as they are ambiguous

image
Maybe you need to make a decision if two digit year should be supported, and I will finish this PR by the decision.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should not support them

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should not support them

Okey. I will modify the code. Thanks for suggestion

@tustvold
Copy link
Contributor

Marking as draft to show this isn't waiting on a review, feel free to unmark when you would like me to take another look

@tustvold tustvold marked this pull request as draft November 18, 2023 21:15
@Tangruilin Tangruilin force-pushed the fix#5044 branch 2 times, most recently from 3571d58 to 1c4520a Compare November 26, 2023 16:33
@Tangruilin Tangruilin marked this pull request as ready for review November 26, 2023 16:33
arrow-cast/src/parse.rs Outdated Show resolved Hide resolved
arrow-cast/src/parse.rs Outdated Show resolved Hide resolved
Copy link
Contributor

@tustvold tustvold left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I took the liberty of removing some redundant checks, this looks good now thank you

@Tangruilin Tangruilin force-pushed the fix#5044 branch 2 times, most recently from b5317ef to e032f02 Compare November 27, 2023 02:33
@Tangruilin
Copy link
Contributor Author

I took the liberty of removing some redundant checks, this looks good now thank you

I modify the code as comment, now you can retry it

@tustvold tustvold merged commit 409bb81 into apache:master Nov 27, 2023
26 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
arrow Changes to the arrow crate
Projects
None yet
Development

Successfully merging this pull request may close these issues.

cast format 'yyyymmdd' to Date32 give a error
2 participants