handle dates with missing information, e.g. due to corrupted or missing text #64

Open
2 tasks
rlskoeser opened this issue Oct 23, 2023 · 1 comment

@rlskoeser
Member

rlskoeser commented Oct 23, 2023

comment referencing these lines in the tests:

  # should we infer unknown month? or raise an exception?
  # assert str(Undate(2022, day="2X")) == "2022-XX-2X"  # currently returns 2022-2X
  # assert str(Undate(2022, day=7)) == "2022-XX-07"   # currently returns 2022-07

I could see a situation with a corrupted text where we legitimately only know the year and day of the month, but we can't do much with the additional day info in this case. I don't think it should error; it's also unlikely we can infer the month in this situation. Is it possible to bump to the known granularity (year) for calculation purposes, and leave the day in a string rendering of the undate?

Originally posted by @ColeDCrawford in #36 (comment)
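One way to picture the "bump to the known granularity, keep the day in the string" idea is the sketch below. `PartialDate` and its attributes are hypothetical names made up for illustration, not the undate API; it assumes that precision stops at the first unknown field, so a known day after an unknown month is kept only for display.

```python
import calendar
from datetime import date
from typing import Optional, Union


class PartialDate:
    """Hypothetical stand-in for illustration only; not the undate API."""

    def __init__(self, year: int, month: Optional[int] = None,
                 day: Union[int, str, None] = None):
        self.year = year
        self.month = month
        self.day = day  # int, or a string such as "2X" for a partly legible day

    @property
    def precision(self) -> str:
        # Precision stops at the first gap: without a month, a day cannot
        # narrow the range, so calculations fall back to the year.
        if self.month is None:
            return "year"
        if not isinstance(self.day, int):
            return "month"
        return "day"

    @property
    def earliest(self) -> date:
        if self.precision == "year":
            return date(self.year, 1, 1)
        if self.precision == "month":
            return date(self.year, self.month, 1)
        return date(self.year, self.month, self.day)

    @property
    def latest(self) -> date:
        if self.precision == "year":
            return date(self.year, 12, 31)
        if self.precision == "month":
            last = calendar.monthrange(self.year, self.month)[1]
            return date(self.year, self.month, last)
        return date(self.year, self.month, self.day)

    def __str__(self) -> str:
        # The string keeps everything we know, with XX for a missing month.
        parts = [f"{self.year:04d}"]
        if self.month is not None or self.day is not None:
            parts.append("XX" if self.month is None else f"{self.month:02d}")
        if self.day is not None:
            parts.append(f"{self.day:02d}" if isinstance(self.day, int) else str(self.day))
        return "-".join(parts)


d = PartialDate(2022, day=7)
print(d, d.precision, d.earliest, d.latest)
```

With this approach the day is never lost, but comparisons and durations treat the date as "sometime in 2022".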

questions:

  • does it make sense to raise an exception if you try to parse or export a date like this in a format that doesn't support this level of partial information?
  • what is the granularity of a date like this, where we know the year and day but not the month? does year make sense, or do we need gradations of granularity (year is certain and day is certain, but month is unknown, so the date is only partially known)?
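To make the second question concrete, here is a rough sketch of what "gradations of granularity" could look like: a per-field status alongside a single usable precision. The function names and the three status labels are assumptions for illustration and do not come from undate.

```python
from typing import Dict, Optional, Union

Field = Union[int, str, None]


def field_status(value: Field) -> str:
    """Classify one date field as 'known', 'partial', or 'unknown' (hypothetical labels)."""
    if value is None:
        return "unknown"
    if isinstance(value, int):
        return "known"
    if value.isdigit():
        return "known"
    # e.g. "2X": some digits are legible, others are not
    if any(ch.isdigit() for ch in value):
        return "partial"
    return "unknown"


def describe(year: Field, month: Field = None, day: Field = None) -> Dict[str, str]:
    """Report what is known about each field independently."""
    return {
        "year": field_status(year),
        "month": field_status(month),
        "day": field_status(day),
    }


def usable_precision(status: Dict[str, str]) -> Optional[str]:
    """Finest field for which it and every coarser field are fully known."""
    result = None
    for name in ("year", "month", "day"):
        if status[name] != "known":
            break
        result = name
    return result


status = describe(2022, day=7)
print(status, usable_precision(status))
```

The per-field report answers "what do we know", while the single precision answers "what can we calculate with"; the two do not have to be the same thing.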

  • switch default format from ISO8601 to EDTF
  • update ISO8601 format method to raise an exception (ValueError?) when used to serialize an unsupported date
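A minimal sketch of how the two tasks might fit together, assuming plain functions rather than undate's actual formatter classes (`format_iso8601` and `format_edtf_style` are hypothetical names): the strict formatter refuses a day without a month, and the EDTF-style one fills the gap with `XX`.

```python
from typing import Optional, Union

Day = Union[int, str, None]


def format_iso8601(year: int, month: Optional[int] = None, day: Day = None) -> str:
    """Strict formatter (hypothetical): refuse anything it cannot represent."""
    if day is not None and month is None:
        raise ValueError("ISO8601 cannot represent a day without a month")
    if isinstance(day, str) and not day.isdigit():
        raise ValueError(f"ISO8601 cannot represent a partially known day: {day!r}")
    parts = [f"{year:04d}"]
    if month is not None:
        parts.append(f"{month:02d}")
    if day is not None:
        parts.append(f"{int(day):02d}")
    return "-".join(parts)


def format_edtf_style(year: int, month: Optional[int] = None, day: Day = None) -> str:
    """EDTF-style formatter (hypothetical): X marks unspecified digits."""
    parts = [f"{year:04d}"]
    if month is not None or day is not None:
        parts.append("XX" if month is None else f"{month:02d}")
    if day is not None:
        parts.append(f"{day:02d}" if isinstance(day, int) else day)
    return "-".join(parts)


print(format_edtf_style(2022, day=7))
try:
    format_iso8601(2022, day=7)
except ValueError as err:
    print(f"ISO8601 refused: {err}")
```

Raising here means a caller finds out immediately that the format is too narrow, instead of silently getting `2022-07`, which reads as July 2022.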
rlskoeser changed the title from "I could see a situation with a corrupted text where we legitimately only know the year and day of the month, but we can't do much with the additional day info in this case. I don't think it should error; it's also unlikely we can infer the month in this situation. Is it possible to bump to the known granularity (year) for calculation purposes, and leave the day in a string rendering of the undate? Can't remember" to "handle dates with missing information, e.g. due to corrupted or missing text" on Oct 23, 2023
rlskoeser added the question label (Further information is requested) on Oct 23, 2023
rlskoeser added this to the 1.0 release milestone on Jun 6, 2024
@rlskoeser
Member Author

I think this behavior is a bug in the ISO8601 formatter, which doesn't support this case (unknown month with known day). I think the EDTF parser/formatter is far enough along that we could make it the default; it handles a lot more cases in terms of ambiguity and missing information. The ISO8601 formatter should then throw an exception if you try to use it to format a date that it can't handle, like year + day but no month.

I'll add a checklist to the issue with this proposed solution.

rlskoeser removed the question label (Further information is requested) on Feb 7, 2025