-
Notifications
You must be signed in to change notification settings - Fork 178
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Dataset size on CLI #345
Dataset size on CLI #345
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have a quick question. Is this API call always reliable? I don't see the error handler of the API call.
I tested this out and it seems reasonably fast (~0.1s per dataset) and accurate! While this will slow down our script by ~3 seconds for 25 datasets, I think that's ok. +1 to Chenyang's point about API handling (e.g. we should have default behavior if there are exceptions, e.g. log the error and return NaN as the size). Can you also add a test case for this? I can add the test in a separate PR if you're not sure how to go about doing this in our repo. I'd suggest you just test |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is great work @hitenvidhani! I'm testing this now locally but if it works, this will be good to merge 😄
@hitenvidhani Actually, it seems that something is not working correctly when I built and tested this locally. After starting dataset retrieval, I see:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Found the cause of the incorrectly-displayed search results. Fix that and we're good to merge
Thank you @viswavi, have pushed the requested changes, sorry about that print statement it was added for debugging. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
return (
"NA"
if size_dict is {}
else "{:.2f}".format(size_dict["dataset"]["num_bytes_memory"] / 1024 / 1024)
)
There are two suggested new unit tests.
- Test when the API call fails.
- Test that the returned dict does not contain the demanded column.
Hey @zhaochenyang20 , thanks for the suggestion! One counter-suggestion, because @hitenvidhani is a first-time contributor and we've already gone back-and-forth on this PR several times, maybe we can merge the PR for now and then do a follow-up PR for unit tests? Unit tests are important, but it's also important that we welcome new contributors (and we do!) so we can make the process a little simpler this time. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think @zhaochenyang20's small suggestion (for the method docstring) is good. I agree with Graham that we can handle the added tests ourselves.
Once you respond to Chenyang's suggestion in the docstring, LGTM!
In this case, I gonna merge it and let us add unit tests ourselves. 🤔 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's generally cool to me!
Description
Dataset size on CLI by fetching size on runtime
References
#253
Blocked by
NA