-
Notifications
You must be signed in to change notification settings - Fork 55
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add option to request data in csv format in get_dataframe method #524
Add option to request data in csv format in get_dataframe method #524
Conversation
Codecov Report
@@ Coverage Diff @@
## master #524 +/- ##
==========================================
+ Coverage 94.64% 94.73% +0.09%
==========================================
Files 5 5
Lines 224 228 +4
==========================================
+ Hits 212 216 +4
Misses 12 12
Continue to review full report at Codecov.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @2xmm,
First of all, thanks for taking the time to submit this clearly documented + well-tested contribution, and improving some of the adjacent code along the way. I'm tempted to suggest making the default behavior for the library fetching JSON instead of CSV given the significant performance implications, but never know who may be depending on the presence of timezones in their datetime strings.
I have 3 followup comments for you:
- Would you like to be added to the AUTHORS.rst file? You can provide your preferred name / email, whichever you're comfortable with.
- To the comment about the YAML files looking different: is this referring to how the arrays all seem to be broken out onto newlines instead remaining on the line of their associated key? E.g.
Response-Type: [*]
# vs
Response-Type
- '*"
If so, my guess would be that perhaps an text editor plugin is may be reformatting the YAML on save, depending on whether you used the helper python script or direct file edits to remove your API keys. Were there any other differences that stood out to you?
In any event, the Travis CI tests passing indicates to me that the alternate syntax is still valid YAML, so I think this is fine, but thanks for pointing this out. A future TODO for this project may be to add a consistency linter like black
and run that as a pre-commit hook to remove this sort of variation.
- Do you have any suggestions about where to document the presence of this new parameter? I was thinking that adding a few lines to the
get_dataframe
code block inREADME.rst
could be a good starting point, indicating both the option and the performance benefit + tradeoff (e.g. no timezones in your date field). (If this feels out of scope, I'll come up with something before we cut a new version.
Once again, great work! Thanks for sharing your fix with the world, I'm excited to bring this new capability into the library.
tiingo/api.py
Outdated
prices.index = pd.to_datetime(prices.index) | ||
|
||
# Localize to UTC to ensure equivalence between data returned in json format and | ||
# csv format. Data requested in csv format does not include a timezone. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Data requested in csv format does not include a timezone
Nice catch, I was curious about whether there was still any benefit to using json
format if CSV has so much better performance.
[non-blocking] @tiingo are there any other major differences between the data returned by each format that users should know about besides query performance and the presence of timezones?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I should specify more clearly that daily data returned by Tiingo does not include a timezone. For intraday data, a timezone is included with the csv response.
This timezone issue exposes a larger issue which I think we should discuss. Tiingo daily data and IEX intraday data are both being returned by the get_dataframe
method but I think they would be better off with their own methods. This would follow the Tiingo API more closely and users would be more aware of the source of the underlying data. I would suggest replacing get_dataframe
with get_daily_dataframe
and get_iex_dataframe
.
Responding to your questions from above:
|
I think that's a good idea! It's consistent with how we have dedicated methods for the other endpoints (e.g. crypto, news, etc), and prevents us from having to overload get_dataframe with too many new parameters as new endpoints come out. If it kept code the simple we could keep I also think that once these new methods are made, it may become reasonable to make "return csv format" the default instead of JSON so people get fast performance out of the box. Would you be interested in addressing this in a follow up? If not, I'll file an issue so someone else can pick it up, but will leave off that step if you already have the context for making the change. Let me know if you have any other questions or ideas to discuss. I'll plan to merge this branch whenever you push a commit that updates AUTHORS.rst, unless you'd like to make your changes part of this PR. Thanks again for your contributions! |
I've made the changes discussed above so this is ready for merge. I will file a new issue with my design proposal. |
Hi @hydrosquall,
Summary of Changes
Checklist
HISTORY.rst
with an entry summarizing your change