Add option to request data in csv format in get_dataframe method #524

2xmm · 2020-10-17T20:18:43Z

Hi @hydrosquall,

Summary of Changes

Feature: Add option to request data in csv format in get_dataframe method potentially boosting speed up to 4-5x. ([Enhancement] Add option to request data in csv format in get_dataframe method #523)

Checklist

[ x] Code follows the style guidelines of this project
[ x] Added tests for new functionality
[ x] Update HISTORY.rst with an entry summarizing your change
[ x] Tag a project maintainer

codecov · 2020-10-17T20:21:57Z

Codecov Report

Merging #524 into master will increase coverage by 0.09%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##           master     #524      +/-   ##
==========================================
+ Coverage   94.64%   94.73%   +0.09%     
==========================================
  Files           5        5              
  Lines         224      228       +4     
==========================================
+ Hits          212      216       +4     
  Misses         12       12

Impacted Files	Coverage Δ
tiingo/api.py	`94.30% <100.00%> (+0.12%)`	⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 82b9903...7e6af79. Read the comment docs.

hydrosquall

Hi @2xmm,

First of all, thanks for taking the time to submit this clearly documented + well-tested contribution, and improving some of the adjacent code along the way. I'm tempted to suggest making the default behavior for the library fetching JSON instead of CSV given the significant performance implications, but never know who may be depending on the presence of timezones in their datetime strings.

I have 3 followup comments for you:

Would you like to be added to the AUTHORS.rst file? You can provide your preferred name / email, whichever you're comfortable with.
To the comment about the YAML files looking different: is this referring to how the arrays all seem to be broken out onto newlines instead remaining on the line of their associated key? E.g.

Response-Type: [*]

# vs

Response-Type
  - '*"

If so, my guess would be that perhaps an text editor plugin is may be reformatting the YAML on save, depending on whether you used the helper python script or direct file edits to remove your API keys. Were there any other differences that stood out to you?

In any event, the Travis CI tests passing indicates to me that the alternate syntax is still valid YAML, so I think this is fine, but thanks for pointing this out. A future TODO for this project may be to add a consistency linter like black and run that as a pre-commit hook to remove this sort of variation.

Do you have any suggestions about where to document the presence of this new parameter? I was thinking that adding a few lines to the get_dataframe code block in README.rst could be a good starting point, indicating both the option and the performance benefit + tradeoff (e.g. no timezones in your date field). (If this feels out of scope, I'll come up with something before we cut a new version.

Once again, great work! Thanks for sharing your fix with the world, I'm excited to bring this new capability into the library.

hydrosquall · 2020-10-20T04:00:50Z

tiingo/api.py

+        prices.index = pd.to_datetime(prices.index)
+
+        # Localize to UTC to ensure equivalence between data returned in json format and
+        # csv format. Data requested in csv format does not include a timezone.


Data requested in csv format does not include a timezone

Nice catch, I was curious about whether there was still any benefit to using json format if CSV has so much better performance.

[non-blocking] @tiingo are there any other major differences between the data returned by each format that users should know about besides query performance and the presence of timezones?

I should specify more clearly that daily data returned by Tiingo does not include a timezone. For intraday data, a timezone is included with the csv response.

This timezone issue exposes a larger issue which I think we should discuss. Tiingo daily data and IEX intraday data are both being returned by the get_dataframe method but I think they would be better off with their own methods. This would follow the Tiingo API more closely and users would be more aware of the source of the underlying data. I would suggest replacing get_dataframe with get_daily_dataframe and get_iex_dataframe.

tiingo/api.py

2xmm · 2020-10-20T21:04:33Z

Responding to your questions from above:

Yes, I can add myself to AUTHORS.rst.
Yes, these were my thoughts as well. As long as all the tests are passing, the YAML format is not so important.
I’m happy to update the docs. Perhaps first we look at resolving my proposal to separate get_dataframe into multiple methods. Let me know what you think.

hydrosquall · 2020-10-21T23:16:43Z

This would follow the Tiingo API more closely and users would be more aware of the source of the underlying data. I would suggest replacing get_dataframe with get_daily_dataframe and get_iex_dataframe

I think that's a good idea! It's consistent with how we have dedicated methods for the other endpoints (e.g. crypto, news, etc), and prevents us from having to overload get_dataframe with too many new parameters as new endpoints come out. If it kept code the simple we could keep get_dataframe available (for backwards compatibility), but document these new methods instead. I will say that I think it's good that the get_ticker_price endpoint hides the user from needing to know what the endpoints are called, but that power users of this library are likely interested in knowing exactly what endpoint is being used to fulfill their request.

I also think that once these new methods are made, it may become reasonable to make "return csv format" the default instead of JSON so people get fast performance out of the box.

Would you be interested in addressing this in a follow up? If not, I'll file an issue so someone else can pick it up, but will leave off that step if you already have the context for making the change. Let me know if you have any other questions or ideas to discuss.

I'll plan to merge this branch whenever you push a commit that updates AUTHORS.rst, unless you'd like to make your changes part of this PR. Thanks again for your contributions!

2xmm · 2020-10-22T13:59:15Z

@hydrosquall

I've made the changes discussed above so this is ready for merge.

I will file a new issue with my design proposal.

hydrosquall · 2020-12-13T02:37:12Z

@2xmm This has been released to all users using version 0.13.0 as of this evening.

#545

2xmm added 2 commits October 17, 2020 16:56

Add option to request data in csv format in get_dataframe method

8e0f534

Update HISTORY.rst

df73809

hydrosquall reviewed Oct 20, 2020

View reviewed changes

2xmm added 2 commits October 22, 2020 10:44

Add name to AUTHORS.rst

bba8a59

Update comment regarding timezone inclusion for csv requests.

7e6af79

hydrosquall merged commit d4d0b27 into hydrosquall:master Oct 22, 2020

hydrosquall mentioned this pull request Oct 22, 2020

[Enhancement] Add option to request data in csv format in get_dataframe method #523

Closed

hydrosquall mentioned this pull request Nov 3, 2020

[Enhancement] Add new daily and IEX methods following the Tiingo API more closely #527

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add option to request data in csv format in get_dataframe method #524

Add option to request data in csv format in get_dataframe method #524

2xmm commented Oct 17, 2020

codecov bot commented Oct 17, 2020 •

edited

Loading

hydrosquall left a comment

hydrosquall Oct 20, 2020

2xmm Oct 20, 2020

2xmm commented Oct 20, 2020

hydrosquall commented Oct 21, 2020

2xmm commented Oct 22, 2020

hydrosquall commented Dec 13, 2020

Add option to request data in csv format in get_dataframe method #524

Add option to request data in csv format in get_dataframe method #524

Conversation

2xmm commented Oct 17, 2020

Summary of Changes

Checklist

codecov bot commented Oct 17, 2020 • edited Loading

Codecov Report

hydrosquall left a comment

Choose a reason for hiding this comment

hydrosquall Oct 20, 2020

Choose a reason for hiding this comment

2xmm Oct 20, 2020

Choose a reason for hiding this comment

2xmm commented Oct 20, 2020

hydrosquall commented Oct 21, 2020

2xmm commented Oct 22, 2020

hydrosquall commented Dec 13, 2020

codecov bot commented Oct 17, 2020 •

edited

Loading