HTTP Error 401 when called on large amount of tickers #360
Comments
Is a fix planned for `tq_get()`? `getSymbols()` works with many tickers because of the 1 second pause. Thanks.
The limits do not seem to be too restrictive. After reaching the 404 error, I was able to get successful API calls within a few minutes. After that, I downloaded all S&P 500 stocks (2010 to today) in a single call: If anyone can confirm this in your own R session, please do. It looks as if the restrictions are based on the time between API calls from the same IP. This could rule out any parallel computation, which is what I'm testing now.
As expected, any parallel use of `quantmod::getSymbols()` reaches the limit very easily. As such, I'm removing the parallel option from yfR and BatchGetSymbols. When using a single session (non-parallel), yfR runs fine for any large sample of stocks.
If you can, please confirm if the code below runs fine:
The code runs fine, with a caveat: the issue really seems to be the number of API calls. Even though they're not in parallel, they are still 500+ sequential calls to the Yahoo! Finance API, so it is quite inconsistent whether or not the whole data frame will be downloaded. I believe this has already been worked around with your implementation of a cache system. Thus, if the download did not complete in one go, I suggest users wait a few minutes and then run the code again, repeating until it completes. Either way, thank you, @msperlin!
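The "wait a few minutes, then run again" workflow can be sketched as below. This is only an illustration of the idea, not how yfR or BatchGetSymbols implement their cache. The names `download_with_cache`, `fetch_fun`, `cache`, and `pause` are hypothetical, and `fetch_fun` stands in for whatever function actually downloads one ticker.

```r
# Sketch: download symbols one at a time, pausing between calls and
# caching successes so that a rerun only requests what is still missing.
# `fetch_fun` is a hypothetical stand-in for the real downloader.
download_with_cache <- function(symbols, fetch_fun, cache = new.env(),
                                pause = 1) {
  for (sym in symbols) {
    # Skip anything a previous run already stored in the cache.
    if (exists(sym, envir = cache, inherits = FALSE)) next
    # A failed call (for example an HTTP error) yields NULL instead of stopping.
    result <- tryCatch(fetch_fun(sym), error = function(e) NULL)
    if (!is.null(result)) assign(sym, result, envir = cache)
    Sys.sleep(pause)  # space out requests coming from the same IP
  }
  still_missing <- setdiff(symbols, ls(cache, all.names = TRUE))
  list(cache = cache, missing = still_missing)
}

# Possible usage with quantmod (needs network access, so left commented):
# fetch <- function(s) quantmod::getSymbols(s, src = "yahoo", auto.assign = FALSE)
# res <- download_with_cache(c("AAPL", "MSFT"), fetch)
# ...wait a few minutes, then retry only what failed:
# res <- download_with_cache(res$missing, fetch, cache = res$cache)
```

Passing the same `cache` environment back in is what makes the rerun cheap: tickers that already succeeded are never requested again.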
Yes, lack of consistency in equivalent calls to BatchGetSymbols/yf_get can be troublesome. I'll see how I can control for this, at least letting the user know about the 404 error.
There seems to be a rate limit for the number of tickers you can request via the CSV endpoint. The yfinance python library [1] uses the JSON endpoint and doesn't seem to have rate limit issues. [1] https://github.com/ranaroussi/yfinance Closes #362. See #360.
I just added an option to use the JSON endpoint instead of the CSV endpoint. Can you try that and see if you still get the 401 responses? You can install the patch via:
Sure, let me try.
I changed the call to getSymbols and tried my best to reach the 401 response, with no success. I ran yfR with parallel execution (14 cores) and it worked as expected.
Looks good.
anyone can test it here:
@joshuaulrich please let me know if and when you're incorporating these changes. I'll wait for your update on CRAN. Thanks.
I'm considering whether or not to make the JSON endpoint the default for
Being honest, I'm not sure. I'll have to think about that. Quality-wise, I suspect the YF data comes from the same source and, whether it is JSON or CSV, the output should be the same. But the CSV endpoint is restricted by IP, which forces users to behave better, which is good. The restriction is also not that bad (I can still download everything I need for my classes, for example). While I would prefer to allow parallel computing with yfR, I also know that we should be thankful to YF for still keeping the API open... What do you think?
Some thoughts: since you've done the bulk of the work of moving to the v8 API, you may as well loosen the validation on period to support intra-day data and close #351. Also, why not just remove the v7 code path entirely? In principle, I think supporting code to work around throttling (if that's what Yahoo is doing) is not really a worthwhile battle. Moving to the higher revision makes sense to me, but if Yahoo is really throttling and serious about it, it's going to come up again sooner or later. I'd just see this as another notch in the growing list of issues with Yahoo data in general.
I would hope so, but I wouldn't be surprised if there are some differences... because data is awful. ;)
I'm thinking the same thing. Thanks for mentioning the intra-day issue. That's a great point.
### Changes in 0.4.22 (2023-04-05)

1. Move jsonlite from Suggests to Imports so it doesn't cause a problem when a package that doesn't also Suggest jsonlite uses `getSymbols()`. Thanks to Kurt Hornik for the report and fix! [#380](joshuaulrich/quantmod#380)

### Changes in 0.4.21 (2023-03-29)

1. Fix S3 method issues. R-devel (83995-ish) added a check for possible S3 method issues. Register methods it found that were not registered: `str.replot()`, `seriesHi.timeSeries()`, and `seriesLo.timeSeries()`. It was also confused by `range.bars()` and `unique.formula.names()`. Remove `unique.formula.names()` because it wasn't exported or used internally. Rename `range.bars()` to `rangeBars()`, which isn't exported. Thanks to Kurt Hornik for the report! [#375](joshuaulrich/quantmod#375)

1. Remove "^" prefix from `getSymbols()` return value. When the 'Symbols' argument has a "^" prefix and `auto.assign = TRUE`:

   * `getSymbols()` removes the "^" from the object it creates, but
   * returns the 'Symbols' argument unchanged, and
   * removes the "^" from the column names of the object it creates.

   The example `sym <- getSymbols("^IXIC")` will create an object named `IXIC` but the value of `sym` will be "^IXIC". That means `x <- get(sym)` will not work because an object named `^IXIC` doesn't exist. [#371](joshuaulrich/quantmod#371)

1. Add 'from' and 'to' arguments to `getSymbols.FRED()`. Users expect to be able to set the 'from' and 'to' arguments for FRED data like they can for Yahoo data. Those values were ignored and the entire series was always returned. [#368](joshuaulrich/quantmod#368)

1. Change interval to 1d for `getDividends()` and `getSplits()`. The "3mo" setting caused some dividends to be missing for companies that issued monthly dividends. Note that the response to this request also includes all the OHLCV data. But it's small (less than 1MB for 60+ years of daily data). [#372](joshuaulrich/quantmod#372)

1. Handle errors in `getSplits()` and `getDividends()`. `getDividends()` didn't handle cases where the download failed, or when dividends needed to be split-adjusted but there were no splits. It also tried to set colnames on the empty xts object that's returned when there are no dividends. `getSplits()` had the same colnames issue. Check for no splits by testing for `NULL` because that's more explicit. Thanks to Chris Cheung for the report! [#366](joshuaulrich/quantmod#366)

1. Export `HL()`, `is.HL()`, and `has.HL()` functions and add documentation. These were added in 0.4.18 but not exported or included in the documentation.

1. Use Yahoo Finance v8 JSON endpoint and remove the v7 CSV endpoint. There seems to be a rate limit for the number of tickers you can request via the CSV endpoint. The [yfinance python library](https://github.com/ranaroussi/yfinance) uses the JSON endpoint and doesn't seem to have rate limit issues. [#360](joshuaulrich/quantmod#360) [#362](joshuaulrich/quantmod#362) [#364](joshuaulrich/quantmod#364)
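For code that may still run against a version older than 0.4.21, where the returned symbol keeps its "^" prefix, one possible workaround is to strip the prefix before calling `get()`. This is a minimal sketch; `object_name` is a hypothetical helper and not part of quantmod.

```r
# Hypothetical helper: strip a single leading "^" so the string matches the
# name of the object that getSymbols() creates when auto.assign = TRUE.
object_name <- function(symbol) sub("^\\^", "", symbol)

# Possible usage (needs network access, so left commented):
# sym <- quantmod::getSymbols("^IXIC")
# x <- get(object_name(sym))
```

Symbols without the prefix pass through unchanged, so the helper should be harmless whether or not the installed version already strips it.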
Hi, thank you very much for the fix and comments here. `getSymbols.yahoo()` now works for me, but I have a different problem. When I run `getSymbols.yahoo()` successfully in a loop for more than about 300-400 tickers, I start to get "HTTP error 401" for all following downloads. There are some failed downloads of invalid tickers in between, though. Does anyone know what the issue is? Maybe I have the same problem as @rhamo. Here is an example of the subsequent download:
Thank you!
Originally posted by @edwinhung in #358 (comment)
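The situation described in the quoted post, a loop in which some invalid tickers fail individually and then every later request returns HTTP 401, can be handled by treating the two kinds of failure differently. The sketch below assumes the error message contains the text "401", as in the issue title; that is an assumption about the message format, not something quantmod guarantees. `fetch_until_throttled` and `fetch_fun` are hypothetical names.

```r
# Sketch: loop over tickers, recording ordinary failures (e.g. invalid
# tickers) but stopping as soon as an error message mentions "401".
# Assumption: a throttled request surfaces as an R error containing "401".
fetch_until_throttled <- function(symbols, fetch_fun) {
  ok <- list()
  failed <- character(0)
  throttled_at <- NA_character_
  for (sym in symbols) {
    outcome <- tryCatch(
      list(data = fetch_fun(sym), msg = NULL),
      error = function(e) list(data = NULL, msg = conditionMessage(e))
    )
    if (is.null(outcome$msg)) {
      ok[[sym]] <- outcome$data          # successful download
    } else if (grepl("401", outcome$msg, fixed = TRUE)) {
      throttled_at <- sym                # looks like throttling: stop here
      break
    } else {
      failed <- c(failed, sym)           # probably an invalid ticker: keep going
    }
  }
  list(ok = ok, failed = failed, throttled_at = throttled_at)
}

# Possible usage with quantmod (needs network access, so left commented):
# res <- fetch_until_throttled(
#   c("AAPL", "NOTATICKER", "MSFT"),
#   function(s) quantmod::getSymbols(s, src = "yahoo", auto.assign = FALSE)
# )
```

Stopping at the first 401 avoids sending hundreds of further requests that would likely fail as well; the loop can be resumed later from `throttled_at`.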
Thank you all for the quick fix in regards to `tq_get()`! However, I believe that this new issue has arisen.