HTTPError: 403 Client Error: Forbidden for url: XXX #1

Open
Erik262 opened this issue Jun 15, 2022 · 21 comments
@Erik262

Erik262 commented Jun 15, 2022

I'm getting this error message for example with this link here:
https://www.blinkist.com/api/books/the-automation-advantage-en/chapters

@NicoWeio
Owner

Using this repo's latest code, that is? Maybe there's geoblocking at play? It works on GitHub Actions (→ https://github.com/NicoWeio/blinkist/runs/6895445085) as well as my machine, so there isn't much I can do about it. You could try logging the response text – maybe it tells you what happened.

@Erik262
Author

Erik262 commented Jun 15, 2022

I'm on the latest version; I pulled it about 15 minutes ago.
The complete error:
Traceback (most recent call last):
  File "/Users/erik/Library/Python/3.9/lib/python/site-packages/requests/models.py", line 972, in json
    return complexjson.loads(self.text, **kwargs)
  File "/usr/local/Cellar/python@3.9/3.9.13_1/Frameworks/Python.framework/Versions/3.9/lib/python3.9/json/__init__.py", line 346, in loads
    return _default_decoder.decode(s)
  File "/usr/local/Cellar/python@3.9/3.9.13_1/Frameworks/Python.framework/Versions/3.9/lib/python3.9/json/decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/usr/local/Cellar/python@3.9/3.9.13_1/Frameworks/Python.framework/Versions/3.9/lib/python3.9/json/decoder.py", line 355, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/erik/Downloads/blinkist-main/main.py", line 66, in <module>
    free_daily = get_free_daily(locale=locale)
  File "/Users/erik/Downloads/blinkist-main/main.py", line 32, in get_free_daily
    return response.json()
  File "/Users/erik/Library/Python/3.9/lib/python/site-packages/requests/models.py", line 976, in json
    raise RequestsJSONDecodeError(e.msg, e.doc, e.pos)
requests.exceptions.JSONDecodeError: Expecting value: line 1 column 1 (char 0)

@NicoWeio
Owner

Well, you don't get a JSON response. Try logging response.text instead of response.json().
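
A minimal sketch of that suggestion: look at the status code and raw text when JSON parsing fails, so a non-JSON body (for example an HTML error page) shows up in the log instead of a bare JSONDecodeError. `parse_json_or_report` is a hypothetical helper, not part of this repo; it accepts any object exposing `status_code` and `text`, which a `requests` response does.

```python
import json
import logging

logger = logging.getLogger(__name__)


def parse_json_or_report(response):
    """Return the decoded JSON body, or log the raw text and re-raise.

    `response` is any object with `status_code` and `text` attributes
    (a requests.Response has both). Hypothetical helper for illustration.
    """
    try:
        return json.loads(response.text)
    except json.JSONDecodeError:
        # Show what the server actually sent; truncate to keep logs readable.
        logger.error(
            "Non-JSON response (HTTP %s): %r",
            response.status_code,
            response.text[:500],
        )
        raise
```

Re-raising keeps the original failure visible while the log line records what came back.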

@Erik262
Author

Erik262 commented Jun 16, 2022

Okay. I tried to validate the response.text string and found that there is an error in the JSON.
But I don't know how to fix this, since the response call should do the trick but is somehow broken.

This is what I get back as response.text

{"book":{"id":"628df2186cee0700084919a6","kind":"book","slug":"goals-based-investing-en","title":"Goals-based Investing","subtitle":"A Visionary Framework for Wealth Management","subtitleHtmlSafe":"A Visionary Framework for Wealth Management","aboutTheBook":"\u003cp\u003e\u003cem\u003eGoals-Based Investing \u003c/em\u003e(2022) explains how the wealth management industry is transforming, how modern portfolio theory is no longer considered modern, and how product evolution and regulatory changes are making it easier for investors and advisors to access market segments that were once the exclusive domain of large institutes.\u003c/p\u003e","buyOnAmazonUrl":"/en/books/goals-based-investing-en/purchase","author":"Tony Davidow","truncatedAuthor":"Tony Davidow","sourceAuthor":"Tony Davidow","u rl":"https://www.blinkist.com/en/books/goals-based-investing-en","browseUrl":"/en/nc/browse/books/goals-based-investing-en","previewUrl":"/en/books/goals-based-investing-en","read ingDuration":25,"minutesToRead":25,"isAudio":true,"readCount":null,"image":{"default":{"src":"https://images.blinkist.io/images/books/628df2186cee0700084919a6/1_1/470.jpg","srcset ":{"2x":"https://images.blinkist.io/images/books/628df2186cee0700084919a6/1_1/640.jpg"}},"sources":[{"media":"xs","src":"https://images.blinkist.io/images/books/628df2186cee070008 4919a6/1_1/470.jpg","srcset":{"2x":"https://images.blinkist.io/images/books/628df2186cee0700084919a6/1_1/640.jpg"}},{"media":"s","src":"https://images.blinkist.io/images/books/628 df2186cee0700084919a6/1_1/640.jpg","srcset":{"2x":"https://images.blinkist.io/images/books/628df2186cee0700084919a6/1_1/1080.jpg"}},{"media":"m","src":"https://images.blinkist.io/ images/books/628df2186cee0700084919a6/1_1/250.jpg","srcset":{"2x":"https://images.blinkist.io/images/books/628df2186cee0700084919a6/1_1/470.jpg"}}]},"audioUrl":"https://hls.blinki 
st.io/bibs/628df2186cee0700084919a6/628df2196cee0700084919a8-T1653981795.m4a?Expires=1655355345\u0026Signature=G7saoNJx1hZYFnZaj~X2dE0tyJAGg4GUEDTawW4Nuh13qiuHy6maJGjk1agHKo2p9qt7 erLaSOncPXzVErJa2tenzR7qokLK~LZf9QEaRr5bLagkkSAK8SI9TpDw9R6yP6luOlOKzhXO~orkpPzH9Xui5VkOcB5j9VmkxC-pGxEkVoGwOE~ArQuCHNvoyFFLsaadSAKAV2nQV2Jf~280yqO0I7a-rgOwlATQznsB301gQPP9CT56fun nb1GNjCu3cspv~nLcgYUrQkZyT2o72-lOdG8ssl2D1YOcPt0bYNwMnnCvyr99pMor4reyaX2RKF41n-VAf8p2Tu~ZUzkFDA__\u0026Key-Pair-Id=APKAJXJM6BB7FFZXUB4A","chaptersLength":7,"hasAudio":true,"langua ge":"en","freeDaily":null,"category":{"title":"Money \u0026 Investments","sprite":"money-and-investments","slug":"money-and-investments-en"},"averageRating":3.6,"categories":[{"id ":"54788fef6439320008240000","url":"/en/nc/categories/money-and-investments-en","sprite":"money-and-investments","slug":"money-and-investments-en","title":"Money \u0026 Investments","subtitle":"You work hard for your money, right? Let the experts show you how to make it work hard for you."}]},"endTimestamp":1655416799}

@NicoWeio
Owner

That's curious. I don't see an error in the JSON you posted, that is, https://jsonformatter.curiousconcept.com/ doesn't report one. I assume the spaces in e.g. "u rl" were a result of copy-pasting the data? Try using triple backticks for that.
Other than that, my best guess is that the latter response is actually fine, and you just had bad luck. I'll implement better error handling, so we can see what's going on.

@NicoWeio
Owner

Turns out this is Cloudflare. I assumed that cloudscraper would raise CloudflareChallengeError automatically, but that's not the case. In @ptrstn's version, this is done in _get_daily_blink_info(self, language="en").
I'll add some retry magic.
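
The check-then-retry idea could look roughly like this. It is a stdlib-only sketch, not the repo's code: the actual fix in this thread uses tenacity and raises cloudscraper.exceptions.CloudflareChallengeError, whereas `ChallengeError`, `fetch_with_retry`, and the injected `fetch` callable are hypothetical stand-ins. It also assumes that a 403 status is how the challenge shows up, as in this issue's title.

```python
import time


class ChallengeError(Exception):
    """Hypothetical stand-in for cloudscraper's CloudflareChallengeError."""


def fetch_with_retry(fetch, url, attempts=10, wait_seconds=2.0):
    """Call fetch(url) until it returns a response that is not a 403.

    `fetch` is any callable returning an object with a `status_code`
    attribute (for example a session's `get` method).
    """
    for attempt in range(1, attempts + 1):
        response = fetch(url)
        if response.status_code != 403:
            return response
        if attempt < attempts:
            # Fixed pause between attempts; no sleep after the last one.
            time.sleep(wait_seconds)
    # Every attempt was blocked: say so explicitly instead of letting a
    # later response.json() fail with a confusing decode error.
    raise ChallengeError(f"still blocked after {attempts} attempts: {url}")
```

Passing the fetch function in as an argument keeps the sketch independent of any particular HTTP library and makes it easy to exercise without network access.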

@NicoWeio
Owner

Alright then, please try again with the latest main.py (notice the new requirement tenacity). :)

@Erik262
Author

Erik262 commented Jun 16, 2022

@NicoWeio
Tried and then this error came up:
It seems to work for the first second (could see the downloading bar), then when it started downloading audio files it stopped working and since then I don't see the download bar anymore.

`Traceback (most recent call last):
  File "/usr/local/lib/python3.9/site-packages/tenacity/__init__.py", line 407, in __call__
    result = fn(*args, **kwargs)
  File "/Users/erik/Downloads/blinkist-main/main.py", line 38, in _api_request
    raise cloudscraper.exceptions.CloudflareChallengeError()
cloudscraper.exceptions.CloudflareChallengeError

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/Users/erik/Downloads/blinkist-main/main.py", line 77, in <module>
    free_daily = get_free_daily(locale=locale)
  File "/Users/erik/Downloads/blinkist-main/main.py", line 50, in get_free_daily
    return _api_request('free_daily', params={'locale': locale})
  File "/usr/local/lib/python3.9/site-packages/tenacity/__init__.py", line 324, in wrapped_f
    return self(f, *args, **kw)
  File "/usr/local/lib/python3.9/site-packages/tenacity/__init__.py", line 404, in __call__
    do = self.iter(retry_state=retry_state)
  File "/usr/local/lib/python3.9/site-packages/tenacity/__init__.py", line 361, in iter
    raise retry_exc from fut.exception()
tenacity.RetryError: RetryError[<Future at 0x101ed7cd0 state=finished raised CloudflareChallengeError>]`

But wait! After doing some "definition of insanity", it started working just by running it again a few times, even when the error came up. Interesting.

@NicoWeio
Owner

Thanks for your feedback! I actually forgot to add the retry logic to audio downloads. That should be fixed now, so you don't have to “definition of insanity” yourself. ;)

@NicoWeio
Owner

Hey there, does this kind of error still occur with the latest version of my code?

@Erik262
Author

Erik262 commented Jul 27, 2022

yes, then I gave up testing xD

@2600box

2600box commented Aug 1, 2022

In my testing, mostly your code works great and I am grateful, but for some reason this particular book throws the same error:

ubuntu:~/blinkist# ./main.py --book-slug the-7-habits-of-highly-effective-people-en ./test/
Book (1/1): “The 7 Habits of Highly Effective People”
Fetching chapters… ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━   0% -:--:--Retrying in 2.0 seconds…
Fetching chapters… ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━   0% -:--:--Retrying in 2.0 seconds…
Fetching chapters… ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━   0% -:--:--Retrying in 2.0 seconds…
Fetching chapters… ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━   0% -:--:--Retrying in 2.0 seconds…
Fetching chapters… ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━   0% -:--:--Retrying in 2.0 seconds…
Fetching chapters… ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━   0% -:--:--Retrying in 2.0 seconds…
Fetching chapters… ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━   0% -:--:--Retrying in 2.0 seconds…
Fetching chapters… ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━   0% -:--:--Retrying in 2.0 seconds…
Fetching chapters… ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━   0% -:--:--Retrying in 2.0 seconds…
Fetching chapters… ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━   0% -:--:--Error downloading „The 7 Habits of Highly Effective People“ – renaming output directory.
Traceback (most recent call last):
  File "/home/ubuntu/blinkist/lib/python3.10/site-packages/tenacity/__init__.py", line 407, in __call__
    result = fn(*args, **kwargs)
  File "/home/ubuntu/blinkist/blinkist/common.py", line 27, in request
    raise cloudscraper.exceptions.CloudflareChallengeError()
cloudscraper.exceptions.CloudflareChallengeError

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/ubuntu/blinkist/./main.py", line 132, in <module>
    main()
  File "/home/ubuntu/blinkist/lib/python3.10/site-packages/click/core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
  File "/home/ubuntu/blinkist/lib/python3.10/site-packages/click/core.py", line 1055, in main
    rv = self.invoke(ctx)
  File "/home/ubuntu/blinkist/lib/python3.10/site-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/ubuntu/blinkist/lib/python3.10/site-packages/click/core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "/home/ubuntu/blinkist/./main.py", line 124, in main
    download_book(
  File "/home/ubuntu/blinkist/./main.py", line 43, in download_book
    _ = book.chapters
  File "/usr/lib/python3.10/functools.py", line 981, in __get__
    val = self.func(instance)
  File "/home/ubuntu/blinkist/blinkist/book.py", line 54, in chapters
    chapters = [
  File "/home/ubuntu/blinkist/blinkist/book.py", line 55, in <listcomp>
    Chapter.from_id(self, chapter['id'])
  File "/home/ubuntu/blinkist/blinkist/chapter.py", line 16, in from_id
    chapter_data = api_request_web(f'books/{book.id}/chapters/{chapter_id}')
  File "/home/ubuntu/blinkist/blinkist/common.py", line 49, in api_request_web
    return api_request('https://blinkist.com/api/', endpoint, params=params)
  File "/home/ubuntu/blinkist/blinkist/common.py", line 40, in api_request
    response = request(url, params=params, headers=HEADERS)
  File "/home/ubuntu/blinkist/lib/python3.10/site-packages/tenacity/__init__.py", line 324, in wrapped_f
    return self(f, *args, **kw)
  File "/home/ubuntu/blinkist/lib/python3.10/site-packages/tenacity/__init__.py", line 404, in __call__
    do = self.iter(retry_state=retry_state)
  File "/home/ubuntu/blinkist/lib/python3.10/site-packages/tenacity/__init__.py", line 361, in iter
    raise retry_exc from fut.exception()
tenacity.RetryError: RetryError[<Future at 0x7f7cd59652a0 state=finished raised CloudflareChallengeError>]
Fetching chapters… ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━   0% -:--:--

@NicoWeio
Owner

NicoWeio commented Nov 16, 2022

Although I have no fix yet, I found some more books that reliably trigger 403 errors:

Maybe there is a pattern?

@2600box

2600box commented Jan 14, 2023

@Erik262 @NicoWeio I think I have identified some more giving cloudflare errors, if you would like to test?

the-leaders-guide-to-unconscious-bias-en
the-8th-habit-en
the-speed-of-trust-en
building-a-second-brain-en
everyone-deserves-a-great-manager-en
first-things-first-en
121-first-dates-en
a-beautiful-mind-en
the-7-habits-of-highly-effective-people-en

@NicoWeio
Owner

NicoWeio commented Jan 16, 2023

Thanks, @2600box!
I investigated this some more and found out two things:

1. In the web app, one can see that the request goes to the expected URL and works, contrary to a request to the same URL by this code.
   Compared to requests for a book that can be downloaded with this program, some request headers differ (direction of comparison: working → not working).

     • (…)
     • In particular, Sec-Fetch-Mode: no-cors → Sec-Fetch-Mode: cors.

2. Providing a valid _blinkist-webapp_session cookie fixes our problems. I verified this for all of the links in your comment above.

Therefore…

If I don't find an alternative, I will add an option to provide this or to automatically extract it from Firefox in the near future.
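
For illustration, sending that cookie along with a request might look like the following. This is a stdlib sketch under assumptions: `build_request` is a hypothetical helper, the cookie value is a placeholder you would copy from a logged-in browser session, and the repo's own code (which goes through cloudscraper) may attach the cookie differently.

```python
import urllib.request

SESSION_COOKIE_NAME = "_blinkist-webapp_session"


def build_request(url, session_cookie):
    """Build a request that carries the web app's session cookie."""
    request = urllib.request.Request(url)
    # A Cookie header is a list of name=value pairs; only one is needed here.
    request.add_header("Cookie", f"{SESSION_COOKIE_NAME}={session_cookie}")
    return request


request = build_request(
    "https://www.blinkist.com/api/books/the-automation-advantage-en/chapters",
    session_cookie="PASTE-VALUE-HERE",  # placeholder, not a real cookie
)
```

Nothing is sent until the request object is passed to `urllib.request.urlopen`, so the header can be inspected first.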

NicoWeio added a commit that referenced this issue Jan 18, 2023
This works, but is not yet configurable,
i.e. nothing will work without a valid cookie.
A first step towards fixing #1.
@NicoWeio NicoWeio self-assigned this Jan 18, 2023
@2600box

2600box commented Jan 19, 2023

Thanks for working on this. I tested the new branch with my cookies.sqlite and it worked well.

Being able to specify a cookies.txt file would be ideal.

I also noticed you added the "This book has no audio." message, which is great.

Thanks for continuing this project!
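
As a rough sketch of what reading the session cookie from a Firefox cookies.sqlite could involve: the function below queries the file with the stdlib `sqlite3` module. It assumes the cookies live in a `moz_cookies` table with `name`, `value`, `host`, and `expiry` columns; `read_firefox_cookie` is a hypothetical name, and the branch tested above may do this differently. Firefox may hold a lock on the file while it is running, so working on a copy is a sensible precaution.

```python
import sqlite3


def read_firefox_cookie(db_path, name="_blinkist-webapp_session",
                        host_suffix="blinkist.com"):
    """Return the cookie's value from a Firefox cookies.sqlite, or None."""
    connection = sqlite3.connect(db_path)
    try:
        row = connection.execute(
            "SELECT value FROM moz_cookies "
            "WHERE name = ? AND host LIKE ? "
            "ORDER BY expiry DESC LIMIT 1",  # prefer the longest-lived match
            (name, "%" + host_suffix),
        ).fetchone()
    finally:
        connection.close()
    return row[0] if row else None
```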

@NicoWeio
Owner

You're very welcome!

Can you elaborate on why a cookies.txt file would be helpful to you? Wouldn't auto-import from all major browsers (to be done) be more comfortable? Of course I could implement both, I just don't see the use case.

@2600box

2600box commented Jan 19, 2023

> You're very welcome!
>
> Can you elaborate on why a cookies.txt file would be helpful to you? Wouldn't auto-import from all major browsers (to be done) be more comfortable? Of course I could implement both, I just don't see the use case.

Sure. First, to me it is a more standard approach. Second, I prefer to export the cookie for Blinkist individually. Third, I don't run this on the same machine that has my browser.
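
To make the cookies.txt use case concrete, here is a minimal sketch that reads the session cookie from a Netscape-format cookies.txt with the stdlib `http.cookiejar` module. `read_cookie_from_txt` is a hypothetical helper and not an option this tool offers; the cookie name is the one identified earlier in this thread.

```python
from http.cookiejar import MozillaCookieJar


def read_cookie_from_txt(path, name="_blinkist-webapp_session"):
    """Return the named cookie's value from a Netscape cookies.txt, or None."""
    jar = MozillaCookieJar()
    # Keep session and expired cookies too, so nothing exported is dropped.
    jar.load(path, ignore_discard=True, ignore_expires=True)
    for cookie in jar:
        if cookie.name == name:
            return cookie.value
    return None
```

Because the file is plain text, it can be exported on one machine and copied to the one that runs the downloader.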

NicoWeio added a commit that referenced this issue Oct 15, 2023
This works, but is not yet configurable,
i.e. nothing will work without a valid cookie.
A first step towards fixing #1.
phuongnd08 pushed a commit to phuongnd08/blinkist that referenced this issue Feb 14, 2024
This works, but is not yet configurable,
i.e. nothing will work without a valid cookie.
A first step towards fixing NicoWeio#1.
@phuongnd08

I cherry-picked 95d9367 and it works great. If you don't have time for a thorough fix, I would recommend pushing 95d9367 to master and telling folks to log in with Firefox first before using the tool.

@NicoWeio
Owner

That's a good idea. The only reason I didn't do it yet is that support for other browsers seemed so close… and then I never got around to it. As I just wrote in another issue, I hope to get back to this in a month or so.

@phuongnd08

I think most of the users are "life-hackers" anyway, they won't mind using Firefox just so that the tool works like a breeze :))
