Daily Blink Page Layout has changed - IndexError: list index out of range #32
Confirmed, I've been seeing this too since 22.05.2022, because the last folder in my library is:
Yep, same here. How do we fix this?
I was able to retrieve audio and text content for the free daily by calling Blinkist's API the way the frontend does. I prefer this over BeautifulSoup because it's more direct and the new DOM lacks descriptive classes/IDs. However, I haven't integrated my approach with this codebase, and I'm not sure if it works the same for arbitrary books on Blinkist Premium. If anyone's interested, I'll post my code tomorrow. :)
Perfect, please let me know!
Here you go. :)

```python
import cloudscraper
from datetime import datetime
from pathlib import Path
import requests
from rich import print
from rich.progress import track

BASE_URL = 'https://www.blinkist.com/'
HEADERS = {
    'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64; rv:101.0) Gecko/20100101 Firefox/101.0',
    'x-requested-with': 'XMLHttpRequest',
}
LOCALES = ['en', 'de']
DOWNLOAD_DIR = Path.home() / 'Musik' / 'Blinkist'

scraper = cloudscraper.create_scraper()


def get_book_dir(book):
    return DOWNLOAD_DIR / f"{datetime.today().strftime('%Y-%m-%d')} – {book['slug']}"


def get_free_daily(locale):
    # see also: https://www.blinkist.com/en/content/daily
    response = scraper.get(
        BASE_URL + 'api/free_daily',
        params={'locale': locale},
    )
    return response.json()


def get_chapters(book_slug):
    # note: BASE_URL already ends in '/', so don't insert another slash
    url = f"{BASE_URL}api/books/{book_slug}/chapters"
    response = requests.get(url, headers=HEADERS)
    response.raise_for_status()
    return response.json()['chapters']


def get_chapter(book_id, chapter_id):
    url = f"{BASE_URL}api/books/{book_id}/chapters/{chapter_id}"
    response = requests.get(url, headers=HEADERS)
    response.raise_for_status()
    return response.json()


def download_chapter_audio(book, chapter_data):
    book_dir = get_book_dir(book)
    book_dir.mkdir(parents=True, exist_ok=True)
    file_path = book_dir / f"chapter_{chapter_data['order_no']}.m4a"
    if file_path.exists():
        print(f"Skipping existing file: {file_path}")
        return
    assert 'm4a' in chapter_data['signed_audio_url']
    response = scraper.get(chapter_data['signed_audio_url'])
    assert response.status_code == 200
    file_path.write_bytes(response.content)
    print(f"Downloaded chapter {chapter_data['order_no']}")


for locale in LOCALES:
    free_daily = get_free_daily(locale=locale)
    book = free_daily['book']
    print(f"Today's free daily in {locale} is: “{book['title']}”")
    # list of chapters without their content
    chapter_list = get_chapters(book['slug'])
    # fetch chapter content
    chapters = [
        get_chapter(book['id'], chapter['id'])
        for chapter in track(chapter_list, description='Fetching chapters…')
    ]
    # download audio
    for chapter in track(chapters, description='Downloading audio…'):
        download_chapter_audio(book, chapter)
    # write markdown
    # excluded for brevity – just access chapter['text'] etc.
    # markdown_text = download_book_md(book, chapters)
```
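For reference, here is a minimal sketch of the omitted markdown step. `build_book_md` is a hypothetical helper (the original `download_book_md` isn't shown), and it assumes each chapter dict carries the same `order_no` key the audio step uses, plus a `text` key:

```python
def build_book_md(book, chapters):
    """Assemble one markdown document: book title, then one section per chapter."""
    lines = [f"# {book['title']}", '']
    for chapter in chapters:
        lines += [f"## Chapter {chapter['order_no']}", '', chapter.get('text', ''), '']
    return '\n'.join(lines)
```

You could then write the result next to the audio files, e.g. `(get_book_dir(book) / 'book.md').write_text(build_book_md(book, chapters), encoding='utf-8')`.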
@NicoWeio does your code work straight out of the box, or does it need to replace the existing core.py?
Would this approach work on a Windows machine?
See my earlier comment:
Assuming you have …
If …
@ptrstn Is there a fix/update coming? You said by Sunday, and then you removed your answer.
This change requires some refactoring and a little more time than initially expected. I'll see what I can do. I can't guarantee when, though, since I've got other things in life to take care of first.
Sure, you're right about that.
Executing this code on Google Colab, I get a 403 Forbidden error on line 70 when calling get_chapters. After troubleshooting, I found that response.raise_for_status() raises the error because the URL can't be accessed. How can I resolve this?
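One likely cause: `get_chapters` uses plain `requests`, which doesn't carry the Cloudflare clearance cookies the `cloudscraper` session obtained, so Cloudflare answers 403. A sketch (untested against the live site) that takes the session as a parameter, so you can hand it the existing `scraper` instead of `requests`:

```python
BASE_URL = 'https://www.blinkist.com'

def get_chapters(session, book_slug):
    # pass the cloudscraper session here instead of using plain requests,
    # so the request reuses the Cloudflare clearance cookies
    response = session.get(f"{BASE_URL}/api/books/{book_slug}/chapters")
    response.raise_for_status()
    return response.json()['chapters']
```

In the loop above this would be called as `get_chapters(scraper, book['slug'])`.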
@rajeshbhavikatti I just published my code here, so we can keep this issue clean from further discussions. |
Hi Peter @ptrstn, do you have any updates on this?
I'll be able to work on it starting at the beginning of October, since I'm still busy with private matters.
Any news for us?
Hi, I have made some updates based on this repo. Feel free to reach out to me with any changes or updates; check out my notebook here.
@rajeshbhavikatti Nice work, but your code doesn't catch the mp3 files.
@Erik262 Yes, as the Notion API doesn't support it yet.
The layout and URL of the Free Daily page have changed.
New URL: https://www.blinkist.com/en/content/daily
The locator attribute values for BeautifulSoup have to be updated accordingly. The previous values are no longer valid and cause an IndexError.
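The `IndexError` comes from indexing `[0]` into the empty list that `soup.select(...)` returns once the old class names stop matching. A small hypothetical guard (not part of the repo; the selector name below is illustrative only) turns that into an actionable error instead of a bare traceback:

```python
def first_or_fail(matches, description):
    """Return the first BeautifulSoup selector match, or fail loudly.

    soup.select() returns a plain list; when the page layout changes,
    stale selectors match nothing and matches[0] raises IndexError.
    """
    if not matches:
        raise RuntimeError(f"Could not find {description}: the page layout may have changed")
    return matches[0]

# usage:
# title = first_or_fail(soup.select('.daily-book__headline'), 'daily book headline')
```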