fix PixivFavoriteExtractor regex #1405

Merged (2 commits) on Mar 25, 2021

@beesdotjson beesdotjson commented Mar 23, 2021

The output below lists the match groups for each URL as [uid, kind, self.tag, query], based on

uid, kind, self.tag, query = match.groups()

Before:

https://www.pixiv.net/en/users/173530/bookmarks/artworks ['173530', 'bookmarks/artworks', None, None]
https://www.pixiv.net/bookmark.php?id=173530 [None, None, None, 'id=173530']
No match: https://www.pixiv.net/en/users/3137110
https://www.pixiv.net/bookmark.php?id=3137110&tag=%E3%81%AF%E3%82%93%E3%82%82%E3%82%93&p=1 [None, None, None, 'id=3137110&tag=%E3%81%AF%E3%82%93%E3%82%82%E3%82%93&p=1']
https://www.pixiv.net/bookmark.php [None, None, None, None]
https://www.pixiv.net/bookmark.php?tag=foobar [None, None, None, 'tag=foobar']
https://www.pixiv.net/en/users/173530/following ['173530', 'following', None, None]
https://www.pixiv.net/bookmark.php?id=173530&type=user [None, None, None, 'id=173530&type=user']
https://touch.pixiv.net/bookmark.php?id=173530 [None, None, None, 'id=173530']
https://touch.pixiv.net/bookmark.php [None, None, None, None]
https://www.pixiv.net/en/users/173530/bookmarks/artworks?rest=hide ['173530', 'bookmarks/artworks', None, None]
https://www.pixiv.net/en/users/173530/bookmarks/artworks/未分類?rest=hide ['173530', 'bookmarks/artworks/未分類', '未分類', None]

Of note:

  • https://www.pixiv.net/en/users/173530/bookmarks/artworks?rest=hide ['173530', 'bookmarks/artworks', None, None] -> the query string is not captured, so the "hide" option (private bookmarks) is lost
  • https://www.pixiv.net/en/users/173530/bookmarks/artworks/未分類?rest=hide ['173530', 'bookmarks/artworks/未分類', '未分類', None] -> the tag "未分類" is included in the "kind" field (I think this is undesired behavior)
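
Both observations can be reproduced with a pattern shaped like the one below. This is a hypothetical reconstruction inferred only from the "Before" output, not a copy of the extractor's actual source: it assumes the optional tag group is nested inside the `kind` group and that the query group is attached only to the `bookmark.php` branch. The names `OLD_PATTERN` and `old_groups` are made up for illustration.

```python
import re

# Hypothetical reconstruction, inferred from the "Before" output only.
# The tag group is nested inside the kind group, and the query group
# belongs solely to the bookmark.php alternative.
OLD_PATTERN = re.compile(
    r"(?:https?://)?(?:www\.|touch\.)?pixiv\.net/(?:(?:en/)?"
    r"users/(\d+)/(bookmarks/artworks(?:/([^/?#]+))?|following)"
    r"|bookmark\.php(?:\?([^#]*))?)"
)


def old_groups(url):
    """Return the four capture groups as a list, or None if nothing matches."""
    match = OLD_PATTERN.match(url)
    return list(match.groups()) if match else None


base = "https://www.pixiv.net/en/users/173530/bookmarks/artworks"
print(old_groups(base + "?rest=hide"))         # query group stays None
print(old_groups(base + "/未分類?rest=hide"))  # tag leaks into the kind group
```

Because `re.match` only anchors at the start, the trailing `?rest=hide` is silently ignored on new-style URLs rather than causing a failed match.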

After:

https://www.pixiv.net/en/users/173530/bookmarks/artworks ['173530', 'bookmarks/artworks', None, None]
https://www.pixiv.net/bookmark.php?id=173530 [None, None, None, 'id=173530']
No match: https://www.pixiv.net/en/users/3137110
https://www.pixiv.net/bookmark.php?id=3137110&tag=%E3%81%AF%E3%82%93%E3%82%82%E3%82%93&p=1 [None, None, None, 'id=3137110&tag=%E3%81%AF%E3%82%93%E3%82%82%E3%82%93&p=1']
https://www.pixiv.net/bookmark.php [None, None, None, None]
https://www.pixiv.net/bookmark.php?tag=foobar [None, None, None, 'tag=foobar']
https://www.pixiv.net/en/users/173530/following ['173530', 'following', None, None]
https://www.pixiv.net/bookmark.php?id=173530&type=user [None, None, None, 'id=173530&type=user']
https://touch.pixiv.net/bookmark.php?id=173530 [None, None, None, 'id=173530']
https://touch.pixiv.net/bookmark.php [None, None, None, None]
https://www.pixiv.net/en/users/173530/bookmarks/artworks?rest=hide ['173530', 'bookmarks/artworks', None, 'rest=hide']
https://www.pixiv.net/en/users/173530/bookmarks/artworks/未分類?rest=hide ['173530', 'bookmarks/artworks', '未分類', 'rest=hide']
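
Once the query string is captured, the private-bookmark option can be read from it. The following is a minimal sketch using only the standard library; the helper name `bookmark_visibility` and the `"private"`/`"public"` labels are assumptions for illustration and are not taken from the extractor.

```python
from urllib.parse import parse_qs


def bookmark_visibility(query):
    """Map a captured query string (or None) to 'private' or 'public'."""
    # parse_qs returns a dict of lists; an empty string yields an empty dict.
    params = parse_qs(query or "")
    # "rest=hide" is the option that selects private bookmarks.
    return "private" if params.get("rest") == ["hide"] else "public"


print(bookmark_visibility("rest=hide"))
print(bookmark_visibility("id=173530&type=user"))
print(bookmark_visibility(None))
```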

mikf commented Mar 23, 2021

> tag "未分類" included in "kind" field (I think this is undesired behavior)

That doesn't really matter, since kind is only used to check whether to fetch followed users or not:

if kind == "following":

Of course it'd be better if this didn't happen, but I'd take the current behavior over a regex lookbehind.
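
To illustrate why the extra tag in `kind` is harmless, here is a small sketch of that equality check. The function name `fetches_followed_users` is hypothetical; only the `kind == "following"` comparison comes from the code quoted above.

```python
def fetches_followed_users(kind):
    # Hypothetical helper wrapping the `kind == "following"` check.
    return kind == "following"


for kind in ("bookmarks/artworks", "bookmarks/artworks/未分類", "following"):
    # A tag appended to "bookmarks/artworks" never equals "following",
    # so it cannot change which branch is taken.
    print(kind, fetches_followed_users(kind))
```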

> does not group "hide" option (private bookmarks)

So essentially the issue is that new-style URLs (/users/173530/bookmarks/artworks) don't include query parameters, isn't it? Then how about:

    pattern = (r"(?:https?://)?(?:www\.|touch\.)?pixiv\.net/(?:(?:en/)?"
               r"users/(\d+)/(bookmarks/artworks|following)(?:/([^/?#]+))?"
               r"|bookmark\.php)(?:\?([^#]*))?")

Fetches query parameters for old- and new-style URLs, doesn't include tags after /artworks, and doesn't use a lookbehind.

https://www.pixiv.net/en/users/173530/bookmarks/artworks ('173530', 'bookmarks/artworks', None, None)
https://www.pixiv.net/bookmark.php?id=173530 (None, None, None, 'id=173530')
https://www.pixiv.net/en/users/3137110/bookmarks/artworks/%E3%81%AF%E3%82%93%E3%82%82%E3%82%93 ('3137110', 'bookmarks/artworks', '%E3%81%AF%E3%82%93%E3%82%82%E3%82%93', None)
https://www.pixiv.net/bookmark.php?id=3137110&tag=%E3%81%AF%E3%82%93%E3%82%82%E3%82%93&p=1 (None, None, None, 'id=3137110&tag=%E3%81%AF%E3%82%93%E3%82%82%E3%82%93&p=1')
https://www.pixiv.net/bookmark.php (None, None, None, None)
https://www.pixiv.net/bookmark.php?tag=foobar (None, None, None, 'tag=foobar')
https://www.pixiv.net/en/users/173530/following ('173530', 'following', None, None)
https://www.pixiv.net/bookmark.php?id=173530&type=user (None, None, None, 'id=173530&type=user')
https://touch.pixiv.net/bookmark.php?id=173530 (None, None, None, 'id=173530')
https://touch.pixiv.net/bookmark.php (None, None, None, None)
https://www.pixiv.net/en/users/173530/bookmarks/artworks?rest=hide ('173530', 'bookmarks/artworks', None, 'rest=hide')
https://www.pixiv.net/en/users/173530/bookmarks/artworks/未分類?rest=hide ('173530', 'bookmarks/artworks', '未分類', 'rest=hide')
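
A listing like the one above can be reproduced with a short script. The pattern is the one proposed in this comment; the names `PATTERN`, `groups_for`, and `urls` are assumptions for the sketch, and only a few of the URLs are included.

```python
import re

# Pattern as proposed above: the optional tag segment and the optional
# query string sit outside the kind group and apply to both URL styles.
PATTERN = re.compile(
    r"(?:https?://)?(?:www\.|touch\.)?pixiv\.net/(?:(?:en/)?"
    r"users/(\d+)/(bookmarks/artworks|following)(?:/([^/?#]+))?"
    r"|bookmark\.php)(?:\?([^#]*))?"
)


def groups_for(url):
    """Return (uid, kind, tag, query) for a URL, or None if it does not match."""
    match = PATTERN.match(url)
    return match.groups() if match else None


urls = [
    "https://www.pixiv.net/en/users/173530/bookmarks/artworks",
    "https://www.pixiv.net/bookmark.php?id=173530",
    "https://www.pixiv.net/en/users/173530/following",
    "https://touch.pixiv.net/bookmark.php",
    "https://www.pixiv.net/en/users/173530/bookmarks/artworks?rest=hide",
    "https://www.pixiv.net/en/users/173530/bookmarks/artworks/未分類?rest=hide",
]

for url in urls:
    print(url, groups_for(url))
```

One consequence of moving the tag segment outside the alternation is that it is now also accepted after `following`, which appears harmless given how `kind` is used.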

@beesdotjson

Avoiding the lookbehind is a good idea. I should have tinkered with my regex a little more before settling on it.

mikf commented Mar 25, 2021

Thank you

mikf merged commit 5ad615f into mikf:master on Mar 25, 2021