Patch: update crawling to not follow redirects when -disable-redirects is set #630
Can one of the maintainers review this pull request please 🙏?
@ErikOwen Apologies for the delay in getting back to you on this PR. The disable-redirects flag was designed mostly with non-headless crawling in mind, since there we retain control over the synchronous HTTP request flow.
@Mzack9999 - thank you for taking the time to look at this PR!
No, my intention is to properly prevent following all types of redirects when the -disable-redirects flag is set. Would it be fair to consider this PR an incremental step in the right direction, since it properly disables following the most common type of redirect (HTTP status redirects, which are often used to redirect HTTP requests to HTTPS)? I'll open a new issue to track the less common redirects that you mentioned above.
@ErikOwen I made some small changes and merged the PR, thanks for it! During my tests I unfortunately saw that in many cases the redirect URL is still visited indirectly if it gets extracted elsewhere during the crawl, which effectively cancels out the redirect skip. Feel free to open a new GitHub issue if blocking the other kinds of redirects would be useful. Thanks!
This fixes #610, so that redirects are not followed during a katana crawl when -disable-redirects is set.
"Standard" crawl:
"Headless" crawl: