Robust checker of whether a URI is live or not #50
If you cannot do this, @andrew2net, we can give this to @alexeymorozov. As long as it's not me :)
@opoudjis it would be helpful if you gave this to someone else.
Fair. @alexeymorozov ?
There are many things to fix here:
Already being done
Fine.
And like I said, HTTP 403 Forbidden has to be assumed to be a valid URL.
It's a best-effort check. The really authoritative way of doing this is for the author to insert a manual accessed date, to indicate that they have physically sighted it.
No developer is currently working on this. @ronaldtse This needs to be addressed.
As a result of metanorma/metanorma-iso#1114, I have enabled code that was previously deactivated, to check whether a URI in a bibliographic entry is active or not. This is done in case the bibliography requires a date last accessed to be supplied, and it hasn't been already.
https://github.com/relaton/relaton-render/blob/main/lib/relaton/render/general/uri.rb
The problem is, it isn't working well, and it needs someone who understands fetching better than me to fix it.
For example, https://dl.acm.org/doi/10.1145/3425898.3426958 seems to trigger an infinite loop of redirections. It returns HTTP 302 Found, which is a redirection; the problem is that it redirects to a cookie query, https://dl.acm.org/doi/10.1145/3425898.3426958?cookieSet=1, and following that ends up in an infinite loop. Clearly
res.is_a?(Net::HTTPRedirection)
is naive, but TBH I don't have the headspace to make this robust.
PDFs are routinely returning false on
res.code[0] != "4"
Similarly, http://www.tandfonline.com/doi/abs/10.1111/j.1467-8306.2004.09402005.x returns HTTP 301 Moved Permanently, which really is a redirect, and its res["location"] is still https://www.tandfonline.com/doi/abs/10.1111/j.1467-8306.2004.09402005.x. When I access that, I get HTTP 403 Forbidden. But HTTP 403 is exactly what I would expect for a paywalled resource! The gem should not be reporting a failure there.
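One common way to defuse the cookie-query loop would be to cap the number of redirect hops and remember every URL already visited. The sketch below is not the gem's actual code; the function name is hypothetical, and the fetch step is injectable so the loop logic can be exercised without the network (by default it issues a real HEAD request with Net::HTTP):

```ruby
require "net/http"
require "uri"
require "set"

MAX_HOPS = 5

# Follow redirects until we get a non-redirect response, revisit a URL
# (a loop, like the ?cookieSet=1 case), or exceed the hop limit.
# `fetch` maps a URL string to a Net::HTTPResponse; injectable for testing.
def follow_redirects(url, fetch: nil, max_hops: MAX_HOPS)
  fetch ||= lambda do |u|
    uri = URI.parse(u)
    Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == "https") do |http|
      http.head(uri.request_uri)
    end
  end
  seen = Set.new
  max_hops.times do
    # Set#add? returns nil if the URL was already seen: a redirect loop.
    return :loop unless seen.add?(url)
    res = fetch.call(url)
    return res unless res.is_a?(Net::HTTPRedirection) && res["location"]
    url = res["location"]
  end
  :too_many_hops
end
```

Whether a `:loop` or `:too_many_hops` result should count the URI as dead or merely as "unverifiable" is a policy question; for a best-effort check, passing it is arguably safer.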
So this needs a smarter treatment of possible HTTP codes. Really, the only case where a URI is invalid is (I think) 404 or 50x. But I don't want to do this, I want someone else to do this, that is familiar with HTTP codes and paywalled content and redirects.
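Taking the rule stated above at face value (only 404 and 50x indicate a dead link; 403 is expected for paywalled content; redirects are handled separately), the classification could be as small as this. A sketch only, with a hypothetical function name:

```ruby
# Classify an HTTP status code for a best-effort "is this URI live?" check.
# Only 404 and 5xx count as dead; everything else, notably 403 Forbidden
# on paywalled resources, passes.
def dead_status?(code)
  code = code.to_i
  code == 404 || (500..599).cover?(code)
end
```

Accepting a `String` or `Integer` matters here because `Net::HTTPResponse#code` returns a `String`.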
I do not agree with Ronald that a new gem is required, but I am asking someone else to handle this. For now, I'm applying a hotfix that passes all URIs it sees.