TimeGate is not reporting the HTTP status correctly #58

Closed
anjackson opened this issue Mar 2, 2020 · 3 comments · Fixed by webrecorder/pywb#564

@anjackson
Contributor

This is related to #44, #45 but comments there indicate that the TimeGate used to report the status correctly. This seems not to be the case now:

# curl -I https://beta.webarchive.org.uk/wayback/archive/http://www.sahahdsf
HTTP/1.1 200 OK
Server: nginx/1.16.1
Date: Mon, 02 Mar 2020 16:10:51 GMT
Content-Type: text/html
Content-Length: 3065
Connection: keep-alive
Link: <http://www.sahahdsf>; rel="original", <https://beta.webarchive.org.uk/wayback/archive/http://www.sahahdsf>; rel="timegate", <https://beta.webarchive.org.uk/wayback/archive/timemap/link/http://www.sahahdsf>; rel="timemap"; type="application/link-format"
Vary: accept-datetime
Accept-Ranges: bytes

This causes problems with the Memento Aggregator. I think even if we skip it everywhere else, the TimeGate URI should do a lookup and return the right status code.
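
A client such as an aggregator can only tell that nothing was found if the status code says so. As a rough illustration, the check below flags a 200 response that carries neither a Memento-Datetime header nor a rel="memento" link, which is the shape of the response shown above. The function name and the heuristic are assumptions made for this sketch; they are not part of pywb or the Memento Aggregator.

```python
def is_suspect_timegate_response(status, headers):
    """Return True for a 200 response that shows no sign of a selected memento.

    Hypothetical helper: `headers` is a plain dict of response headers.
    """
    # HTTP header names are case-insensitive, so normalise before lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    has_memento_datetime = "memento-datetime" in lowered
    has_memento_link = 'rel="memento"' in lowered.get("link", "")
    return status == 200 and not (has_memento_datetime or has_memento_link)


# Headers abbreviated from the curl output quoted in this issue.
not_archived = {
    "Link": '<http://www.sahahdsf>; rel="original"',
    "Vary": "accept-datetime",
}
suspect = is_suspect_timegate_response(200, not_archived)
```

With a correct 404 for the unknown URL, the same check would return False and the client could skip the archive without inspecting the body.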

@ikreymer
Contributor

ikreymer commented Jun 8, 2020

Currently, the TimeGate should work at the mp_/ URL, e.g.:

curl -I https://beta.webarchive.org.uk/wayback/archive/mp_/https://webarchive.org.uk/
...
Memento-Datetime: Tue, 12 May 2020 15:00:10 GMT
Link: <https://www.webarchive.org.uk/>; rel="original", <https://beta.webarchive.org.uk/wayback/archive/https://www.webarchive.org.uk/>; rel="timegate", <https://beta.webarchive.org.uk/wayback/archive/timemap/link/https://www.webarchive.org.uk/>; rel="timemap"; type="application/link-format", <https://beta.webarchive.org.uk/wayback/archive/20200512150010mp_/https://www.webarchive.org.uk/>; rel="memento"; datetime="Tue, 12 May 2020 15:00:10 GMT"; collection="archive"
Preference-Applied: rewritten
Vary: accept-datetime, Prefer

However, there is definitely a bug in that it returns the URL without mp_/ in rel="timegate".
Would it be better to:

  1. Fix it so that it returns:
<https://beta.webarchive.org.uk/wayback/archive/mp_/https://www.webarchive.org.uk/>; rel="timegate"

or

  2. Make it so the TimeGate also works without the mp_/ modifier? I suppose this may be clearer, but less consistent, as this URL is used for the outer frame and not for a memento. I suppose that in a TimeGate query the response body is discarded anyway, since a Memento client only looks at the headers, so it could still return TimeGate information at this URL even though the response is the top-frame content.
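
To make the mismatch concrete, the Link header can be split into its rel values and the rel="timegate" target compared against the URL that was requested. The parser below is a minimal sketch written for this example (the names `parse_link_header`, `LINK_RE` and `PARAM_RE` are hypothetical) and handles only the header shape shown above, not every form Web Linking allows.

```python
import re

# One link-value: <uri> followed by zero or more ;name=value parameters.
# Quoted values may contain commas (e.g. the datetime parameter).
LINK_RE = re.compile(r'<([^>]*)>((?:\s*;\s*[\w-]+\s*=\s*(?:"[^"]*"|[^;,\s]+))*)')
PARAM_RE = re.compile(r';\s*([\w-]+)\s*=\s*(?:"([^"]*)"|([^;,\s]+))')


def parse_link_header(value):
    """Map each rel value to the first URI that declares it."""
    links = {}
    for uri, params in LINK_RE.findall(value):
        for name, quoted, bare in PARAM_RE.findall(params):
            if name.lower() == "rel":
                # A rel parameter may list several space-separated relations.
                for rel in (quoted or bare).split():
                    links.setdefault(rel, uri)
    return links


# Link header copied from the response quoted above.
header = (
    '<https://www.webarchive.org.uk/>; rel="original", '
    '<https://beta.webarchive.org.uk/wayback/archive/https://www.webarchive.org.uk/>; rel="timegate", '
    '<https://beta.webarchive.org.uk/wayback/archive/timemap/link/https://www.webarchive.org.uk/>; '
    'rel="timemap"; type="application/link-format", '
    '<https://beta.webarchive.org.uk/wayback/archive/20200512150010mp_/https://www.webarchive.org.uk/>; '
    'rel="memento"; datetime="Tue, 12 May 2020 15:00:10 GMT"; collection="archive"'
)
links = parse_link_header(header)
timegate_has_modifier = "/mp_/" in links["timegate"]
```

Here `timegate_has_modifier` comes out False: the advertised TimeGate is the plain URL, while the request that produced these headers went to the mp_/ form.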

@anjackson
Contributor Author

The problem with using the mp_/ modifier is that it represents a breaking change to our TimeGate API for all clients (a basic assumption of our work moving from OWB was that switching to pywb would be API compatible). We use the API in various places, and it is also used by third parties, so this breaking change has caused a range of problems with both internal and external systems.

This is why I would prefer it if the TimeGate URL https://www.webarchive.org.uk/wayback/archive/<QUERY_URL> behaved in the same way as the mp_/ version.

ikreymer added a commit to webrecorder/pywb that referenced this issue Jun 8, 2020
…ectly per-memento spec, (#564)

return 404 if not found, return latest memento header. do this by performing actual response lookup,
but then returning the top frame response if succeeded. addresses ukwa/ukwa-pywb#58
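
The control flow described in the commit message can be sketched roughly as follows. This is not pywb's code: the `Response` class, the `timegate_response` function, the in-memory `index` dict and the placeholder body are all assumptions for illustration, and Accept-Datetime negotiation is left out so that the latest capture is always chosen.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from email.utils import format_datetime


@dataclass
class Response:
    status: int
    headers: dict = field(default_factory=dict)
    body: str = ""


def timegate_response(url, index, prefix):
    """Look the URL up first, then decide between 404 and the top frame.

    `index` is a hypothetical dict mapping URLs to 14-digit capture timestamps.
    """
    captures = index.get(url, [])
    if not captures:
        # Nothing archived: report that in the status code.
        return Response(404, {}, "No captures found")

    # Fixed-width timestamps sort lexicographically, so max() is the latest.
    latest = max(captures)
    when = datetime.strptime(latest, "%Y%m%d%H%M%S").replace(tzinfo=timezone.utc)
    http_date = format_datetime(when, usegmt=True)

    link = (
        f'<{url}>; rel="original", '
        f'<{prefix}{url}>; rel="timegate", '
        f'<{prefix}{latest}mp_/{url}>; rel="memento"; datetime="{http_date}"'
    )
    headers = {"Memento-Datetime": http_date, "Link": link}
    # The lookup succeeded, so serve the top-frame page with memento headers.
    return Response(200, headers, "<!-- top-frame HTML -->")


prefix = "https://beta.webarchive.org.uk/wayback/archive/"
index = {"https://www.webarchive.org.uk/": ["20200101000000", "20200512150010"]}
found = timegate_response("https://www.webarchive.org.uk/", index, prefix)
missing = timegate_response("http://www.sahahdsf", index, prefix)
```

The point of the sketch is the ordering: the lookup happens before any response is built, so an unknown URL can no longer fall through to a 200 top-frame page.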
@ikreymer
Contributor

ikreymer commented Jun 8, 2020

The expected TimeGate behavior should now work without the mp_ modifier. This is also part of the just-released pywb 2.4.0.
