Copr seems to return stale pages for the builds in some cases #2625

Open
praiskup opened this issue Apr 3, 2023 · 2 comments
@praiskup
Member

praiskup commented Apr 3, 2023

Original issue: https://bugzilla.redhat.com/show_bug.cgi?id=2084511
Opened: 2022-05-12 10:22:58
Opened by: Lev Veyde


Lev Veyde commented at 2022-05-12 10:22:58:

Description of problem:

It was discovered incidentally that if the build directory is requested with the Python requests package before the build completes, the server keeps returning that cached version afterwards (for up to 24 hours).

Further debugging showed that this only affected fetching from that particular host, and only when using Python's requests library. Requesting the same URL from the same host with other tools, e.g. wget or curl, returns the correct, up-to-date file.

More debugging showed that the issue only appears when the 'Accept-Encoding: gzip, deflate' header is added to the request, e.g.:

curl -L -v https://download.copr.fedorainfracloud.org/results/ovirt/ovirt-master-snapshot/centos-stream-8-x86_64/04395856-ovirt-engine -H 'Accept-Encoding: gzip, deflate'

would return the stale version of the page, while this:

curl -L -v https://download.copr.fedorainfracloud.org/results/ovirt/ovirt-master-snapshot/centos-stream-8-x86_64/04395856-ovirt-engine

would return an up-to-date one.

Adding cache control headers to the request had no effect, e.g.:
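
The same comparison can be scripted. The sketch below is a minimal illustration using only Python's standard library; the helper names (`pick_cache_headers`, `fetch_cache_headers`) are made up for this example. requests sends an Accept-Encoding header listing gzip and deflate by default, while plain curl sends none, which would fit the observation that only requests got the stale copy. One caveat: urllib sends 'Accept-Encoding: identity' when no value is given, so the "plain" case here is close to, but not identical with, the bare curl call.

```python
import urllib.request

# Build URL taken from the report above.
URL = ("https://download.copr.fedorainfracloud.org/results/ovirt/"
       "ovirt-master-snapshot/centos-stream-8-x86_64/04395856-ovirt-engine")

# Response headers that show whether a cached copy was served.
CACHE_HEADERS = ("X-Cache", "Age", "Content-Encoding", "Content-Length", "Date")


def pick_cache_headers(headers):
    """Keep only the cache-related headers; missing ones map to None."""
    return {name: headers.get(name) for name in CACHE_HEADERS}


def fetch_cache_headers(url, accept_encoding=None):
    """GET `url` (following redirects, like `curl -L`) and return its cache headers."""
    request = urllib.request.Request(url)
    if accept_encoding is not None:
        request.add_header("Accept-Encoding", accept_encoding)
    with urllib.request.urlopen(request, timeout=30) as response:
        return pick_cache_headers(response.headers)
```

Calling `fetch_cache_headers(URL, "gzip, deflate")` and `fetch_cache_headers(URL)` one after the other and comparing `Age` and `Content-Length` should show whether the two variants come from different cached copies.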

import requests
headers = { "Cache-Control": "no-cache", "Pragma": "no-cache", }
data = requests.get('https://download.copr.fedorainfracloud.org/results/ovirt/ovirt-master-snapshot/centos-stream-8-x86_64/04395856-ovirt-engine', headers=headers)
data.headers
{'Content-Type': 'text/html; charset=UTF-8', 'Content-Length': '6629', 'Connection': 'keep-alive', 'X-Powered-By': 'PHP/8.0.17', 'Accept-Ranges': 'bytes', 'Date': 'Wed, 11 May 2022 09:20:10 GMT', 'Server': 'lighttpd/1.4.64', 'X-Cache': 'Hit from cloudfront', 'Via': '1.1 e0a78b49206aba2a7e76eb45b9688a8e.cloudfront.net (CloudFront)', 'X-Amz-Cf-Pop': 'IAD89-P2', 'X-Amz-Cf-Id': 'W0uQDQ7JeIkfu5IFxpoeoSkp_9r5COnuLeSWc8A8jy-9Z4pVp5bnlg==', 'Age': '50252'}

would still return the old cached copy (note 'X-Cache: Hit from cloudfront' and 'Age: 50252' in the response headers).

I'm not sure whether the issue is caused by some configuration on the Copr side or by the Amazon CloudFront service mishandling cached compressed responses.

I'm also not sure at this stage whether the stale result is returned only to the host that made the original request, or to everyone who is routed to the CloudFront server holding the cached copy.
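
The Age header in that response says how long the copy has been sitting in the cache. As a rough sketch (the function name `cached_since` is made up for this example), subtracting Age from Date gives the approximate time at which the cache fetched the page from the origin:

```python
from datetime import timedelta
from email.utils import parsedate_to_datetime


def cached_since(headers):
    """Approximate time the cached copy was fetched: Date minus Age."""
    served_at = parsedate_to_datetime(headers["Date"])
    return served_at - timedelta(seconds=int(headers["Age"]))


# Values copied from the response headers in the report.
headers = {"Date": "Wed, 11 May 2022 09:20:10 GMT", "Age": "50252"}
print(cached_since(headers).isoformat())  # 2022-05-10T19:22:38+00:00
```

So the copy was roughly 14 hours old when it was served, which is consistent with the "up to 24 hours" window mentioned above.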
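
One possible explanation, offered as an assumption rather than a confirmed diagnosis: a cache that stores compressed and uncompressed responses under separate keys can hold one stale variant while serving the other one fresh. The toy model below is not CloudFront's implementation; `ToyCache`, its TTL, the path, and the page strings are all invented to illustrate the idea.

```python
class ToyCache:
    """Caches origin responses per (url, Accept-Encoding) for `ttl` seconds."""

    def __init__(self, origin, ttl):
        self.origin = origin      # callable: url -> current page content
        self.ttl = ttl
        self.entries = {}         # (url, encoding) -> (stored_at, content)

    def get(self, url, encoding, now):
        key = (url, encoding)
        entry = self.entries.get(key)
        if entry is not None and now - entry[0] < self.ttl:
            return entry[1]       # cache hit: the origin is not consulted
        content = self.origin(url)
        self.entries[key] = (now, content)
        return content


page = {"state": "build running"}
cache = ToyCache(origin=lambda url: page["state"], ttl=24 * 3600)

# A client sending "gzip, deflate" requests the page while the build runs.
during = cache.get("/results/build", "gzip, deflate", now=0)

page["state"] = "build finished"

# One hour later: the gzip variant is still cached, the plain one is not.
stale = cache.get("/results/build", "gzip, deflate", now=3600)
fresh = cache.get("/results/build", None, now=3600)
print(stale, "|", fresh)  # build running | build finished
```

In this model, every client that sends the same Accept-Encoding value shares the stale entry until the TTL expires, not only the host that made the first request. Whether the real service behaves this way would still need to be tested.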

Version-Release number of selected component (if applicable):

How reproducible:
not sure at this stage

Steps to Reproduce:

  1. start a long build
  2. request the build artifacts page using Python's requests library while it's still building, e.g.:

import requests
data = requests.get('https://download.copr.fedorainfracloud.org/results/')

     then check what you get in data.headers and data.text

  3. wait for the build to finish and request the page again using Python's requests from the same host

Actual results:
The returned page is out of date.

Expected results:
The returned page should be up to date, just like the uncompressed one.

Additional info:


Miroslav Suchý commented at 2022-05-12 11:03:58:

seems like https://stackoverflow.com/questions/18774069/amazon-cloudfront-cache-control-no-cache-header-has-no-effect-after-24-hours
but I do not see a way to configure the TTL for CloudFront


Lev Veyde commented at 2022-05-12 11:10:03:

(In reply to Miroslav Suchý from comment #1)

seems like
https://stackoverflow.com/questions/18774069/amazon-cloudfront-cache-control-no-cache-header-has-no-effect-after-24-hours
but I do not see a way to configure the TTL for CloudFront

In this case, though, the issue appears only with compressed data. For a complete test, we would need to make both compressed and uncompressed requests and see whether they are handled differently.

@praiskup
Member Author

praiskup commented Apr 4, 2023

Does it mean that this needs to be fixed in AWS CloudFront, or can we fix it by configuration?

@praiskup praiskup moved this from Needs triage to In 2 years in CPT Kanban Apr 4, 2023
@gstrauss

#2011 dates back to https://pagure.io/copr/copr/issue/2001 from over 16 months ago and contains numerous suggestions for improving the service, along with incremental code changes to do so.
