Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove vsis3 test order dependency #4444

Merged
merged 6 commits into from
Sep 9, 2021

Conversation

DFEvans
Copy link
Contributor

@DFEvans DFEvans commented Sep 7, 2021

What does this PR do?

Refactor gcore/vsis3.py to remove dependency on test order.

Refactor to apply environment variable overrides only during the test using now-standard test fixtures, rather than altering the global GDAL environment.

I apologise in advance that there are a ridiculous number of changes to appease flake8 - in retrospect, I wish I had done these first. If it's simply too much (and I suspect it is), options are:

  • Merge without the linting changes, which are in the final commit, and submit a separate PR for them (will the build pipeline permit this?)
  • Tell me to go away and lint the whole file first in one PR, and then do the dependency changes

What are related issues/pull requests?

#4407

Tasklist

  • Refactor code
  • Linting for flake8
  • Review
  • Adjust for comments
  • All CI builds and checks have passed

@rouault
Copy link
Member

rouault commented Sep 7, 2021

I can see some slowdown in the fact of starting and stopping the webserver for each test case: ~ 21.5 s now vs ~ 12.5 before. Couldn't the webserver be started once for all in a @pytest.fixture(autouse=True, scope='module') function ?

@DFEvans
Copy link
Contributor Author

DFEvans commented Sep 7, 2021

I'll have a look - I think I initially had that, undid it while working through the various issues, but those were mostly (entirely?) down to how the environment variables were being set.

@DFEvans
Copy link
Contributor Author

DFEvans commented Sep 8, 2021

The runtimes should now be back to what they were before.

Looking at the test runtimes, I'm surprised that a couple of the local webserver tests are among the slowest. For example, test_vsis3_random_write takes over a second to run, with almost all of that time going into the call to gdal.VSIFCloseL. Is there a simple change that can be made to improve that?

@rouault
Copy link
Member

rouault commented Sep 8, 2021

For example, test_vsis3_random_write takes over a second to run, with almost all of that time going into the call to gdal.VSIFCloseL. Is there a simple change that can be made to improve that?

ah, setting the CPL_CURL_VERBOSE=YES environment variable, I see the following traces

> PUT /random_write/test.bin HTTP/1.1
Host: 127.0.0.1:8080
Accept: */*
x-amz-date: 20150101T000000Z
x-amz-content-sha256: 2c26b46b68ffc68ff99b453c1d30413413422d706483bfa0f98a5e886266e7ae
Authorization: AWS4-HMAC-SHA256 Credential=AWS_ACCESS_KEY_ID/20150101/us-east-1/s3/aws4_request,SignedHeaders=host;x-amz-content-sha256;x-amz-date,Signature=ea1dcde6e84df1f5c722685ab953c475c44a85b6968c7d4d1aee11ec4bc4888c
Expect: 100-continue
Content-Length: 3

* Done waiting for 100-continue
* We are completely uploaded and fine
* Mark bundle as not supporting multiuse
* HTTP 1.0, assume close after body
< HTTP/1.0 200 OK
< Server: BaseHTTP/0.6 Python/3.8.10
< Date: Wed, 08 Sep 2021 08:28:52 GMT
< Content-Length: 0
< 
* Closing connection 0

So we send a PUT request with a Expect: 100-continue header, but our server emulation doesn't emit it and just emits the HTTP/1.0 200 OK. The "Done waiting for 100-continue" must be a 1 s delay in curl in that situation. Probably that improving webserver.py/_process_req_resp() to emit the HTTP/1.1 100 Continue would fix this. I see in test_vsis3_4() that we have a custom handler function that does this.

@DFEvans
Copy link
Contributor Author

DFEvans commented Sep 8, 2021

The Appveyor build failure has me confused. I can see it's failing on some tests modified by this PR, but I don't know what the important difference between the failing Appveyor build and the passing Github Windows builds is likely to be.

@rouault
Copy link
Member

rouault commented Sep 8, 2021

The Appveyor build failure has me confused. I can see it's failing on some tests modified by this PR, but I don't know what the important difference between the failing Appveyor build and the passing Github Windows builds is likely to be.

I don't know either. I don't recall having seen flakes in those tests before (we have sometimes flakes on test_vsis3_sync_multithreaded_download or test_vsis3_sync_multithreaded_download_chunk_size)

@rouault
Copy link
Member

rouault commented Sep 8, 2021

I've restarted the job and it still fails. One hypothesis would be that you modified the formatting of the XML response to be a multiline string, and perhaps git clone on that config uses CR-LF when checking it out files ? But there are other similar instances in other tests and they don't fail

@DFEvans
Copy link
Contributor Author

DFEvans commented Sep 8, 2021

I've pushed a change to test_vsis3_write_multipart_retry to change the XML to avoid newlines - we'll see if that changes it.

I had thought the Appveyor builds only started failing after the HTTP 100 change I made in webserver.py, so my next option is to try undoing that and see if it builds. The change I made could result in tests receiving HTTP 100 when they didn't before (if GDAL is sending Expect: 100-continue in the header), and perhaps there's a chance that it's revealed a very odd corner case.

@rouault
Copy link
Member

rouault commented Sep 8, 2021

The change I made could result in tests receiving HTTP 100 when they didn't before (if GDAL is sending Expect: 100-continue in the header), and perhaps there's a chance that it's revealed a very odd corner case.

yeah, that's a more likely cause. Actually I see that in master when we emit manually 100 continue, we do it in HTTP/1.1. But by default the webserver likely use HTTP/1.0 by default. Perhaps adding request.protocol_version = 'HTTP/1.1' in webserver.py would help ? (but I would expect the curl version used by the appveyor and github workflows to be the same)
(I also see that in some places we have explicit send_header('Connection', 'close'). Unfortunately I failed to note why this was needed...)

@DFEvans DFEvans force-pushed the remove_vsis3_test_order_dependency branch from 4831c51 to fb6a785 Compare September 9, 2021 09:48
@DFEvans
Copy link
Contributor Author

DFEvans commented Sep 9, 2021

The latest changes to set responses to HTTP 1.1 has caused a wide range of very odd test failures that differ by system. I've reverted this PR back to the state it was in before any changes regarding "100 Continue" responses, so it simply includes the test order dependency changes and linting. Pushing for a ~2s reduction in test runtime can wait for another day!

autotest/gcore/vsis3.py Outdated Show resolved Hide resolved
Also remove duplicated definition of AWS_NO_SIGN_REQUEST in vsis3 test fixture
@rouault rouault merged commit 449eb23 into OSGeo:master Sep 9, 2021
@rouault
Copy link
Member

rouault commented Sep 9, 2021

Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants