-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Unvendor requests and use urllib3 directly #1495
Conversation
Codecov Report
@@ Coverage Diff @@
## develop #1495 +/- ##
===========================================
- Coverage 72.82% 66.67% -6.16%
===========================================
Files 128 129 +1
Lines 14236 14462 +226
===========================================
- Hits 10367 9642 -725
- Misses 3869 4820 +951
Continue to review full report at Codecov.
|
db9b501
to
05ac205
Compare
I'm starting to look at this now, and one of the things that would really help is putting a description of what/how/why you changed certain things. Comparing this to what requests does, it's not quite the same and there's not really an explanation of why anywhere (and it's not immediately obvious from looking at your code). That and docstrings would help out. While this isn't always the case for small PRs, ripping out our existing http client is going to require careful scrutiny and the more insight you can give into the changes you've made, the better (and more timely) feedback you're going to get. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I haven't look at this in depth, I mostly just have high level questions for now.
One of the hardest parts to follow was the proxy logic, which at first glance seems to be split across http session functions, the Urllib3Session
class, and some stuff in utils.py
. While it would be preferable to consolidate them in a single class, I think there should be some docstring/comment somewhere that explains how proxies work now that we have to do more of the work than when we relied on requests
.
And lastly, I'm wondering how to handle the connection errors leaking out from requests. I think a lot of code uses this (chalice also catches this), and this would be a pretty big breaking change for customers. Maybe we figure out some way to roll out the exception changes first, get code updated where possible, then merge the urllib3 changes. Worst case is we keep the exception around and just alias it to a new botocore exception.
botocore/http_session.py
Outdated
DEFAULT_TIMEOUT = 60 | ||
MAX_POOL_CONNECTIONS = 10 | ||
# TODO: move the vendored cacert when we drop requests | ||
DEFAULT_CA_BUNDLE = os.path.join( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd just copy it over to its new location now so we can just remove vendored/
entirely when we're ready to ship this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't copy it over as the CA_BUNDLE kinda muddies the diff. I was planning on moving this at the same time we remove the vendored requests directory.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
kinda muddies the diff
Given we won't likely won't remove the vendored requests directory for a while, I'd prefer if this new stuff had minimal dependencies on anything in vendored/
unless necessary. That way we can just remove vendored/
when appropriate. It will also give customers something to switch to if for whatever reason they're depending on our cacert.pem.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Copied the vendored CA Cert and updated this.
botocore/http_session.py
Outdated
return None, None | ||
|
||
|
||
def get_proxy_headers(proxy_url): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All of the proxy and auth stuff seems like it would be better to encapsulate in a single class rather than a collection of module level functions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll see what I can do. This logic is more or less directly ported from what requests does.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This logic is more or less directly ported from what requests does.
It's still confusing to follow and I'd prefer we come up with a better abstraction for proxies if we're going to be maintaining/supporting this.
botocore/http_session.py
Outdated
return headers | ||
|
||
|
||
def fix_proxy_url(proxy_url): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know this PR is an early draft, but function like fix_proxy_url
without any explanation or docs is hard to review.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I haven't touched docstrings yet as I didn't know how much churn there would be. I'll try adding some documentation for the things that I think are little more solid.
botocore/http_session.py
Outdated
return where() | ||
|
||
|
||
class Urllib3Session(object): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For this class, general comments:
- It seems like part of this class's logic for proxies is contained in the class itself, and part is in module functions above. It would be helpful to consolidate all the proxy stuff to a single class.
URL
notUrl
.- It looks like this takes the place of the requests session and adapter. What was the rationale for consolidating the two classes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- All of the proxy logic inside the class directly relates to taking a properly formed proxy url and turning that into a urllib3 proxied connection. The logic outside the class don't relate to anything urllib3 specific.
URLLib3
?- This correlates to the adapter. It's less of consolidating the two classes and more-so removing the part we didn't use. We effectively passed every parameter every time we made a request so the session didn't provide much. The parts it did give us we either didn't rely on (retries, cookies) or are now our handling in our own retry handler (redirects).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking into refactoring the proxy code.
botocore/http_session.py
Outdated
self._timeout = timeout | ||
self._max_pool_connections = max_pool_connections | ||
self._proxy_managers = {} | ||
self._http_pool = PoolManager( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this is designed to be tied to a single client (i.e one urllib3 session per botocore client) what's the rationale for using a pool manager instead of a connection pool? If we're going to use a pool manager, wouldn't it make more sense to tie it to the botocore session and share it across all clients?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, this is potentially a change I've thought about exploring as well. I've kept it this way as it more closely correlates to what requests was going previously and I haven't explored what implications switching from a pool manager to a connection pool would entail.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's interesting is that the max_pool_connections
is really "max persistent connections", and in the case of s3 with virtual hosted addressing, this number is potentially 100 max persistent connections by default (10 per pool, 10 pool total, 1 pool used per bucket because it's a different hostname). A connection pool would make that closer to what you'd expect max_pool_connections
to mean, but could potentially result in worse performance. I suppose we should keep this as is for now to minimize risky changes in the switchover.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, that was the big difference I noticed as well.
botocore/awsrequest.py
Outdated
# Read the contents. | ||
try: | ||
data_generator = self.raw.stream() | ||
except AttributeError: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When would raw
not have a stream method on it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When it's not a raw response from urllib3.
Currently, this will never happen in practice but we relied on this functionality from requests in some tests where we mock the raw response with a file-like object.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I'd prefer to simplify and remove this fallback if our tests are the only thing relying on it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I removed it and just updated the tests that were dependent on it.
botocore/awsrequest.py
Outdated
if not self.content: | ||
return str('') | ||
|
||
return self.content.encode('utf-8') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't this take into account encodings provided via response headers?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It could, requests had some fairly complex logic for trying to determine the content encoding. The only place the text
property is used is in the ContainerMetadataFetcher
. I think I might just remove this and do the decoding there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Switched this to just use content and decode.
botocore/awsrequest.py
Outdated
try: | ||
length = len(self.body) | ||
self.headers['Content-Length'] = str(length) | ||
except Exception as e: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What would raise here? The len()
call?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why's that? We should scope this to something more specific than Exception
.
botocore/awsrequest.py
Outdated
@@ -54,7 +50,7 @@ def _read_status(self): | |||
return HTTPResponse._read_status(self) | |||
|
|||
|
|||
class AWSHTTPConnection(HTTPConnection): | |||
class AWSConnection(object): | |||
"""HTTPConnection that supports Expect 100-continue. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
docstring is out of date.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Kind of? The functionality and inheritance described is still true, it just doesn't actually get inherited until it's used as a mixin.
botocore/awsrequest.py
Outdated
try: | ||
data_generator = self.raw.stream() | ||
except AttributeError: | ||
def stream(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a reason not to just have this be a separate method and always use it (vs. using raw.stream()
if it's available).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is another case of 'it's what requests does'. I think the motivation here was that the Response object is generic and the underlying response could come from a different library other than urllib3 that may or may not have a stream method.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In that case, I'd prefer to remove this if nothing's relying on it. If we are going to keep it at least extract it out to a method instead of a nested function.
@jamesls What do you think would be the best format for tracking these types of comparisons? I don't necessarily think it makes sense to put these in-code (as the difference are mostly removals). My workflow on this has been less about "this is what requests does" and more about "this is what botocore expects" (passing all the tests). In general my answer to "Why is this different from requests?" will be "because botocore doesn't use it, afaik". At a really high level we're basically taking two things from requests and re-implementing the subset that botocore expects. These two pieces are:
As for the
The model objects do have a lot of changes as I basically gutted them to have next to no logic and only implemented the things that we relied on requests for. This turns out to be not a whole lot of functionality as botocore is already producing / preparing generally well structured requests before handing them off to requests. The behavior that I've found we needed is:
|
If you're asking where's the best place to document where we deviate from what requests does, I think it's worth putting in the code somewhere as either a comment or a docstring. A lot of your previous comment would have been really helpful to read as a comment/docstring while reviewing the code. |
aa8a53b
to
a07d93f
Compare
botocore/awsrequest.py
Outdated
there are the following differences: | ||
This class does not heavily prepare the URL. Requests performed many | ||
validations and corrections to ensure the URL is properly formatted. | ||
Botocore either performs this validations elsewhere or otherwise |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/this/these
self.stream_output = stream_output | ||
|
||
if headers is not None: | ||
for key, value in headers.items(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
self.headers.update(headers)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll check if this works but self.headers
isn't a dict
it's a HTTPHeaders
object which has some weird behavior and this is how we were doing this previously so I just left it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah this doesn't work.
botocore/awsrequest.py
Outdated
return hash(self._lower) | ||
|
||
def __eq__(self, other): | ||
return self._lower == other._lower |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably not needed, but what about putting a try/except here in case this is compared against something else?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can do an isinstance
check I suppose.
@@ -0,0 +1,4433 @@ | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While we're at it, why not just go certifi?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm trying to keep changes to a minimum for now. I think this is something we could come back to.
0bf3335
to
0d62cf8
Compare
1237415
to
5ac6bfa
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, I mostly focused on the http_session
code. I'd like to look over the new exceptions in more detail, I'll have more feedback tomorrow.
thread = threading.Thread(target=target, args=args) | ||
thread.daemon = True | ||
thread.start() | ||
yield target |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You'll need a try/finally here. For the tests that are expecting a failure being raised, the code after the yield won't be called. It doesn't technically matter because they are background threads, but it's a pretty small change to fix this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
self.fail('Excepted exception was not thrown') | ||
|
||
|
||
def random_port(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would suggest doing what we did for chalice where we have the socket module find an unused port for us. This function has a small chance of failing, but it'd be nice to have something that we know will always work: https://github.com/aws/chalice/blob/master/tests/functional/test_local.py#L95
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
except BackgroundTaskFailed: | ||
self.fail('Fake EC2 service was not called.') | ||
else: | ||
self.fail('Excepted exception was not thrown') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Expected.
self.localhost = 'http://localhost:%s/' % self.port | ||
self.session = botocore.session.get_session() | ||
|
||
def test_can_proxy_https_request_with_auth(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you think it's worth adding a test for plain http proxies?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could, but we would need a proxy server that understands it. I think at that point we're mostly testing urllib3, but it still could be worth it. I'll see what I can do.
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM) | ||
sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1) | ||
sock.bind(('', self.port)) | ||
time.sleep(4) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we use threading.Events
instead? We had similar sleeps in chalice and it ended up being flaky. I think threading events will make these tests more robust.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, this sounds good. Done.
botocore/http_session.py
Outdated
) | ||
|
||
def _get_proxy_manager(self, proxy_url): | ||
if proxy_url not in self._proxy_managers: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think there's a test for the case where the proxy url is already in the proxy managers dict.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added a test for this.
botocore/http_session.py
Outdated
def _path_url(self, url): | ||
parsed_url = urlparse(url) | ||
path = parsed_url.path | ||
if not path: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you remind me again when path
would be false?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe this can only happen if the url is just the endpoint without the /
like https://somedomain.com
vs https://somedomain.com/
botocore/http_session.py
Outdated
|
||
def send(self, request): | ||
try: | ||
request_target = self._path_url(request.url) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you break this method up? It's fairly lengthy.
I also don't quite follow the logic where request_target
is set to self._path_url(request.url)
, but for the proxy case request_target
is set to just request.url
(line 170). Maybe update the comment to explain why we need to use the full url in the proxy case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Broke it up a little bit more. I've updated the comment to note that proxying an HTTP requires the absolute URI so the host to connect to is part of the request target.
botocore/http_session.py
Outdated
request=request, | ||
endpoint_url=request.url | ||
) | ||
except Exception as e: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are we sure we want to catch Exception
here? Is is possible to restrict this to a list of exceptions we know about? This will consider even trivial bugs/typos as an HTTPClientError
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think so, it ensures we never directly leak any exceptions from our dependency or a mistake in our implementation. What's your concern with an exception from this layer being wrapped as an HTTPClientError
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Primarily that actual bugs that raise things like TypeError
/ValueError
etc will be misrepresented as a HTTPClientError
. Ideally we'd want these things to fail fast and propagate all the way up the stack vs. possibly being retried and surfaced as a (somewhat confusing) HTTPClientError
.
Note a huge deal, but I'd prefer to avoid it if possible.
botocore/retryhandler.py
Outdated
@@ -30,8 +30,8 @@ | |||
# this mapping with more specific exceptions. | |||
EXCEPTION_MAP = { | |||
'GENERAL_CONNECTION_ERROR': [ | |||
ConnectionError, ClosedPoolError, Timeout, | |||
EndpointConnectionError | |||
ReadTimeoutError, ProxyConnectionError, EndpointConnectionError, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we just use HTTPClientError
here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Possibly, I was thinking it might be nice to just have some of the HTTPClientError
exception types extend from a RetryableHTTPClientError
and then use that instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looked over the tests. High level themes:
- I'd like to avoid patching out internal methods. If we have to add new events/hooks to make this possible that's ok with me.
- I'd prefer our existing integration tests aren't modified, and instead we add new ones for the new exception types. This makes it easy to verify that existing code continues to work (similar to what a customer might be trying to do).
Otherwise looks good.
@@ -813,18 +813,17 @@ def test_can_get_bucket_location(self): | |||
def test_request_retried_for_sigv4(self): | |||
body = six.BytesIO(b"Hello world!") | |||
|
|||
original_send = adapters.HTTPAdapter.send | |||
original_send = Endpoint._send |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What other options do we have besides patching out a private method?
tests/unit/test_awsrequest.py
Outdated
def tearDown(self): | ||
shutil.rmtree(self.tempdir) | ||
|
||
def test_should_reset_stream(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this test being removed? It's still a method on the aws request object so it seems like we should still be testing it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These tests were for reset_stream_on_redirect
method on the AWSPreparedRequest
class that's been removed (since we aren't relying on / hooking into redirects from our client). I'll see if I can rework some of these tests to test reset_stream
directly without relying on reset_stream_on_redirect
calling it.
tests/unit/test_awsrequest.py
Outdated
# The stream should now be reset. | ||
self.assertEqual(body.tell(), 0) | ||
|
||
def test_cannot_reset_stream_raises_error(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In general, it seems like the coverage for awsrequest
via tests/unit/test_awsrequest.py
dropped with this PR. It seems like we should at least keep the coverage we had before, and preferably raise the coverage given we're making changes to that file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added a bunch of tests and fixed some of these tests.
if not getattr(self, '_integ_test_error_raised', False): | ||
self._integ_test_error_raised = True | ||
raise ConnectionError("Simulated ConnectionError raised.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While I get that we're updating this to use our new exceptions, I do think it's useful to keep a few integration tests around that use the old exceptions to ensure that behavior isn't changed.
I would also prefer we leave the existing integration tests alone and add entirely new ones that verify we can catch the new exceptions.
This would serve as a nice safety check to ensure the same integration tests in develop can run against this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added an integration test for one of the exception types (Read timeout) to ensure in a real case we're catching the old exceptions. I also created some unit tests that verify a new raised exception will be caught by the old exception type.
@@ -285,15 +285,14 @@ def test_client_can_retry_request_properly(): | |||
|
|||
def _make_client_call_with_errors(client, operation_name, kwargs): | |||
operation = getattr(client, xform_name(operation_name)) | |||
original_send = adapters.HTTPAdapter.send | |||
def mock_http_adapter_send(self, *args, **kwargs): | |||
original_send = Endpoint._send |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same comment as before, but we really shouldn't be touching internal methods in an integration test.
If we need to add better hooks into the endpoint to make it more testable via some public API, I'm fine with that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I think this is another case for a before_send and after_send type of event.
952d96f
to
5deddbc
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, thanks for updating.
db75b50
to
b4682c6
Compare
d8562aa
to
9ce22db
Compare
@kyleknap Added the appropriate notes and changelog. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good. 🚢
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While the changes themselves look good, it doesn't seem like any new tests were added for this functionality. Only existing tests were updated to keep them passing. I think given these changes were important enough to hold up merging this code, we should have explicit tests for these new commits.
Otherwise looks good.
botocore/parsers.py
Outdated
# Ensure that the http header keys are all lower cased. Older | ||
# versions of urllib3 (< 1.11) would unintentionally do this for us | ||
# (see urllib3#633). We need to do this conversion manually now. | ||
for header in response['headers']: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not a fan of this. It doesn't seem like it's the parsers responsibility to handle the lower casing, which I think is further hinted at by the fact that we're changing our parser because our http client changed (which I would not expect).
What was the reason for not fixing this in the custom headers dictionary class we made?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We're not fixing this in the custom headers dictionary because the previous headers object wasn't responsible for doing this either. This change in behavior would have happened even if we just upgraded requests. This behavior was never intentional on our part or urllib3s part, but is now part of our public contract for what we return. The parsers are what converts the response's header object into a dict
and is what ultimately dictates the final format and structure of what we return from client operations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I think that sounds reasonable. Could you have this off to a utils function? Some of the other dict manipulation functions are already in utils (e.g merge_dicts
).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, sounds reasonable.
docs/source/index.rst
Outdated
the use of ``botocore.vendored.requests.exceptions.*`` or | ||
``botocore.vendored.requests.packages.urllib3.exceptions.*`` must be updated | ||
to the corresponding exception classes in ``botocore.exceptions``. | ||
* The version of ``urllib3`` used to make HTTP requests has been updated. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would be nice to put the urllib3 version in the upgrade notes so people don't have to track down the setup.py file, along with the version of requests/urllib3 we have vendored.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you mean the version range?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah.
maxsize=22, | ||
timeout=ANY, | ||
strict=True, | ||
ssl_context=ANY, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't seem to be actually testing we're passing the correct ssl_context to the pool manager in any of our tests. We could pass None
and these tests would still pass.
I think we need at least one test that ensures it's a sane value.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Strictly speaking, None
is a sane value (urllib3 will default it for us), I guess I could assert that it is not None
because we always want to construct and pass this ourselves.
self._verify = verify | ||
self._proxy_config = ProxyConfiguration(proxies=proxies) | ||
self._pool_classes_by_scheme = { | ||
'http': botocore.awsrequest.AWSHTTPConnectionPool, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It doesn't seem like this change is tested anywhere. We should test either that we don't mess with the global namespace somehow.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added a test to ensure that the global urllib3 defaults are not set to ours.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updates still look fine 🚢
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great! Thanks for all your work on this, you're great at figuring stuff out!
I think is going to be a great change for customers.
Hello everyone! This PR broke https://github.com/spulec/moto really hard, and many projects depend on it for unit testing (See: getmoto/moto#1793). Can some guidance be provided on how to fix? Pinning to an older version of boto3 can only be exercised in the short term. It would appear that the code here: https://github.com/spulec/moto/blob/master/moto/core/models.py is the most impacted. |
@mikegrima Seems like the crux of it is these lines. Moto was monkey patching the vendored version of requests which is no longer used. I imagine a similar patch could be applied to our We have a few TODOs similar to this where we have a less than optimal solution to patching out the HTTP layer. We're looking into potentially implementing something via the event system to ignore the HTTP layer, but I'm still not 100% sure what will come out of that. This is more of an open question to the moto team: |
Hi @joguSD . I contribute to moto as many of our projects (internally and externally) make heavy use of it, but don't really work on the super low level components of it. That's more of a question for @spulec or @JackDanger. From my perspective: I just want my tests to work and not be locked out of new features because we're stuck on an old version of boto. |
@joguSD : Can you comment on getmoto/moto#1793 (comment) and suggest the best approach for @spulec to take? |
@mikegrima I think the 'best' approach is going to come down to how we expose the appropriate functionality to stub out the HTTP layer in |
This pull request begins to implement the changes discussed in #1486
This builds on some of the work started in this PR: #1466
The work is not yet done but this branch should largely be functional for the majority of use cases. In particular exception cases have been hard to verify, and there will be cases where different exceptions are now raised. This is an unavoidable consequence of moving away from our vendored dependencies. To continue building on this I would like to minimize dependency exception uses to single modules, and ensure that we don't leak exceptions from our dependencies to simplify both the exceptions generated from
botocore
and make it easier to verifybotocore
's exception behavior.Additionally, these changes impact our dependent packages
s3-transfer
,boto3
, andaws-cli
as all have references to our vendored dependencies (mostly exception classes).Downstream Updates:
Boto3: boto/boto3#1668
CLI: aws/aws-cli#3513
Closes #756, #1248, #1258, #1300, #1370, #1385, #1464, #1466,
Closes aws/aws-cli#2994