
Unvendor requests and use urllib3 directly #1495

Merged
merged 74 commits into from
Aug 24, 2018

Conversation

joguSD
Contributor

@joguSD joguSD commented Jul 4, 2018

This pull request begins to implement the changes discussed in #1486

This builds on some of the work started in this PR: #1466

The work is not yet done, but this branch should largely be functional for the majority of use cases. Exception cases in particular have been hard to verify, and there will be cases where different exceptions are now raised; this is an unavoidable consequence of moving away from our vendored dependencies. To continue building on this, I would like to confine dependency exception usage to single modules and ensure that we don't leak exceptions from our dependencies, which both simplifies the exceptions botocore generates and makes botocore's exception behavior easier to verify.

Additionally, these changes impact our dependent packages s3transfer, boto3, and aws-cli, as all of them reference our vendored dependencies (mostly exception classes).

Downstream Updates:
Boto3: boto/boto3#1668
CLI: aws/aws-cli#3513

Closes #756, #1248, #1258, #1300, #1370, #1385, #1464, #1466,
Closes aws/aws-cli#2994

@codecov-io

codecov-io commented Jul 4, 2018

Codecov Report

Merging #1495 into develop will decrease coverage by 6.15%.
The diff coverage is 94.62%.

Impacted file tree graph

@@             Coverage Diff             @@
##           develop    #1495      +/-   ##
===========================================
- Coverage    72.82%   66.67%   -6.16%     
===========================================
  Files          128      129       +1     
  Lines        14236    14462     +226     
===========================================
- Hits         10367     9642     -725     
- Misses        3869     4820     +951
Impacted Files Coverage Δ
botocore/stub.py 97.8% <100%> (-0.08%) ⬇️
botocore/exceptions.py 100% <100%> (ø) ⬆️
botocore/response.py 92.64% <100%> (+0.58%) ⬆️
botocore/compat.py 93.92% <100%> (ø) ⬆️
botocore/retryhandler.py 99.36% <100%> (-0.01%) ⬇️
botocore/parsers.py 99.78% <100%> (ø) ⬆️
botocore/utils.py 97.43% <86.36%> (-1.01%) ⬇️
botocore/httpsession.py 92.3% <92.3%> (ø)
botocore/endpoint.py 98.37% <94.44%> (+0.95%) ⬆️
botocore/awsrequest.py 99.27% <99.15%> (+0.63%) ⬆️
... and 28 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 0cb551e...51f5b5d. Read the comment docs.

@joguSD joguSD force-pushed the pluggable-endpoints branch from db9b501 to 05ac205 Compare July 6, 2018 16:22
@jamesls
Member

jamesls commented Jul 6, 2018

I'm starting to look at this now, and one of the things that would really help is a description of what/how/why you changed certain things. Comparing this to what requests does, it's not quite the same, and there's not really an explanation of why anywhere (and it's not immediately obvious from looking at your code). That and docstrings would help out.

While this isn't always the case for small PRs, ripping out our existing http client is going to require careful scrutiny and the more insight you can give into the changes you've made, the better (and more timely) feedback you're going to get.

Member

@jamesls jamesls left a comment

I haven't looked at this in depth; I mostly just have high-level questions for now.

One of the hardest parts to follow was the proxy logic, which at first glance seems to be split across http session functions, the Urllib3Session class, and some stuff in utils.py. While it would be preferable to consolidate them in a single class, I think there should be some docstring/comment somewhere that explains how proxies work now that we have to do more of the work than when we relied on requests.

And lastly, I'm wondering how to handle the connection errors leaking out from requests. I think a lot of code uses this (chalice also catches this), and this would be a pretty big breaking change for customers. Maybe we figure out some way to roll out the exception changes first, get code updated where possible, then merge the urllib3 changes. Worst case is we keep the exception around and just alias it to a new botocore exception.
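The "alias it to a new botocore exception" fallback mentioned above can be sketched roughly as follows. All class names here are illustrative stand-ins for the discussion, not botocore's actual hierarchy:

```python
# Sketch: keep the old exception name usable so existing `except`
# clauses written against the vendored requests exception still work.

class LegacyConnectionError(Exception):
    """Stand-in for the old vendored requests ConnectionError."""

class HTTPClientError(Exception):
    """New top-level botocore-style HTTP client error."""

# New concrete errors inherit from BOTH, so user code catching the old
# exception type keeps working after the switch to urllib3.
class EndpointConnectionError(HTTPClientError, LegacyConnectionError):
    pass

try:
    raise EndpointConnectionError("could not connect to endpoint")
except LegacyConnectionError as e:
    caught = str(e)
```

Multiple inheritance keeps both `except` paths working during a deprecation window, at the cost of a slightly tangled hierarchy.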

DEFAULT_TIMEOUT = 60
MAX_POOL_CONNECTIONS = 10
# TODO: move the vendored cacert when we drop requests
DEFAULT_CA_BUNDLE = os.path.join(
Member

I'd just copy it over to its new location now so we can just remove vendored/ entirely when we're ready to ship this.

Contributor Author

I didn't copy it over as the CA_BUNDLE kinda muddies the diff. I was planning on moving this at the same time we remove the vendored requests directory.

Member

kinda muddies the diff

Given we likely won't remove the vendored requests directory for a while, I'd prefer if this new stuff had minimal dependencies on anything in vendored/ unless necessary. That way we can just remove vendored/ when appropriate. It will also give customers something to switch to if for whatever reason they're depending on our cacert.pem.

Contributor Author

Copied the vendored CA Cert and updated this.

return None, None


def get_proxy_headers(proxy_url):
Member

All of the proxy and auth stuff seems like it would be better to encapsulate in a single class rather than a collection of module level functions.

Contributor Author

I'll see what I can do. This logic is more or less directly ported from what requests does.
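For reference, the requests behavior being ported amounts to roughly the following; this is a hypothetical sketch of the `get_proxy_headers` helper under review, not the actual botocore code:

```python
import base64
from urllib.parse import urlparse, unquote

def get_proxy_headers(proxy_url):
    # Build a Proxy-Authorization header from credentials embedded in
    # the proxy URL, if any (e.g. http://user:pass@proxy:8080).
    parsed = urlparse(proxy_url)
    headers = {}
    if parsed.username:
        username = unquote(parsed.username)
        password = unquote(parsed.password or '')
        creds = ('%s:%s' % (username, password)).encode('utf-8')
        headers['Proxy-Authorization'] = (
            'Basic ' + base64.b64encode(creds).decode('ascii')
        )
    return headers
```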

Member

This logic is more or less directly ported from what requests does.

It's still confusing to follow and I'd prefer we come up with a better abstraction for proxies if we're going to be maintaining/supporting this.

return headers


def fix_proxy_url(proxy_url):
Member

I know this PR is an early draft, but a function like fix_proxy_url without any explanation or docs is hard to review.

Contributor Author

Yeah, I haven't touched docstrings yet as I didn't know how much churn there would be. I'll try adding some documentation for the things that I think are a little more solid.

return where()


class Urllib3Session(object):
Member

For this class, general comments:

  • It seems like part of this class's logic for proxies is contained in the class itself, and part is in module functions above. It would be helpful to consolidate all the proxy stuff to a single class.
  • URL not Url.
  • It looks like this takes the place of the requests session and adapter. What was the rationale for consolidating the two classes?

Contributor Author

@joguSD joguSD Jul 6, 2018

  1. All of the proxy logic inside the class directly relates to taking a properly formed proxy url and turning it into a urllib3 proxied connection. The logic outside the class doesn't relate to anything urllib3-specific.
  2. URLLib3?
  3. This correlates to the adapter. It's less about consolidating the two classes and more about removing the parts we didn't use. We effectively passed every parameter every time we made a request, so the session didn't provide much. The parts it did give us we either didn't rely on (retries, cookies) or now handle in our own retry handler (redirects).

Contributor Author

Looking into refactoring the proxy code.

self._timeout = timeout
self._max_pool_connections = max_pool_connections
self._proxy_managers = {}
self._http_pool = PoolManager(
Member

If this is designed to be tied to a single client (i.e one urllib3 session per botocore client) what's the rationale for using a pool manager instead of a connection pool? If we're going to use a pool manager, wouldn't it make more sense to tie it to the botocore session and share it across all clients?

Contributor Author

Yep, this is potentially a change I've thought about exploring as well. I've kept it this way as it more closely correlates to what requests was doing previously, and I haven't explored what implications switching from a pool manager to a connection pool would entail.

Member

What's interesting is that max_pool_connections is really "max persistent connections", and in the case of s3 with virtual hosted addressing, this number is potentially 100 max persistent connections by default (10 per pool, 10 pools total, 1 pool used per bucket because it's a different hostname). A connection pool would make that closer to what you'd expect max_pool_connections to mean, but could potentially result in worse performance. I suppose we should keep this as is for now to minimize risky changes in the switchover.
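The 10-per-pool, 10-pools-total arithmetic maps directly onto urllib3's configuration knobs. A minimal sketch, assuming urllib3 is installed (`num_pools` and `maxsize` are real urllib3 parameters; the values mirror the defaults discussed):

```python
import urllib3

# A PoolManager keeps up to num_pools host-specific connection pools,
# each holding up to maxsize persistent connections, so the worst case
# is num_pools * maxsize persistent sockets (10 * 10 = 100 here).
manager = urllib3.PoolManager(num_pools=10, maxsize=10)

# A single HTTPConnectionPool, by contrast, is pinned to one host and
# caps persistent connections at exactly maxsize.
pool = urllib3.HTTPConnectionPool('example.com', maxsize=10)
```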

Contributor Author

Yep, that was the big difference I noticed as well.

# Read the contents.
try:
data_generator = self.raw.stream()
except AttributeError:
Member

When would raw not have a stream method on it?

Contributor Author

When it's not a raw response from urllib3.

Currently, this will never happen in practice but we relied on this functionality from requests in some tests where we mock the raw response with a file-like object.

Member

Yeah I'd prefer to simplify and remove this fallback if our tests are the only thing relying on it.

Contributor Author

I removed it and just updated the tests that were dependent on it.

if not self.content:
return str('')

return self.content.encode('utf-8')
Member

Shouldn't this take into account encodings provided via response headers?

Contributor Author

@joguSD joguSD Jul 6, 2018

It could; requests had some fairly complex logic for trying to determine the content encoding. The only place the text property is used is in the ContainerMetadataFetcher. I think I might just remove this and do the decoding there.

Contributor Author

Switched this to just use content and decode.
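The "use content and decode" approach can be sketched as below. `decode_body` is a hypothetical helper that pulls the charset from the Content-Type header, which is roughly the part of requests' encoding logic worth keeping:

```python
import email.message

def decode_body(content, content_type_header, default='utf-8'):
    # Pull the charset parameter out of the Content-Type header and use
    # it to decode the raw bytes, falling back to a default encoding.
    msg = email.message.Message()
    msg['Content-Type'] = content_type_header or ''
    charset = msg.get_content_charset() or default
    return content.decode(charset)
```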

try:
length = len(self.body)
self.headers['Content-Length'] = str(length)
except Exception as e:
Member

What would raise here? The len() call?

Contributor Author

Yep.

Member

Why's that? We should scope this to something more specific than Exception.

@@ -54,7 +50,7 @@ def _read_status(self):
return HTTPResponse._read_status(self)


class AWSHTTPConnection(HTTPConnection):
class AWSConnection(object):
"""HTTPConnection that supports Expect 100-continue.
Member

docstring is out of date.

Contributor Author

Kind of? The functionality and inheritance described are still true; it just doesn't actually get inherited until it's used as a mixin.

try:
data_generator = self.raw.stream()
except AttributeError:
def stream():
Member

Is there a reason not to just have this be a separate method and always use it (vs. using raw.stream() if it's available).

Contributor Author

This is another case of 'it's what requests does'. I think the motivation here was that the Response object is generic and the underlying response could come from a library other than urllib3 that may or may not have a stream method.

Member

In that case, I'd prefer to remove this if nothing's relying on it. If we are going to keep it at least extract it out to a method instead of a nested function.

@joguSD
Contributor Author

joguSD commented Jul 6, 2018

@jamesls What do you think would be the best format for tracking these types of comparisons? I don't necessarily think it makes sense to put these in-code (as the differences are mostly removals). My workflow on this has been less about "this is what requests does" and more about "this is what botocore expects" (passing all the tests). In general my answer to "Why is this different from requests?" will be "because botocore doesn't use it", afaik.

At a really high level we're basically taking two things from requests and re-implementing the subset that botocore expects. These two pieces are:

  • requests model objects that represent a request & response (AWSRequest, AWSPreparedRequest, AWSResponse)
  • Requests adapter class for urllib3

As for the Urllib3Session, which parallels the HTTPAdapter class in requests, there shouldn't be much difference. The big difference is:

  • Our vendored version of requests implemented chunked encoding for request bodies. Newer versions of urllib3 support this by setting chunked=True when calling urlopen, so we don't need it in our http session class. Additionally, I've not enabled this flag because afaik we never use chunked encoding when making requests.

The model objects do have a lot of changes as I basically gutted them to have next to no logic and only implemented the things that we relied on requests for. This turns out to be not a whole lot of functionality as botocore is already producing / preparing generally well structured requests before handing them off to requests. The behavior that I've found we needed is:

  • Converting the modeled params dictionary to the query string
  • Determining content length of the body
  • Converting the headers to a case-insensitive dictionary
    The biggest differences here are url preparation and determining the content length of the body. Requests parses the url, does some validations and conversions on it, and reconstructs it. From what I've gathered, this process basically just fixes any issues the url may have, and I've never seen it actually change a url after it was done. I verified this by running all of the tests and asserting that the url before and after are the same. As for determining the content-length of the body, I tried to keep it simple and implement the logic that we document (it's bytes or a file-like).

@jamesls
Member

jamesls commented Jul 9, 2018

What do you think would be the best format for tracking these types of comparisons?

If you're asking where's the best place to document where we deviate from what requests does, I think it's worth putting in the code somewhere as either a comment or a docstring.

A lot of your previous comment would have been really helpful to read as a comment/docstring while reviewing the code.

@joguSD joguSD force-pushed the pluggable-endpoints branch from aa8a53b to a07d93f Compare July 10, 2018 21:22
there are the following differences:
This class does not heavily prepare the URL. Requests performed many
validations and corrections to ensure the URL is properly formatted.
Botocore either performs this validations elsewhere or otherwise
Contributor

s/this/these

self.stream_output = stream_output

if headers is not None:
for key, value in headers.items():
Contributor

self.headers.update(headers)

Contributor Author

I'll check if this works, but self.headers isn't a dict; it's an HTTPHeaders object with some weird behavior. This is how we were doing it previously, so I just left it.

Contributor Author

Yeah this doesn't work.

return hash(self._lower)

def __eq__(self, other):
return self._lower == other._lower
Contributor

Probably not needed, but what about putting a try/except here in case this is compared against something else?

Contributor Author

I can do an isinstance check I suppose.

@@ -0,0 +1,4433 @@

Contributor

While we're at it, why not just go certifi?

Contributor Author

I'm trying to keep changes to a minimum for now. I think this is something we could come back to.

Member

@jamesls jamesls left a comment

Looks good, I mostly focused on the http_session code. I'd like to look over the new exceptions in more detail, I'll have more feedback tomorrow.

thread = threading.Thread(target=target, args=args)
thread.daemon = True
thread.start()
yield target
Member

You'll need a try/finally here. For the tests that are expecting a failure being raised, the code after the yield won't be called. It doesn't technically matter because they are background threads, but it's a pretty small change to fix this.

Contributor Author

Done

self.fail('Excepted exception was not thrown')


def random_port():
Member

I would suggest doing what we did for chalice where we have the socket module find an unused port for us. This function has a small chance of failing, but it'd be nice to have something that we know will always work: https://github.com/aws/chalice/blob/master/tests/functional/test_local.py#L95
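The chalice-style helper looks roughly like this: binding to port 0 asks the OS for a currently unused port, so the test never has to guess (the helper name is illustrative):

```python
import socket

def unused_port():
    # Binding to port 0 makes the OS pick a currently unused port,
    # which we then read back with getsockname().
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.bind(('127.0.0.1', 0))
        return sock.getsockname()[1]
```

Note there is still a small race (the port could be reclaimed before the test binds it again), but in practice this is far more reliable than picking a random number.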

Contributor Author

Done

except BackgroundTaskFailed:
self.fail('Fake EC2 service was not called.')
else:
self.fail('Excepted exception was not thrown')
Member

Expected.

self.localhost = 'http://localhost:%s/' % self.port
self.session = botocore.session.get_session()

def test_can_proxy_https_request_with_auth(self):
Member

Do you think it's worth adding a test for plain http proxies?

Contributor Author

We could, but we would need a proxy server that understands it. I think at that point we're mostly testing urllib3, but it still could be worth it. I'll see what I can do.

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
sock.bind(('', self.port))
time.sleep(4)
Member

Can we use threading.Events instead? We had similar sleeps in chalice and it ended up being flaky. I think threading events will make these tests more robust.

Contributor Author

Yeah, this sounds good. Done.
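A minimal sketch of the Event-based handoff, with illustrative names: the background thread signals readiness explicitly instead of the test sleeping for a fixed interval:

```python
import threading

def background_server(started, finish):
    # Stand-in for the test's background socket server.
    started.set()            # signal the test thread that we're up
    finish.wait(timeout=5)   # block until the test says it's done

started = threading.Event()
finish = threading.Event()
thread = threading.Thread(target=background_server,
                          args=(started, finish))
thread.daemon = True
thread.start()

ready = started.wait(timeout=5)  # deterministic handoff, no sleep()
finish.set()
thread.join(timeout=5)
```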

)

def _get_proxy_manager(self, proxy_url):
if proxy_url not in self._proxy_managers:
Member

I don't think there's a test for the case where the proxy url is already in the proxy managers dict.

Contributor Author

Added a test for this.

def _path_url(self, url):
parsed_url = urlparse(url)
path = parsed_url.path
if not path:
Member

Can you remind me again when path would be false?

Contributor Author

I believe this can only happen if the url is just the endpoint without the trailing slash, e.g. https://somedomain.com vs https://somedomain.com/
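A sketch of that defaulting behavior, assuming a hypothetical `path_url` helper along the lines of the one under review:

```python
from urllib.parse import urlparse

def path_url(url):
    # An endpoint URL with no path (https://somedomain.com) parses to
    # an empty path, so default the request target to '/'.
    parsed = urlparse(url)
    path = parsed.path or '/'
    if parsed.query:
        path = path + '?' + parsed.query
    return path
```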


def send(self, request):
try:
request_target = self._path_url(request.url)
Member

Could you break this method up? It's fairly lengthy.

I also don't quite follow the logic where request_target is set to self._path_url(request.url), but for the proxy case request_target is set to just request.url (line 170). Maybe update the comment to explain why we need to use the full url in the proxy case.

Contributor Author

Broke it up a little bit more. I've updated the comment to note that proxying an HTTP request requires the absolute URI, so the host to connect to is part of the request target.

request=request,
endpoint_url=request.url
)
except Exception as e:
Member

Are we sure we want to catch Exception here? Is it possible to restrict this to a list of exceptions we know about? This will consider even trivial bugs/typos as an HTTPClientError.

Contributor Author

I think so, it ensures we never directly leak any exceptions from our dependency or a mistake in our implementation. What's your concern with an exception from this layer being wrapped as an HTTPClientError?

Member

Primarily that actual bugs that raise things like TypeError/ValueError, etc., will be misrepresented as an HTTPClientError. Ideally we'd want these things to fail fast and propagate all the way up the stack vs. possibly being retried and surfaced as a (somewhat confusing) HTTPClientError.

Not a huge deal, but I'd prefer to avoid it if possible.

@@ -30,8 +30,8 @@
# this mapping with more specific exceptions.
EXCEPTION_MAP = {
'GENERAL_CONNECTION_ERROR': [
ConnectionError, ClosedPoolError, Timeout,
EndpointConnectionError
ReadTimeoutError, ProxyConnectionError, EndpointConnectionError,
Member

Can we just use HTTPClientError here?

Contributor Author

Possibly, I was thinking it might be nice to just have some of the HTTPClientError exception types extend from a RetryableHTTPClientError and then use that instead.
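The RetryableHTTPClientError idea could be sketched as below; every class name here is hypothetical, not botocore's actual exception set:

```python
class HTTPClientError(Exception):
    """Base for errors raised by the HTTP client layer."""

class RetryableHTTPClientError(HTTPClientError):
    """Marker base for the subset of client errors worth retrying."""

class ReadTimeoutError(RetryableHTTPClientError):
    pass

class ProxyConnectionError(RetryableHTTPClientError):
    pass

class SSLError(HTTPClientError):
    # Not retryable: a certificate problem won't fix itself on retry.
    pass

# The retry map can then name one base class instead of enumerating
# every individual exception type.
EXCEPTION_MAP = {
    'GENERAL_CONNECTION_ERROR': [RetryableHTTPClientError],
}
```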

Member

@jamesls jamesls left a comment

Looked over the tests. High level themes:

  • I'd like to avoid patching out internal methods. If we have to add new events/hooks to make this possible that's ok with me.
  • I'd prefer our existing integration tests aren't modified, and instead we add new ones for the new exception types. This makes it easy to verify that existing code continues to work (similar to what a customer might be trying to do).

Otherwise looks good.

@@ -813,18 +813,17 @@ def test_can_get_bucket_location(self):
def test_request_retried_for_sigv4(self):
body = six.BytesIO(b"Hello world!")

original_send = adapters.HTTPAdapter.send
original_send = Endpoint._send
Member

What other options do we have besides patching out a private method?

def tearDown(self):
shutil.rmtree(self.tempdir)

def test_should_reset_stream(self):
Member

Why is this test being removed? It's still a method on the aws request object so it seems like we should still be testing it.

Contributor Author

These tests were for the reset_stream_on_redirect method on the AWSPreparedRequest class, which has been removed (since we aren't relying on / hooking into redirects from our client). I'll see if I can rework some of these tests to test reset_stream directly, without relying on reset_stream_on_redirect calling it.

# The stream should now be reset.
self.assertEqual(body.tell(), 0)

def test_cannot_reset_stream_raises_error(self):
Member

In general, it seems like the coverage for awsrequest via tests/unit/test_awsrequest.py dropped with this PR. It seems like we should at least keep the coverage we had before, and preferably raise the coverage given we're making changes to that file.

Contributor Author

Added a bunch of tests and fixed some of these tests.

if not getattr(self, '_integ_test_error_raised', False):
self._integ_test_error_raised = True
raise ConnectionError("Simulated ConnectionError raised.")
Member

While I get that we're updating this to use our new exceptions, I do think it's useful to keep a few integration tests around that use the old exceptions to ensure that behavior isn't changed.

I would also prefer we leave the existing integration tests alone and add entirely new ones that verify we can catch the new exceptions.

This would serve as a nice safety check to ensure the same integration tests in develop can run against this PR.

Contributor Author

Added an integration test for one of the exception types (Read timeout) to ensure in a real case we're catching the old exceptions. I also created some unit tests that verify a new raised exception will be caught by the old exception type.

@@ -285,15 +285,14 @@ def test_client_can_retry_request_properly():

def _make_client_call_with_errors(client, operation_name, kwargs):
operation = getattr(client, xform_name(operation_name))
original_send = adapters.HTTPAdapter.send
def mock_http_adapter_send(self, *args, **kwargs):
original_send = Endpoint._send
Member

Same comment as before, but we really shouldn't be touching internal methods in an integration test.

If we need to add better hooks into the endpoint to make it more testable via some public API, I'm fine with that.

Contributor Author

Yeah, I think this is another case for a before_send and after_send type of event.

@joguSD joguSD force-pushed the pluggable-endpoints branch from 952d96f to 5deddbc Compare August 17, 2018 01:38
@jamesls (Member) left a comment:

Looks good, thanks for updating.

@joguSD joguSD force-pushed the pluggable-endpoints branch from db75b50 to b4682c6 Compare August 21, 2018 23:13
@joguSD joguSD force-pushed the pluggable-endpoints branch from d8562aa to 9ce22db Compare August 22, 2018 18:24
@joguSD
Contributor Author

joguSD commented Aug 22, 2018

@kyleknap Added the appropriate notes and changelog.

@kyleknap (Contributor) left a comment:

Looks good. 🚢

@jamesls (Member) left a comment:

While the changes themselves look good, it doesn't seem like any new tests were added for this functionality. Only existing tests were updated to keep them passing. I think given these changes were important enough to hold up merging this code, we should have explicit tests for these new commits.

Otherwise looks good.

# Ensure that the http header keys are all lower cased. Older
# versions of urllib3 (< 1.11) would unintentionally do this for us
# (see urllib3#633). We need to do this conversion manually now.
for header in response['headers']:
Member
I'm not a fan of this. It doesn't seem like it's the parsers responsibility to handle the lower casing, which I think is further hinted at by the fact that we're changing our parser because our http client changed (which I would not expect).

What was the reason for not fixing this in the custom headers dictionary class we made?

Contributor Author

We're not fixing this in the custom headers dictionary because the previous headers object wasn't responsible for doing this either. This change in behavior would have happened even if we had just upgraded requests. The behavior was never intentional on our part or urllib3's part, but it is now part of our public contract for what we return. The parsers are what convert the response's header object into a dict, and they ultimately dictate the final format and structure of what we return from client operations.

Member

Ok, I think that sounds reasonable. Could you move this off to a utils function? Some of the other dict manipulation functions are already in utils (e.g. merge_dicts).

Contributor Author

Sure, sounds reasonable.
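A minimal sketch of such a utils helper; the function name here is an assumption for illustration, not necessarily what landed in botocore:

```python
def lowercase_dict(original):
    """Return a copy of ``original`` with all keys lower cased.

    Older urllib3 (< 1.11) lower cased header keys for us as a side
    effect; this makes that behavior explicit and deliberate.
    """
    return {key.lower(): value for key, value in original.items()}
```

The parser would then run the response headers through this helper before returning them to the caller.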

the use of ``botocore.vendored.requests.exceptions.*`` or
``botocore.vendored.requests.packages.urllib3.exceptions.*`` must be updated
to the corresponding exception classes in ``botocore.exceptions``.
* The version of ``urllib3`` used to make HTTP requests has been updated.
Member

Would be nice to put the urllib3 version in the upgrade notes so people don't have to track down the setup.py file, along with the version of requests/urllib3 we have vendored.

Contributor Author

Do you mean the version range?

Member

Yeah.

maxsize=22,
timeout=ANY,
strict=True,
ssl_context=ANY,
Member

We don't seem to be actually testing we're passing the correct ssl_context to the pool manager in any of our tests. We could pass None and these tests would still pass.

I think we need at least one test that ensures it's a sane value.

@joguSD (Contributor Author) commented Aug 23, 2018:

Strictly speaking, None is a sane value (urllib3 will default it for us), but I could assert that it is not None, since we always want to construct and pass this ourselves.
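That assertion can be sketched like this. The session and pool-manager names are illustrative stand-ins, not botocore's real classes; the point is to capture the keyword arguments the pool manager was constructed with and check that an SSLContext was actually supplied.

```python
from unittest import mock
import ssl

def create_pool_manager(pool_manager_cls):
    # Hypothetical session setup: build our own SSLContext rather than
    # passing None and letting urllib3 pick a default.
    context = ssl.create_default_context()
    return pool_manager_cls(ssl_context=context)

def passed_ssl_context():
    # Capture the kwargs the pool manager class was called with.
    pool_manager_cls = mock.Mock()
    create_pool_manager(pool_manager_cls)
    _, kwargs = pool_manager_cls.call_args
    return kwargs['ssl_context']
```

A test would then assert `passed_ssl_context() is not None`, which would fail if the session ever stopped constructing its own context.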

self._verify = verify
self._proxy_config = ProxyConfiguration(proxies=proxies)
self._pool_classes_by_scheme = {
    'http': botocore.awsrequest.AWSHTTPConnectionPool,
Member

It doesn't seem like this change is tested anywhere. We should test that we don't mess with the global namespace somehow.

Contributor Author

Added a test to ensure that the global urllib3 defaults are not set to ours.
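A sketch of what such a regression test can check, using stand-in names (the real test would inspect urllib3's module-level pool class mapping): the session must copy the library-level defaults rather than mutate them in place.

```python
# Stand-in for urllib3's module-level default pool class mapping.
DEFAULT_POOL_CLASSES = {
    'http': 'HTTPConnectionPool',
    'https': 'HTTPSConnectionPool',
}

class Session:
    """Hypothetical session that installs custom connection pool classes."""
    def __init__(self):
        # Copy, so custom pool classes never leak into the global mapping.
        self._pool_classes_by_scheme = dict(DEFAULT_POOL_CLASSES)
        self._pool_classes_by_scheme['http'] = 'AWSHTTPConnectionPool'

def test_globals_unchanged():
    session = Session()
    # The session sees its custom class...
    assert session._pool_classes_by_scheme['http'] == 'AWSHTTPConnectionPool'
    # ...but the library-level default is untouched.
    assert DEFAULT_POOL_CLASSES['http'] == 'HTTPConnectionPool'
```

Mutating the global mapping instead of copying it would make this test fail, and would also change behavior for any other urllib3 user in the same process.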

@kyleknap (Contributor) left a comment:

Updates still look fine 🚢

@jamesls (Member) left a comment:

Looks great! Thanks for all your work on this, you're great at figuring stuff out!

I think this is going to be a great change for customers.

@mikegrima

mikegrima commented Aug 28, 2018

Hello everyone! This PR broke https://github.com/spulec/moto really hard, and many projects depend on it for unit testing (See: getmoto/moto#1793).

Can some guidance be provided on how to fix this? Pinning to an older version of boto3 is only a short-term workaround.

It would appear that the code here: https://github.com/spulec/moto/blob/master/moto/core/models.py is the most impacted.

@joguSD
Contributor Author

joguSD commented Aug 28, 2018

@mikegrima Seems like the crux of it is these lines. Moto was monkey patching the vendored version of requests, which is no longer used. I imagine a similar patch could be applied to our URLLib3Session class, but I would highly recommend against that: we don't consider this class part of our public interface, as it's undocumented and may change.

We have a few TODOs similar to this where we have a less than optimal solution to patching out the HTTP layer. We're looking into potentially implementing something via the event system to ignore the HTTP layer, but I'm still not 100% sure what will come out of that.

This is more of an open question to the moto team:
What functionality do you require so that you can avoid applying monkey patches?
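For illustration only, here is a sketch of the kind of monkey patch being discussed, written against a stand-in class rather than the real `botocore.httpsession.URLLib3Session` (which, as noted above, is internal and may change):

```python
from unittest import mock

class URLLib3Session:
    """Stand-in for botocore's internal HTTP session class."""
    def send(self, request):
        raise RuntimeError("would hit the real network")

def fake_send(self, request):
    # Return a canned response instead of making a real HTTP call, the
    # way moto previously intercepted the vendored requests layer.
    return {'status_code': 200, 'body': b'{}'}

def run_with_patched_http():
    with mock.patch.object(URLLib3Session, 'send', fake_send):
        return URLLib3Session().send(None)
```

Anything built this way is coupled to internals, which is exactly why a supported hook (for example, an event to bypass the HTTP layer) would be preferable.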

@mikegrima

Hi @joguSD . I contribute to moto as many of our projects (internally and externally) make heavy use of it, but don't really work on the super low level components of it. That's more of a question for @spulec or @JackDanger.

From my perspective: I just want my tests to work and not be locked out of new features because we're stuck on an old version of boto.

@mikegrima

@joguSD : Can you comment on getmoto/moto#1793 (comment) and suggest the best approach for @spulec to take?

@joguSD
Contributor Author

joguSD commented Sep 5, 2018

@mikegrima I think the 'best' approach is going to come down to how we expose the appropriate functionality to stub out the HTTP layer in botocore. Last week I gave my opinion on various resolutions and proposed some changes we could make in botocore to give moto what it needs, but it doesn't seem that anyone on the moto side has had any feedback on the proposal yet.

Labels: enhancement (This issue requests an improvement to a current feature.)
7 participants