Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added caching to the async session in request.py and AsyncHTTPProvider #2254

Merged
merged 27 commits into from
Feb 24, 2022

Conversation

dbfreem
Copy link
Contributor

@dbfreem dbfreem commented Dec 15, 2021

What was wrong?

When I was researching the timeouts in the benchmark test I noticed the session is not cached in the request.py for Async aiohttp request. Mainly when I read this issue on the geth repo about multiple http sessions I realized this might be an issue.

Also, the aiohttp library recommends reusing sessions

I feel like as a follow up to this PR the request.py should be broken apart into two classes. One for Async and one for standard HTTP. This would better organize the code and make it more testable.

How was it fixed?

For the test I started with what was in the test_precached_session and modified it for the async request. I also used the pytest.mark.asyncio that I found else where in the project for async testing.
I also started with the code that was used in request.py and HTTPProvider for the caching and modified it for async.

I upgraded pytest-mock trying to get the Async Mocking to work for the aiohttp session with no success in python 3.6. Since I couldn't get the mock working on the async session.post I only tested the session caching and not the post call on the session in request.py. Since there didn't seem to be any conflicts with the upgrade as long as it was less than 3.3.0 I left it in this PR.

I did find an issue later on related to session caching in the web3.py issues list.
Related to Issue #2016

Todo:

  • Add entry to the release notes

@dbfreem dbfreem changed the title Added caching to the async session in request.py and AsyncHTTPProvider WIP Added caching to the async session in request.py and AsyncHTTPProvider Dec 15, 2021
@dbfreem
Copy link
Contributor Author

dbfreem commented Dec 15, 2021

@kclowes do you have any feedback on this PR. Am I on the right track?

@kclowes
Copy link
Collaborator

kclowes commented Dec 15, 2021

This looks good @dbfreem! Thanks for looking into this and for all the PRs!

On first glance, this is about what I would expect in regards to caching the session. I'd probably drop the pytest-mock dependency upgrade if you're not really using it in this PR. If you feel strongly about upgrading it in general, it probably should be in its own PR.

If you pull in the latest master, some of those integration test failures should be less flaky.

@dbfreem
Copy link
Contributor Author

dbfreem commented Dec 16, 2021

I removed the the pytest-mock upgrade and pulled the latest from master.

I love python and enjoy the web3 concept in general so just thought I would try to get involved.

@dbfreem dbfreem changed the title WIP Added caching to the async session in request.py and AsyncHTTPProvider Added caching to the async session in request.py and AsyncHTTPProvider Dec 16, 2021
@dbfreem dbfreem marked this pull request as ready for review December 16, 2021 00:53
Copy link
Collaborator

@kclowes kclowes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this PR @dbfreem and for responding to so many of our issues!

There is an existing bug with the current sync implementation where the LRU cache isn't thread safe (see here: #1847), and I wonder what it would take to make this async version thread safe right off the bat. It may be as easy as pulling in the threading library and using threading.Lock as a context manager, but I'm not sure. I am also fine putting that in a different PR (that either you or someone else can tackle, no pressure).

Other than that, I just had a few nitpicks! Thanks again!

request.cache_async_session(URI, session)
assert len(request._async_session_cache) == 1

# Make sure a request with a different URI addes another cached session
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
# Make sure a request with a different URI addes another cached session
# Make sure a request with a different URI adds another cached session

assert len(request._async_session_cache) == 1

# Make sure a request with a different URI addes another cached session
request.cache_async_session("{0}/test".format(URI), session)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nitpick - we tend to prefer using f-strings:

Suggested change
request.cache_async_session("{0}/test".format(URI), session)
request.cache_async_session(f"{URI}/test", session)

@dbfreem
Copy link
Contributor Author

dbfreem commented Dec 17, 2021

I was actually wondering if issue #2253 was a thread safe issue on the standard http side.

I will take a look at your comments and thread safety.

@dbfreem
Copy link
Contributor Author

dbfreem commented Dec 18, 2021

So I haven't done a lot of threading before but based on what I say in cache.py plus reading up on python.org about threading lock I think this will work. Unfortunately, this is really hard to test with a unit test.

If you agree with this threading approach I can make the same change in another PR for #1847

Copy link
Collaborator

@kclowes kclowes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good @dbfreem. Thank you for the work! I've been looking a little bit at how to test the threading, so I'll let you know if I see anything promising.

@dbfreem
Copy link
Contributor Author

dbfreem commented Jan 13, 2022

working on the test for this and did find a bug in the logic so working on fixing that.

@dbfreem
Copy link
Contributor Author

dbfreem commented Jan 14, 2022

This still needs a little work so don't merge this. I will take a look this weekend.

@dbfreem
Copy link
Contributor Author

dbfreem commented Jan 16, 2022

So here is where I am with this. After I started writing a test to test the thread safety stuff I realized that ClientSession.close is an async method and I needed to wrap it in an async/await. When I did this the callback from the cache stopped working. So I decided to write a simple caching class since the requirements here are not that complex.

SessionCache is a class backed by an OrderedDict that simply removes the FIFO value from the dict when it reaches it's preset capacity. I also moved the closing of the connection to the calling code in request.py . IMO that code is more capable of knowing the best way to handle the closing of the connection either async or not instead of the cache handling the closing.

One other side effect that came of this is that I am no longer able to reproduce the threading issue that was there before I changed to SessionCache. Before implementing SessionCache if I ran multiple thread inserting into the cache I would get weird errors when the callback was called. After implementing SessionCache I am no longer able to reproduce these. I did leave the thread lock in for now just for good measure. But we may be able to try SessionCache without the thread lock and get users to test it out if they have seen threading issues before.

I do see that lru-dict is used in cache.py so this doesn't remove the dependency on it but I do feel like the use case for the cache in request.py is pretty simple and SessionCache provides a lot more control over the cache.

Thoughts?? I can go another route if you all do not agree with the SessionCache class. I would need to figure out how to get lru-dict to call the async callback, which could have been user error on my end but I couldn't get that to work. If you do agree with the new approach I can move the request.Session cache over to SessionCache in another PR.

@dbfreem
Copy link
Contributor Author

dbfreem commented Jan 16, 2022

Ok, final answer here. Once, I built the unit test for the AsyncHTTPProvider I realized that the session wasn't being cached since the cache_async_session method is async. I moved this to an async method on the AsyncHTTPProvider to fix this.

All these changes started from the fact that the ClientSession.close() is an asnyc method itself. When caching an item in a fixed sized cache the eviction of some items need to be handled when new items enter the cache. This means calling ClientSession.close() which again is async.

Also, on the new test_async_http_provider module I added. I got the http://mynode.local:8545 from the test_http_provider module.

@kclowes can you review this

@kclowes
Copy link
Collaborator

kclowes commented Jan 17, 2022

Thanks @dbfreem! One of us will take a look this week!

@dbfreem
Copy link
Contributor Author

dbfreem commented Feb 1, 2022

@pacrob or @kclowes I wanted to check in and see if I could get a review on this one.

@kclowes
Copy link
Collaborator

kclowes commented Feb 7, 2022

@dbfreem haven't forgotten this one. Will take a look! Thank you!

@dbfreem
Copy link
Contributor Author

dbfreem commented Feb 8, 2022

Thanks @kclowes. I know this turned into a big PR, sorry about that. It kind of spiraled once I realized that the session close for the aiohttp was an async method and it wasn't actually being closed the way I had it at first.

docs/providers.rst Outdated Show resolved Hide resolved
web3/_utils/request.py Outdated Show resolved Hide resolved
dbfreem and others added 2 commits February 23, 2022 19:59
Co-authored-by: Paul Robinson <[email protected]>
Co-authored-by: Paul Robinson <[email protected]>
@dbfreem
Copy link
Contributor Author

dbfreem commented Feb 24, 2022

@pacrob I committed your suggestions. Thanks

@pacrob pacrob merged commit eed1ae4 into ethereum:master Feb 24, 2022
pacrob pushed a commit to pacrob/web3.py that referenced this pull request Feb 24, 2022
ethereum#2254)

* Added caching to the async session in request.py and AsyncHTTPProvider
pacrob pushed a commit that referenced this pull request Feb 24, 2022
#2254) (#2363)

* Added caching to the async session in request.py and AsyncHTTPProvider

Co-authored-by: AlwaysData <[email protected]>
@pacrob
Copy link
Contributor

pacrob commented Feb 24, 2022

Thanks for adding this, @dbfreem ! I added the _ myself just to cut cycle time. Otherwise, awesome work as usual 👍

@dbfreem
Copy link
Contributor Author

dbfreem commented Feb 24, 2022

No worries about the _ @pacrob and you are welcome.

@DefiDebauchery
Copy link
Contributor

Just a note that I'm having trouble utilizing this. I typically run web3 within a pycord (fork of discord.py) bot, which until this change has been working without issue.

I applied request.py and async_rpc.py. The conflict may be coming from the session lock, but hard to really debug.. I turned on the logging module and it seems to fail after initializing 10 RPC instances (needed for different evm networks), only one of them (not even using CustomSession yet). Will dig more, but wanted to mention this in case anyone might have thoughts.

@kclowes
Copy link
Collaborator

kclowes commented Mar 3, 2022

@DefiDebauchery thanks for the report. Any chance you can open up a new issue with a minimum reproducible example?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants