-
-
Notifications
You must be signed in to change notification settings - Fork 382
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Updated iter_bucket to use concurrent futures. #368
Conversation
Interesting! How do we (unit-)test this? |
Thank you for your contribution. Looks like the right thing to do. Looks like the existing tests are failing because of a non-existing dependency: `flask'. Why do we depend on this package? @derpferd Could you please:
|
@mpenkov Will do. I saw the flask import error in Travis CI. I am not getting this error locally. I will try a couple things to try to get it working. As far as unit tests, will push those. For AWS Lambda testing I successfully ran the updated version of the function without problems. |
9d52d14
to
a69c7fb
Compare
This commit addresses issue piskvorky#340. AWS Lambda environments do not support multiprocessing.Queue or multiprocessing.Pool, which are used by iter_bucket to optimize the pulling of files from s3. Solution: Switch to using concurrent.futures.ThreadPoolExecutor instead. This still optimizes the pulling of files from s3 without using new processes.
So, it appears the moto framework, is not reliable. It randomly gets the following error. It appears that this is an issue with moto not the change made.
|
The comment in our setup.py leads to this moto issue: There's a long discussion I just skimmed, but the problem seems related to botocore version and module import order. |
I agree that this seems to be a problem with moto as opposed to the code @derpferd made. We can't merge while the tests are failing, though. I suggest the following:
@piskvorky Does the above seem acceptable? |
@mpenkov This wouldn't work. The tests randomly fail on all versions of Python not just 3.7 😢 We could switch to using moto server instead of the mock. We could also upgrade the version of moto to a more recent version (both of these options would probably take lots of work). |
OK then, how about this: disable the test for all version of Py, and replace with an integration test. Another alternative is to re-run the test multiple times, and count a success if it passes once or more. @piskvorky WDYT? |
We cannot give write access to our private (paid) S3 buckets to the public. Or how would this protect our access credentials? Upgrading moto makes sense to me, the issue may be fixed according to that thread (though I may be wrong, just skimmed it). @derpferd why do you think upgrading moto to a more recent version would take a lot of work? Since the issue is unrelated to changes in this PR, I assume it happens in |
@piskvorky We need not give access to our buckets. The functionality in this PR is read-only, so we can apply it to any publicly-readable bucket out there. Perhaps you're thinking of #372? |
I don't think we have any publicly-accessible buckets, do we? I thought we'd have to create an IAM role with limited (read-only) access to our private S3 bucket, and use that. Or do you mean using a completely public-access bucket (not ours) in the tests? Which one? |
Any publicly-accessible bucket will do.
|
Yes, I guess we can do that (as long as the 3rd party bucket doesn't change too often). |
@derpferd Ping. Are you able to finish this PR? |
This reverts commit 6506562.
Merged. @derpferd Thank you once again for your efforts! |
This PR addresses issue #340.
AWS Lambda environments do not support multiprocessing.Queue or
multiprocessing.Pool, which are used by iter_bucket to optimize the
pulling of files from s3.
Solution: Switch to using concurrent.futures.ThreadPoolExecutor instead.
This still optimizes the pulling of files from s3 without using new
processes.