Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Uploads with S3 Hang Indefinitely #43

Closed
thomastu opened this issue Jan 14, 2020 · 3 comments · Fixed by #50
Closed

Uploads with S3 Hang Indefinitely #43

thomastu opened this issue Jan 14, 2020 · 3 comments · Fixed by #50
Assignees
Labels

Comments

@thomastu
Copy link

Motivation

I'm seeing a crash when using S3 after PR #35. Files now hang infinitely when trying to upload. This is on CKAN 2.8 using python 2.7 with the following cloudstorage related settings:

# Plugins
ckan.plugins = stats text_view image_view recline_view oauth2 datastore datapusher resource_proxy cloudstorage

# Cloudstorage Driver
ckanext.cloudstorage.driver = S3_US_WEST
# Bucket Name
ckanext.cloudstorage.container_name = {{my bucket name}}
# Credentials
ckanext.cloudstorage.driver_options = {"key": "{{AWS_ACCESS_KEY_ID}}", "secret": "{{AWS_SECRET_ACCESS_KEY}}"}
# Use Secure URLs for S3
ckanext.cloudstorage.use_secure_urls = 1
# 5-day maximum for stranded uploads
ckanext.cloudstorage.max_multipart_lifetime  = 5

Diagnosis

I think the crash is ultimately coming from how the SpooledTemporaryFile implements next in python 2.7. It looks like SpooledTemporaryFile.next() returns a callable method, when it should just be wrapping the function (i.e. SpooledTemporaryFile.next() returns self._file.next when it should return self._file.next().) In cloudstorage.storage._get_underlying_file, the underlying file.stream is a SpooledTemporaryFile, and so when it gets passed off to libcloud and calls upload_object_via_stream, libcloud will call next() on this file stream expecting it to eventually stop iterating. Since next() just returns a callable, it goes on forever. I verified this by hacking around and changing L25 above to be the following

wrapper.stream.next = wrapper.stream.next()
return wrapper.stream

and uploads started working once more.

@TkTech
Copy link
Owner

TkTech commented Jan 14, 2020

@Zharktas or @amercader, any chance you can look at this?

@TkTech TkTech added the bug label Jan 14, 2020
@amercader
Copy link

Yes, I'll check it out @TkTech

@thomastu
Copy link
Author

thomastu commented Jan 15, 2020

Just to clarify, this isn't a bug with this codebase, but I don't think the real underlying problem will get solved in upstream python, so there would be value in handling it within this specific application. Here's a small example reproducing the root cause of the bug: (assumes an s3 mock running with localstack on port 4572.)

# Start localstack (on fish shell)
env SERVICES=s3 localstack start --host
# bash: `SERVICES=s3 localstack start --host`

# Make the bucket
aws --endpoint-url=http://localhost:4572 s3 mb s3://my-bucket
"""This snippet reproduces what is effectively happening in 
https://github.com/TkTech/ckanext-cloudstorage/blob/master/ckanext/cloudstorage/storage.py#L246
"""
from libcloud.storage.providers import get_driver
from libcloud.storage.types import Provider

from tempfile import SpooledTemporaryFile

if __name__ == "__main__":

    S3Driver = get_driver(Provider.S3_US_WEST)
    driver = S3Driver("key", "secret", host="localhost", port=4572, secure=False)
    container = driver.get_container("my-bucket")
     
    with SpooledTemporaryFile() as f:
        f.write(b"hello world!")
        f.seek(0)
        # This will hang forever - debugging shows that
        # "<method-wrapper 'next' of cStringIO.StringO object at 0x103435a78>"
        # is just getting written over and over again
        container.upload_object_via_stream(f, "test.txt")

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants