Refactor S3, replace high-level resource/session API with low-level client API #583

Merged
merged 20 commits into from
Mar 1, 2021
72 changes: 72 additions & 0 deletions MIGRATING_FROM_OLDER_VERSIONS.rst
@@ -1,3 +1,75 @@
Migrating to the new client-based S3 API
========================================

Versions of smart_open prior to 5.0.0 used the boto3 `resource API`_ for communicating with S3.
This API was easy to integrate for smart_open developers, but this came at a cost: it was not thread- or multiprocess-safe.
Furthermore, as smart_open supported more and more options, the transport parameter list grew, making it less maintainable.
Starting with version 5.0.0, smart_open uses the `client API`_ instead of the resource API.
Functionally, little changes for the smart_open user.
The only difference is in passing transport parameters to the S3 backend.

More specifically, the following S3 transport parameters are no longer supported:

- `multipart_upload_kwargs`
- `object_kwargs`
- `resource`
- `resource_kwargs`
- `session`
- `singlepart_upload_kwargs`

If you weren't using the above parameters, nothing changes for you.
However, if you were using any of the above, then you need to adjust your code.
Some quick recipes follow.
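
Before applying the recipes, it can help to check whether a given `transport_params` dict is affected at all.
The following is a minimal sketch, not part of smart_open: `REMOVED_S3_PARAMS` and `find_removed_params` are hypothetical names introduced here for illustration only.

```python
# Hypothetical helper, NOT part of smart_open: flags transport parameters
# that the S3 backend no longer supports as of version 5.0.0.
REMOVED_S3_PARAMS = frozenset({
    'multipart_upload_kwargs',
    'object_kwargs',
    'resource',
    'resource_kwargs',
    'session',
    'singlepart_upload_kwargs',
})


def find_removed_params(transport_params):
    """Return the sorted names of removed S3 parameters found in the dict."""
    # Intersecting with a dict looks only at its keys; None is treated as empty.
    return sorted(REMOVED_S3_PARAMS.intersection(transport_params or {}))
```

An empty result suggests the call site needs no changes; any returned name points to one of the recipes in this section.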

If you were previously passing `session`, then construct an S3 client from the session and pass that instead.
For example, before::

    smart_open.open('s3://bucket/key', transport_params={'session': session})

After::

    smart_open.open('s3://bucket/key', transport_params={'client': session.client('s3')})


If you were passing `resource`, then replace the resource with a client, and pass that instead.
For example, before::

    resource = session.resource('s3', **resource_kwargs)
    smart_open.open('s3://bucket/key', transport_params={'resource': resource})

After::

    client = session.client('s3')
    smart_open.open('s3://bucket/key', transport_params={'client': client})
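
If many call sites need the same change, the two recipes above could be wrapped in a small adapter.
The sketch below is one possible approach under stated assumptions: `client_from_legacy` is a hypothetical name, not a smart_open function, and it relies on boto3 resources exposing their underlying low-level client as `resource.meta.client`.

```python
def client_from_legacy(transport_params):
    """Return a copy of transport_params with 'session'/'resource' replaced by 'client'.

    Hypothetical helper, NOT part of smart_open.
    """
    params = dict(transport_params)  # leave the caller's dict untouched
    session = params.pop('session', None)
    resource = params.pop('resource', None)
    if 'client' in params:
        # An explicitly supplied client takes precedence.
        return params
    if resource is not None:
        # Reuse the client that already backs the resource.
        params['client'] = resource.meta.client
    elif session is not None:
        params['client'] = session.client('s3')
    return params
```

The returned dict can then be passed as `transport_params` to `smart_open.open`. Any `*_kwargs` parameters are left in place and still need to be migrated separately.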

If you were passing any of the `*_kwargs` parameters, you will need to include them in `client_kwargs`, keeping in mind the following transformations.

==========================  ======================================  ==========================
Parameter name              Resource API method                     Client API function
==========================  ======================================  ==========================
`multipart_upload_kwargs`   `s3.Object.initiate_multipart_upload`_  `create_multipart_upload`_
`object_kwargs`             `s3.Object.get`_                        `get_object`_
`resource_kwargs`           ???                                     ???
`singlepart_upload_kwargs`  `s3.Object.put`_                        `put_object`_
==========================  ======================================  ==========================

The `client_kwargs` dict can thus contain the following keys, each mapping to a dict of keyword arguments for the corresponding call:

- `S3.Client`: initializer parameters, i.e. those passed directly to the `boto3.client` function
- `S3.Client.create_multipart_upload`
- `S3.Client.get_object`
- `S3.Client.put_object`
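
As an illustration of the mapping in the table above, the following sketch collects the old `*_kwargs` parameters into a single `client_kwargs` dict.
It makes several assumptions: `build_client_kwargs` is a hypothetical helper, not part of smart_open; the keys are assumed to be spelled with a capital `S`, as in boto3's own documentation (check this against your installed smart_open version); and `resource_kwargs` is deliberately skipped because the table gives no client-side equivalent for it.
Argument names may also differ between the resource and client methods, so check each value against the client API documentation.

```python
# Assumed key spelling; verify against your smart_open version.
_KWARGS_TO_CLIENT_KEY = {
    'multipart_upload_kwargs': 'S3.Client.create_multipart_upload',
    'object_kwargs': 'S3.Client.get_object',
    'singlepart_upload_kwargs': 'S3.Client.put_object',
}


def build_client_kwargs(transport_params):
    """Collect the old *_kwargs parameters into one client_kwargs dict.

    Hypothetical helper, NOT part of smart_open.
    """
    client_kwargs = {}
    for old_name, new_key in _KWARGS_TO_CLIENT_KEY.items():
        value = transport_params.get(old_name)
        if value:
            # Copy so that later edits do not affect the caller's dict.
            client_kwargs[new_key] = dict(value)
    return client_kwargs
```

The result would be passed as `transport_params={'client_kwargs': ...}`, optionally alongside an explicit `client`.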

.. _resource API: https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/s3.html#service-resource
.. _s3.Object.initiate_multipart_upload: https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/s3.html#S3.Object.initiate_multipart_upload
.. _s3.Object.get: https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/s3.html#S3.Object.get
.. _s3.Object.put: https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/s3.html#S3.Object.put

.. _client API: https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/s3.html#client
.. _create_multipart_upload: https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/s3.html#S3.Client.create_multipart_upload
.. _get_object: https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/s3.html#S3.Client.get_object
.. _put_object: https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/s3.html#S3.Client.put_object

Migrating to the new dependency management subsystem
====================================================
