
added support for generation-based query and delete of blobs. #1435

Closed
wants to merge 14 commits into from

Conversation

tsinha

@tsinha tsinha commented Feb 1, 2016

Small modifications to get_blob and delete_blob to handle generation ids when querying buckets that have versioning enabled.

@googlebot

Thanks for your pull request. It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

📝 Please visit https://cla.developers.google.com/ to sign.

Once you've signed, please reply here (e.g. I signed it!) and we'll verify. Thanks.


  • If you've already signed a CLA, it's possible we don't have your GitHub username or you're using a different email address. Check your existing CLA data and verify that your email is set on your git commits.
  • If you signed the CLA as a corporation, please let us know the company's name.

@googlebot googlebot added the cla: no This human has *not* signed the Contributor License Agreement. label Feb 1, 2016
@tsinha
Author

tsinha commented Feb 1, 2016

I signed the CLA as Vital Labs, Inc.

path = blob.path
if generation is not None:
    path = (blob.path + '?generation=' +
            quote(str(generation), safe=''))
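The quoting step in this snippet can be exercised on its own; below is a minimal standalone sketch (the helper name and example path are hypothetical, not part of the library):

```python
from urllib.parse import quote  # urllib.quote on the Python 2 of this era

def path_with_generation(path, generation=None):
    # Append the generation id as a percent-encoded query parameter,
    # mirroring the logic in the snippet above.
    if generation is None:
        return path
    return path + '?generation=' + quote(str(generation), safe='')

print(path_with_generation('/b/my-bucket/o/data.csv', 1498874876972049))
# -> /b/my-bucket/o/data.csv?generation=1498874876972049
```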


@dhermes
Contributor

dhermes commented Feb 1, 2016

@tsinha You can ignore the AppVeyor failure for test_upload_from_file_w_bound_client_multipart, we are tracking it on #1434.

Updating Bucket.delete_blob and Bucket.get_blob is insufficient here; Blob.delete, Blob.exists, and Blob.reload may also need updating (or you may get that for free when updating Blob.path).

@jgeewax What should be done about matching his signature of the CLA with his GitHub handle?

@tsinha
Author

tsinha commented Feb 1, 2016

I'll look at the blob level methods and make sure everything works with the bucket level query, or refactor as you suggested if needed.

I'll post an update shortly.

@googlebot

CLAs look good, thanks!

@googlebot googlebot added cla: yes This human has signed the Contributor License Agreement. and removed cla: no This human has *not* signed the Contributor License Agreement. labels Feb 1, 2016
@tsinha
Author

tsinha commented Feb 2, 2016

@dhermes I looked over and fixed blob.exists and blob.delete.

I'm still unsure about adding the generation param directly into blob.path.

It doesn't seem as clean as having the api_request create the appropriate paths via blob.path and query_params at the bucket level.

Otherwise, I think this PR should be ready to go.

@dhermes
Contributor

dhermes commented Feb 3, 2016

@tsinha the generation is required to be in the path, since it uniquely defines the object if specified.

Bucket.delete_blob and Bucket.get_blob are meant to be "dumb" (i.e. they don't know how to do much) and the knowledge of how to make the request should live on Blob.

@tsinha
Author

tsinha commented Feb 3, 2016

@dhermes Thanks for the feedback.

Can you help me brainstorm how to reconcile the way in which api_requests are made with embedding the generation query parameter at the blob level?

This confounds my current WIP, which will enable the renaming of blobs while preserving versions, since the 'copyTo' API end-point requires another query param ('sourceGeneration') to copy the right version.

If we move generation into blob.path, I'd end up with something like the following:

https://www.googleapis.com/storage/v1/b/sourceBucket/o/sourceObject?generation=XXX/copyTo/b/destinationBucket/o/destinationObject?sourceGeneration=XXX
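One way to avoid that malformed URL is to keep the path itself free of query strings and append all parameters in a single query string at the end. A rough sketch (the helper is hypothetical and only illustrates the URL shape, not the client's request machinery):

```python
from urllib.parse import urlencode

API_ROOT = 'https://www.googleapis.com/storage/v1'

def copy_to_url(src_bucket, src_obj, dst_bucket, dst_obj,
                source_generation=None):
    # Build the copyTo path with no embedded query strings ...
    path = '/b/{}/o/{}/copyTo/b/{}/o/{}'.format(
        src_bucket, src_obj, dst_bucket, dst_obj)
    # ... then append every query parameter exactly once at the end.
    params = {}
    if source_generation is not None:
        params['sourceGeneration'] = source_generation
    if params:
        path += '?' + urlencode(params)
    return API_ROOT + path

print(copy_to_url('sourceBucket', 'sourceObject',
                  'destinationBucket', 'destinationObject',
                  source_generation='XXX'))
# -> https://www.googleapis.com/storage/v1/b/sourceBucket/o/sourceObject/copyTo/b/destinationBucket/o/destinationObject?sourceGeneration=XXX
```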

But I may not be thinking it through completely... any advice you have would be helpful.

@dhermes
Contributor

dhermes commented Feb 3, 2016

I think you've just thought about it more than I had 😀

However, I'd still rather special case Bucket.copy_blob than special case every other place a generation might be used.

You could also leave Blob.path as-is and we could have another property that also contains identifying information in query params, like Blob.path_with_params (or a name that is less hideous).
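A minimal sketch of what such a property could look like (the class shape and names here are assumptions for illustration, not the library's actual Blob):

```python
from urllib.parse import urlencode

class Blob(object):
    # Simplified stand-in for google.cloud.storage.blob.Blob.
    def __init__(self, name, bucket_path, generation=None):
        self.name = name
        self._bucket_path = bucket_path
        self.generation = generation

    @property
    def path(self):
        # Identity without query parameters, unchanged from today.
        return self._bucket_path + '/o/' + self.name

    @property
    def path_with_params(self):
        # Same path, plus any identifying query parameters.
        if self.generation is None:
            return self.path
        return self.path + '?' + urlencode({'generation': self.generation})

blob = Blob('file.txt', '/b/some-bucket', generation=1498874876972049)
print(blob.path_with_params)
# -> /b/some-bucket/o/file.txt?generation=1498874876972049
```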

…l to help with generation based queries of blobs. also added version-based blob copying and renaming. fixed a typo in _helpers.py.
@tsinha
Author

tsinha commented Feb 4, 2016

@dhermes I've added a new blob.path_with_params function to alleviate some of the checking at the bucket level: here and here.

I've also got a working version of the copy_blob and rename_blob functions with versioning support here and here.

I changed the API for copy_blob a little to fit into the existing rename_blob logic: if versioned copy is enabled, copy_blob returns a tuple of two lists containing the old and new blobs. rename_blob takes the old_blobs list from this tuple, deletes those blobs, and then returns the new_blobs list.

If versioning is not enabled, copy_blob's return value is unchanged from the existing API.
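The proposed contract can be sketched with the storage calls replaced by an in-memory dict of versions (everything here is illustrative pseudocode of the return shapes, not the real client):

```python
# name -> list of generations, oldest first (in-memory stand-in for a
# versioned bucket).
store = {'old.txt': ['v1-bytes', 'v2-bytes']}

def copy_blob(name, new_name, versioned=False):
    if not versioned:
        # Existing behavior: copy only the latest generation.
        store[new_name] = [store[name][-1]]
        return new_name
    # Versioned copy: return (old_blobs, new_blobs) as proposed.
    old_blobs = [(name, g) for g in range(len(store[name]))]
    store[new_name] = list(store[name])
    new_blobs = [(new_name, g) for g in range(len(store[new_name]))]
    return old_blobs, new_blobs

def rename_blob(name, new_name):
    old_blobs, new_blobs = copy_blob(name, new_name, versioned=True)
    del store[name]  # delete every old generation of the source
    return new_blobs

moved = rename_blob('old.txt', 'new.txt')
print(moved)
# -> [('new.txt', 0), ('new.txt', 1)]
```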

What do you think about this approach? I can add testing around this if it looks reasonable.

@dhermes
Contributor

dhermes commented Feb 5, 2016

@tsinha Sorry for the review delay. I'm traveling and won't have a chance to look at this until next Monday. (I feel bad, sorry for the negative experience.)

@tsinha
Author

tsinha commented Feb 5, 2016

No worries... I look forward to your feedback next week. Safe travels.

params = None
if self.generation is not None:
    params = {'generation': self.generation}

return (self.path, params)


@tsinha
Author

tsinha commented Feb 29, 2016

@dhermes I have not forgotten about this. Tied up with some other stuff and will get back to this ASAP.

@dhermes
Contributor

dhermes commented Feb 29, 2016

Thanks for the update

@tseaver tseaver added the api: storage Issues related to the Cloud Storage API. label Mar 15, 2017
@lukesneeringer lukesneeringer added the priority: p2 Moderately-important priority. Fix may not be included in next release. label Apr 19, 2017
@tillahoffmann
Contributor

tillahoffmann commented Jun 30, 2017

@dhermes, @tsinha, thanks for looking into this. Do you have any updates on when this PR might get merged to address #2463?

@dhermes
Contributor

dhermes commented Jun 30, 2017

@tillahoffmann There have been lots of changes to blob since this PR was originally sent. Some code paths in the current release (google-cloud-storage==1.2.0) will "just work" for this feature.

Peeking at the code, at least the Blob.download_* functions will incorporate the generation (but, as a hack, only if media_link isn't set on the Blob instance). As for delete, it calls out to Blob.path_helper which does not incorporate the generation.
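Until delete learns about generations, the request can be issued through the connection directly. The sketch below fakes the connection so the call shape is visible without network access (client._connection.api_request is an internal interface, and the helper name is made up):

```python
class FakeConnection(object):
    # Stand-in for client._connection that records calls instead of
    # hitting the network.
    def __init__(self):
        self.requests = []

    def api_request(self, **kwargs):
        self.requests.append(kwargs)

def delete_generation(connection, blob_path, generation):
    # Like Blob.delete, but forwarding the generation as a query
    # parameter (which the released Blob.delete does not do).
    connection.api_request(
        method='DELETE', path=blob_path,
        query_params={'generation': generation})

conn = FakeConnection()
delete_generation(conn, '/b/some-bucket/o/file.txt', 1498874876972049)
print(conn.requests[0]['query_params'])
# -> {'generation': 1498874876972049}
```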

@tillahoffmann
Contributor

@dhermes, thanks for the update. Do you have any recommendations on how to access md5 and crc32c hashes given a generation number from python?

@dhermes
Contributor

dhermes commented Jul 1, 2017

Sure thing. Here is a somewhat icky hack (but it'll work for now).

For a file with multiple versions:

$ gsutil ls -a gs://some-bucket
gs://some-bucket/file.txt#1498874876972049
gs://some-bucket/file.txt#1498875135704497

If you reload() it like normal, you'll see the latest version:

>>> from google.cloud import storage
>>>
>>> client = storage.Client()
>>> bucket = client.bucket('some-bucket')
>>> blob_normal = bucket.blob('file.txt')
>>> blob_normal.reload()
>>> blob_normal._properties
{u'bucket': u'some-bucket',
 u'contentType': u'text/plain',
 u'crc32c': u'JcxYOQ==',
 u'etag': u'CLH7moCB59QCEAE=',
 u'generation': u'1498875135704497',
 u'id': u'some-bucket/file.txt/1498875135704497',
 u'kind': u'storage#object',
 u'md5Hash': u'F/IyVDfyzIWJW12F0xO6+Q==',
 u'mediaLink': u'https://www.googleapis.com/download/storage/v1/b/some-bucket/o/file.txt?generation=1498875135704497&alt=media',
 u'metageneration': u'1',
 u'name': u'file.txt',
 u'selfLink': u'https://www.googleapis.com/storage/v1/b/some-bucket/o/file.txt',
 u'size': u'1672',
 u'storageClass': u'MULTI_REGIONAL',
 u'timeCreated': u'2017-07-01T02:12:15.663Z',
 u'timeStorageClassUpdated': u'2017-07-01T02:12:15.663Z',
 u'updated': u'2017-07-01T02:12:15.663Z'}

However, if you manually patch reload on the object, you can get the data you want:

>>> from google.cloud.storage.blob import _quote
>>>
>>> GENERATION = '1498874876972049'
>>>
>>> def reload(self, client=None):
...     client = self._require_client(client)
...     query_params = {'projection': 'noAcl', 'generation': GENERATION}
...     api_response = client._connection.api_request(
...         method='GET', path=self.path, query_params=query_params,
...         _target_object=self)
...     self._set_properties(api_response)
...
>>>
>>> blob_patched = bucket.blob('file.txt')
>>> blob_patched.reload = reload.__get__(blob_patched)
>>> blob_patched.reload()
>>> blob_patched._properties
{u'bucket': u'some-bucket',
 u'contentType': u'text/plain',
 u'crc32c': u'MO9tZQ==',
 u'etag': u'CJGY64SA59QCEAE=',
 u'generation': u'1498874876972049',
 u'id': u'some-bucket/file.txt/1498874876972049',
 u'kind': u'storage#object',
 u'md5Hash': u'zPc9wCyNcKGiYV3ijiB2iw==',
 u'mediaLink': u'https://www.googleapis.com/download/storage/v1/b/some-bucket/o/file.txt?generation=1498874876972049&alt=media',
 u'metageneration': u'1',
 u'name': u'file.txt',
 u'selfLink': u'https://www.googleapis.com/storage/v1/b/some-bucket/o/file.txt',
 u'size': u'1635',
 u'storageClass': u'MULTI_REGIONAL',
 u'timeCreated': u'2017-07-01T02:07:56.965Z',
 u'timeDeleted': u'2017-07-01T02:12:15.704Z',
 u'timeStorageClassUpdated': u'2017-07-01T02:07:56.965Z',
 u'updated': u'2017-07-01T02:07:56.965Z'}

@lukesneeringer
Contributor

lukesneeringer commented Aug 7, 2017

Hi @tillahoffmann @tsinha,
Since the code has changed so much since your original PR, and it sounds like there is an existing path to do what you need now, I am going to close this. I am sorry that we sat on it for so long.

If there are any outstanding issues remaining, feel free to re-open (or open a new issue).

parthea pushed a commit that referenced this pull request Sep 22, 2023
…tform/python-docs-samples#1435)

* remove face detection samples

* update docstring

* linter

* linter