Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Storage] Fix public bucket source check in SkyPilot Storage #1087

Merged
merged 9 commits into from
Aug 24, 2022

Conversation

romilbhardwaj
Copy link
Collaborator

@romilbhardwaj romilbhardwaj commented Aug 17, 2022

Closes #977.

GetPublicAccessBlock is no longer a reliable method to check if a s3 bucket is public. Buckets can be publicly readable yet not allow access to GetPublicAccessBlock. I'm not sure what caused it (and I can't find a clear reason).

Some buckets still allow it (e.g., s3://tcga-2-open, which is used in our smoke tests and thus was passing), but others have blocked it (e.g., s3://digitalcorpora, s3://imagenet-bucket).

Using head_bucket API now, as recommended by boto3 docs.

Also added storage tests for non-existent and private buckets.

Tested:

tests/test_smoke.py Outdated Show resolved Hide resolved
# These buckets are known to be private, but may need to be updated if
# they are removed by their owners.
with pytest.raises(sky.exceptions.StorageBucketGetError,
match='the bucket name is taken'):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

match=backend_utils._BUCKET_FAIL_TO_CONNECT_MESSAGE.format(...)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ahh private buckets trigger this message on gcp, so the common message is the bucket name is taken:

skypilot/sky/data/storage.py

Lines 1129 to 1133 in 6196552

ex = exceptions.StorageBucketGetError(
f'Attempted to access a private external bucket {self.name}'
'\nCheck if the 1) the bucket name is taken and/or '
'2) the bucket permissions are not setup correctly. '
f'Consider using `gsutil ls gs://{self.name}` to debug.')

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this case, can we move this message as an variable such as PRIVATE_BUCKET_ERROR?

Copy link
Member

@concretevitamin concretevitamin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@@ -820,7 +820,7 @@ def test_new_bucket_creation_and_deletion(self, tmp_local_storage_obj,

@pytest.mark.parametrize(
'tmp_public_storage_obj, store_type',
[('s3://tcga-2-open', storage_lib.StoreType.S3),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we keep this bucket in the test? For coverage.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great point - added!

else:
retry_count += 1
if retry_count > 3:
raise RuntimeError('Could find a nonexistent bucket to use.'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could not*

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good catch! fixed

@romilbhardwaj
Copy link
Collaborator Author

@michaelzhiluo ready for another look!

@michaelzhiluo
Copy link
Collaborator

This PR looks good to me, as long as above comment is resolved!

@romilbhardwaj
Copy link
Collaborator Author

@michaelzhiluo - regarding refactoring error strings, I'll create another PR to fix that since that's tangential to this bugfix - would that be ok?

@michaelzhiluo
Copy link
Collaborator

Sounds good, in that case, feel free to merge!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Fail to access public bucket with sky storage
3 participants