Blob.from_string() should be able to parse all valid gcs uri #1107

Closed
pPanda-beta opened this issue Aug 9, 2023 · 2 comments · Fixed by #1170
Labels
api: storage (Issues related to the googleapis/python-storage API.)
priority: p2 (Moderately-important priority. Fix may not be included in next release.)
type: bug (Error or flaw in code with unintended results or allowing sub-optimal usage patterns.)

Comments

@pPanda-beta

Blob.from_string stops parsing the object name at the hash (#) character, so everything from the # onward is dropped.

Test Code (minimalistic)

from google.cloud import storage
blob = storage.Blob.from_string('gs://my-bucket/my/problamatic/object#name')
print(blob.name)
# my/problamatic/object

Expected

Object name should be my/problamatic/object#name

Actual

Object name found is my/problamatic/object

Temporary Workaround

import re
from google.cloud import storage
GS_PATTERN = re.compile(r"gs://(?P<bucket_name>[^/]+)/(?P<object_name>.+)")

m = GS_PATTERN.match('gs://my-bucket/my/problamatic/object#name')
blob = storage.Bucket(name=m.group('bucket_name'), client=None).blob(m.group('object_name'))
print(blob.name)
# my/problamatic/object#name
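The workaround above raises an AttributeError if the string does not match the pattern, because m is then None. A slightly more defensive variant could look like the sketch below. The helper name parse_gs_uri is hypothetical and not part of google-cloud-storage; it only splits the string and leaves creating the Blob to the caller.

```python
import re

# Hypothetical helper, not part of google-cloud-storage.
# Same pattern as the workaround: everything after the first "/" that
# follows the bucket name is treated as the object name, including "#".
_GS_URI = re.compile(r"gs://(?P<bucket_name>[^/]+)/(?P<object_name>.+)")


def parse_gs_uri(uri):
    """Split a gs:// URI into (bucket_name, object_name) without urlsplit."""
    m = _GS_URI.fullmatch(uri)
    if m is None:
        raise ValueError(f"Not a gs://bucket/object URI: {uri!r}")
    return m.group("bucket_name"), m.group("object_name")


print(parse_gs_uri("gs://my-bucket/my/problamatic/object#name"))
# ('my-bucket', 'my/problamatic/object#name')
```

The returned pair can then be passed to storage.Bucket(...).blob(...) exactly as in the workaround.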

Version

google_cloud_storage-2.10.0-py2.py3-none-any.whl

Suggestion

urlsplit is not a good utility for parsing GCS URIs. I have seen problems with the % character, a trailing ?, and so on when using urlsplit. Most of those characters are valid in GCS object names.

scheme, netloc, path, query, frag = urlsplit(uri)
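To illustrate the point, here is a minimal standard-library sketch of how urlsplit treats the URI from this report. It assumes, as the line above suggests, that the object name is taken from the path component only; the variable names are made up for the example.

```python
from urllib.parse import urlsplit

# "#" starts the URL fragment, so the text after it is split off the path.
hash_parts = urlsplit("gs://my-bucket/my/problamatic/object#name")
print(hash_parts.netloc)    # my-bucket
print(hash_parts.path)      # /my/problamatic/object
print(hash_parts.fragment)  # name

# A trailing "?" starts an empty query string and disappears from the path.
question_parts = urlsplit("gs://my-bucket/object?")
print(question_parts.path)         # /object
print(repr(question_parts.query))  # ''
```

Any code that rebuilds the object name from path alone therefore loses the "#name" suffix and the trailing "?".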

@product-auto-label (bot) added the "api: storage" label on Aug 9, 2023
@frankyn added the "type: bug" and "priority: p2" labels on Aug 9, 2023
@cojenco
Contributor

cojenco commented Aug 11, 2023

Thanks for filing. I understand how this could cause a parsing issue. I will work with the team to address next steps.

Please note that we strongly recommend that users avoid # and control characters in object names; see the official documentation for more details.
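For anyone who wants to check existing object names against that recommendation, a rough sketch follows. The helper discouraged_characters is hypothetical, and treating "control characters" as Unicode category Cc is an assumption made here; the official documentation remains the authority on which characters to avoid.

```python
import unicodedata


def discouraged_characters(object_name):
    """Return the distinct characters that are "#" or control characters.

    Hypothetical helper: "control character" is assumed to mean
    Unicode general category Cc.
    """
    return sorted(
        {ch for ch in object_name if ch == "#" or unicodedata.category(ch) == "Cc"}
    )


print(discouraged_characters("my/problamatic/object#name"))  # ['#']
print(discouraged_characters("plain/object-name"))           # []
```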

@frankyn
Contributor

frankyn commented Aug 11, 2023

Adding findings from Node.js storage after our conversation @cojenco.

Node.js does not yet have a method similar to this one, but there is a feature request (FR) for it:

However, the client does support GCS URIs in File#copy() and move(), and it accepts # as part of the object name. Here's an example:

(async () => {
  const {Storage} = require('@google-cloud/storage');

  const bucketName = "bucket-name";

  const storage = new Storage();

  const file = storage.bucket(bucketName).file('object-name');
  const [newFile] = await file.copy("gs://bucket-name/new-year-new-me#random");
  const [metadata] = await newFile.getMetadata();
  console.log(metadata); // Logs metadata and contains object name with `#`
})();
