get_directory_properties and get_file_properties use the blob REST API #28643

Closed
mathieulongtin opened this issue Feb 3, 2023 · 9 comments
Labels: customer-reported, issue-addressed, question, Service Attention, Storage

@mathieulongtin

  • Package Name: azure-storage-file-datalake
  • Package Version: 12.8.0
  • Operating System: CentOS
  • Python Version: 3.10

Describe the bug
The datalake Python API uses the blob REST API for the get_XXX_properties() methods. As a result, it doesn't return the file owner, group, permissions or ACL.

To Reproduce
Here's what get_file_properties returns:

In [27]: ff
Out[27]: <azure.storage.filedatalake._data_lake_file_client.DataLakeFileClient at 0x7f57ee12d2d0>
In [28]: dict(ff.get_file_properties())
Out[28]:
{'name': 'hello/world',
 'etag': '"0x8DB0611ED1897B0"',
 'deleted': False,
 'metadata': {},
 'lease': {'status': 'unlocked', 'state': 'available', 'duration': None},
 'last_modified': datetime.datetime(2023, 2, 3, 18, 10, 26, tzinfo=datetime.timezone.utc),
 'creation_time': datetime.datetime(2023, 2, 3, 18, 10, 26, tzinfo=datetime.timezone.utc),
 'size': 5,
 'deleted_time': None,
 'expiry_time': None,
 'remaining_retention_days': None,
 'content_settings': {...}

Expected behavior
I would expect the information from calling this, including owner, group, permissions and ACL:

In [29]: ff._client.path.get_properties(cls=lambda a,b,c: (a,b,dict(c)))
Out[29]:
(<azure.core.pipeline.PipelineResponse at 0x7f57e7544700>,
 None,
 {'Accept-Ranges': 'bytes',
  'Cache-Control': None,
  'Content-Disposition': None,
  'Content-Encoding': None,
  'Content-Language': None,
  'Content-Length': 5,
  'Content-Range': None,
  'Content-Type': 'application/octet-stream',
  'Content-MD5': 'XUFAKrxLKna5cZ2REBfFkg==',
  'Date': datetime.datetime(2023, 2, 3, 21, 25, 52, tzinfo=datetime.timezone.utc),
  'ETag': '"0x8DB0611ED1897B0"',
  'Last-Modified': datetime.datetime(2023, 2, 3, 18, 10, 26, tzinfo=datetime.timezone.utc),
  'x-ms-request-id': '17183129-701f-000b-6916-38b966000000',
  'x-ms-version': '2021-08-06',
  'x-ms-resource-type': 'file',
  'x-ms-properties': '',
  'x-ms-owner': '--redacted--',
  'x-ms-group': '--redacted--',
  'x-ms-permissions': 'rw-rw----',
  'x-ms-acl': None,
  'x-ms-lease-duration': None,
  'x-ms-lease-state': 'available',
  'x-ms-lease-status': 'unlocked'})

I note however that creation time is not in there.


@ghost ghost added customer-reported Issues that are reported by GitHub users external to the Azure organization. question The issue doesn't require a change to the product in order to be resolved. Most issues start as that labels Feb 3, 2023
@github-actions github-actions bot added the needs-triage Workflow: This is a new issue that needs to be triaged to the appropriate team. label Feb 3, 2023
@xiangyan99 xiangyan99 added Storage Storage Service (Queues, Blobs, Files) CXP Attention and removed needs-triage Workflow: This is a new issue that needs to be triaged to the appropriate team. labels Feb 4, 2023
@ghost ghost added the needs-team-attention Workflow: This issue needs attention from Azure service team or SDK team label Feb 4, 2023
@ghost

ghost commented Feb 4, 2023

Thank you for your feedback. This has been routed to the support team for assistance.

@SaurabhSharma-MSFT
Member

@mathieulongtin We are redirecting this to the services team to look into.

@SaurabhSharma-MSFT SaurabhSharma-MSFT added Service Attention Workflow: This issue is responsible by Azure service team. and removed CXP Attention labels Feb 7, 2023
@ghost

ghost commented Feb 7, 2023

Thanks for the feedback! We are routing this to the appropriate team for follow-up. cc @xgithubtriage.

Issue Details
Author: mathieulongtin
Assignees: jalauzon-msft, vincenttran-msft
Labels:

Storage, question, Service Attention, customer-reported, needs-team-attention

Milestone: -

@jalauzon-msft
Member

Hi @mathieulongtin, thanks for raising the issue. You are correct that the Datalake SDK calls the Blob endpoint when fetching properties. This is by design.

However, it does seem to be a gap that we are missing the owner, group, and permissions properties (and some others as well) when calling the Datalake get_file_properties or get_directory_properties. The reason is not that the Blob endpoint fails to return these headers (it does return them for HNS-enabled accounts); it's that we don't parse them in the Blob SDK, since they are specific to Datalake.

We do believe this is a gap and that these properties should be returned for Datalake, so we are discussing a solution. I will get back to you when I know more. Thanks.
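
To illustrate the "returned but not parsed" point above, here is a minimal sketch of picking the access-control values out of a response-headers mapping. It is not the SDK's actual implementation: the helper name `extract_access_control` and the output key names are hypothetical, and the header names are simply the ones visible in the output pasted in this issue.

```python
# Header names as they appear in the response shown in this issue,
# mapped to illustrative (hypothetical) property names.
_ACCESS_HEADERS = {
    "x-ms-owner": "owner",
    "x-ms-group": "group",
    "x-ms-permissions": "permissions",
    "x-ms-acl": "acl",
}


def extract_access_control(headers):
    """Return owner/group/permissions/acl from a headers mapping.

    Header lookup is case-insensitive; a header that is absent maps to None.
    """
    lowered = {name.lower(): value for name, value in headers.items()}
    return {key: lowered.get(header) for header, key in _ACCESS_HEADERS.items()}
```

Passing the headers dict from the `Out[29]` output above would yield the redacted owner and group, `'rw-rw----'` for permissions, and `None` for the ACL.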

@mathieulongtin
Author

That the data lake Python API uses the blob's REST API for anything kind of baffles me.

@jalauzon-msft
Member

jalauzon-msft commented Mar 13, 2023

Hi @mathieulongtin, to update, we have designed a solution to address this gap and will be working on implementing it soon. You can expect to see this in an upcoming release within the next few months.

In the meantime, you can try using get_access_control in the Datalake SDK to get the file ACLs/etc. This API does call into the DFS endpoint and, while it's a separate API, it should give you what you need. Thanks.
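
A rough sketch of that workaround, assuming `get_access_control()` returns a dict with `owner`, `group`, `permissions` and `acl` keys (worth verifying against your installed SDK version). The helper name `get_properties_with_access_control` is hypothetical, and `file_client` is expected to be a `DataLakeFileClient` such as `ff` in the original report.

```python
def get_properties_with_access_control(file_client):
    """Merge get_file_properties() with the values from get_access_control().

    This costs two service calls: one for the regular properties and one
    for the access-control information.
    """
    # dict(...) on the properties object works as shown in the original report.
    props = dict(file_client.get_file_properties())
    access = file_client.get_access_control()
    # Assumed key names; a missing key is stored as None rather than raising.
    for key in ("owner", "group", "permissions", "acl"):
        props[key] = access.get(key)
    return props
```

With the client from the report this would be called as `get_properties_with_access_control(ff)`. A directory client could be handled the same way by swapping in `get_directory_properties()`.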

@xiangyan99 xiangyan99 added the issue-addressed Workflow: The Azure SDK team believes it to be addressed and ready to close. label Apr 21, 2023
@github-actions github-actions bot removed the needs-team-attention Workflow: This issue needs attention from Azure service team or SDK team label Apr 21, 2023
@github-actions

Hi @mathieulongtin. Thank you for opening this issue and giving us the opportunity to assist. We believe that this has been addressed. If you feel that further discussion is needed, please add a comment with the text "/unresolve" to remove the "issue-addressed" label and continue the conversation.

@jalauzon-msft
Member

Hi @mathieulongtin, a fix for this has been merged into the feature branch for our upcoming release. #29491

This will go to a preview release within the next few weeks, followed by a GA release sometime after.

@github-actions

Hi @mathieulongtin, since you haven’t asked that we /unresolve the issue, we’ll close this out. If you believe further discussion is needed, please add a comment /unresolve to reopen the issue.
