Add API to access "extended file attributes" (xattr, EA) #49604

Open
heinrich-ulbricht opened this issue Mar 14, 2021 · 30 comments
Labels
api-suggestion Early API idea and discussion, it is NOT ready for implementation area-System.IO needs-further-triage Issue has been initially triaged, but needs deeper consideration or reconsideration
@heinrich-ulbricht

heinrich-ulbricht commented Mar 14, 2021

Background and Motivation

.NET currently lacks an API to create, read, update, and delete "extended file attributes". What are extended file attributes? Citation from the Wikipedia article:

Extended file attributes are file system features that enable users to associate computer files with metadata not interpreted by the filesystem, whereas regular attributes have a purpose strictly defined by the filesystem (such as permissions or records of creation and modification times).

On *nix-based systems they are called xattr; on Windows, Extended Attributes (EA).

MacOS documentation: https://ss64.com/osx/xattr.html
Linux documentation: https://man7.org/linux/man-pages/man7/xattr.7.html
Windows documentation (?): https://docs.microsoft.com/en-us/openspecs/windows_protocols/ms-fscc/a82e9105-2405-4e37-b2c3-28c773902d85?redirectedfrom=MSDN ($EA and $EA_INFORMATION)
Hints to usage of EA on Windows: https://superuser.com/q/396692/93905

My use case specifically is storing an ID for files that is not visible to the user but links those files to the external cloud sources they were generated (and need to be updated) from. Internet Explorer seems to store the "downloaded from Internet" information there, antivirus vendors store scan information, and Dropbox used those properties in the past as well, so there definitely are use cases.

Extended file attributes are a lightweight, cross-platform way of storing file metadata that cannot be tinkered with (at least not easily) by the user. Platform and file system support seems to be good. There are even recent developments such as Linux NFS support for xattr: "User Xattr Support Finally Landing For NFS In Linux 5.9". To cite one comment: "I've been wanting this for years!" ;)

Proposed API

As extension to System.IO.File?

namespace System.IO
{
    public static class File
    {
        public static FileExtendedAttributes GetExtendedAttributes(string path);
    }
}

(Note: Like the existing File.GetAttributes API)

(Note: It's also possible to set those attributes on directories.)

Usage Examples

try
{
    // here FileExtendedAttributes behaves like a Dictionary<string, string> - although binary values should be possible as well
    FileExtendedAttributes extendedAttributes = File.GetExtendedAttributes("Program.cs");
    extendedAttributes.Add("key", "value");
    Debug.WriteLine(string.Join(", ", extendedAttributes.Keys)); // "key"
    Debug.WriteLine(string.Join(", ", extendedAttributes.Values)); // "value"
    extendedAttributes.Remove("key");
}
catch (NotSupportedException)
{
    // not supported by the underlying platform, file system or kernel
}

Alternative Designs

The API design should reflect existing implementations from other languages and/or platforms. If this proposal is deemed worthy of being pursued further then we would have to look deeper into existing designs.

Risks

Effort for a rarely-used (?) feature.

Short-sighted design; searching for files based on those property values could be desirable. How could this be designed in contrast to existing designs?

@heinrich-ulbricht heinrich-ulbricht added the api-suggestion Early API idea and discussion, it is NOT ready for implementation label Mar 14, 2021
@dotnet-issue-labeler dotnet-issue-labeler bot added the untriaged New issue has not been triaged by the area owner label Mar 14, 2021
@dotnet-issue-labeler

I couldn't figure out the best area label to add to this issue. If you have write-permissions please help me learn by adding exactly one area label.

@heinrich-ulbricht
Author

Area label would be area-System.IO I suppose.

@carlossanlop
Member

We don't yet have a public type called FileExtendedAttributes. Can you please add it to your proposal? The design has to keep in mind that this is cross platform.

@carlossanlop carlossanlop added needs author feedback and removed untriaged New issue has not been triaged by the area owner labels Mar 25, 2021
@carlossanlop carlossanlop added this to the Future milestone Mar 25, 2021
@heinrich-ulbricht
Author

This needs a deeper look into the platform-specific capabilities of the existing tools. Will hopefully have a look at it soon.

@ghost ghost added needs-further-triage Issue has been initially triaged, but needs deeper consideration or reconsideration and removed needs author feedback labels Mar 31, 2021
@heinrich-ulbricht
Author

Well Mr. MsftBot the label change was a bit too early...

@carlossanlop
Member

carlossanlop commented May 4, 2021

@heinrich-ulbricht have you had a chance to look into this more deeply?

@mklement0 mentioned the following today:

Note that macOS has open-ended extended file-system attributes, similar to NTFS alternate streams (use xattr -s -l /tmp to see an example on macOS),

Another thing we need to keep in mind is that Windows has both extended attributes and reparse points, but they are mutually exclusive (one cannot exist if the other exists). See this document.

@heinrich-ulbricht
Author

heinrich-ulbricht commented May 5, 2021

@carlossanlop Unfortunately not. It is relatively easy to propose something that would serve my specific use case. But I feel the need to be more thorough so that the proposal encompasses all platforms and respective platform features of xattr. Having found xattr only recently myself I'm a bit hesitant to propose anything to be honest ^^

@mklement0

Thanks for starting this proposal, @heinrich-ulbricht.

Regarding the terminology on Windows: it seems to me that the equivalent of extended attributes on Linux and macOS / BSD are NTFS alternate data streams; the table you link to uses attribute names used by NTFS itself (whereas, as stated in the initial post, extended attributes are about user-defined payloads that have no special meaning to the file-system or the system).

If I understand the linked NTFS documentation correctly, the equivalent of a single extended attribute (a user-defined key-value pair containing arbitrary data) in the Unix world is a single alternate data stream on Windows (NTFS).

@heinrich-ulbricht
Author

heinrich-ulbricht commented May 5, 2021

@mklement0 I see, I misinterpreted the NTFS internal attribute name.

I worked with ADS in the (pre .NET Core) past, sounds interesting as an EA equivalent. Does this mean that .NET would have to provide a translation from an ADS API to EA file system features on Linux et al.? And that this does not yet exist? A quick search did not reveal anything other than old posts complaining that .NET access to ADS is not possible at all or about quirks like PowerShell restrictions and/or bugs (further down the page).

While googling I also found a StackExchange question describing my use case: "Is there a standardised way of adding custom metadata to image and other filetypes?". And another random data point: Azure File Sync might profit from this feature as ADS are currently not synched. (Assuming .NET plays a role here. And desperately trying to find arguments.)

What's your take on which direction this proposal might take?

@KalleOlaviNiemitalo

The relevant Windows functions are ZwQueryEaFile and ZwSetEaFile, but the documentation seems to cover kernel-mode calls only. There are also FILE_READ_EA and FILE_WRITE_EA in File Access Rights Constants. These are not the same as named streams.

On Linux, I think .NET should treat the extended attribute namespace as part of the string, rather than automatically insert or delete a "user." prefix. That way, the API would be usable with all namespaces, including ones defined in the future. Such an API would make it harder to use extended attributes in a way that is portable across operating systems, but I don't know whether developers even want to do so; perhaps it is more important to read and write OS-specific extended attributes that are already in use.

@mklement0

mklement0 commented May 5, 2021

@heinrich-ulbricht, yes, as far as I'm aware there are no .NET APIs for ADS (NTFS alternate data streams) yet, though PowerShell's cmdlets do support them.

What's your take on which direction this proposal might take?

I think a platform-neutral abstraction along the lines you've proposed would be beneficial, and seems possible at least in principle, given that what the platform-specific features share in the abstract is the ability to attach name-value pairs of arbitrary data to file-system items - but there are many details to be hashed out, and it'll be a challenge to find the right balance between focusing on shared, platform-neutral functionality vs. not preventing use of platform-specific functionality.

@KalleOlaviNiemitalo, I think we need to get clarity on the shared concepts as opposed to terminology.

Even though in terms of names, the Windows EAs (extended attributes) seem to be a closer fit, conceptually, ADS seem to be the equivalent of the Unix-world extended attributes, i.e. arbitrary, user-defined pieces of data associated with file-system items that have no special meaning to either the file-system or the operating system.

Can you clarify how the Windows EAs fit into the picture here?

@iSazonov
Contributor

iSazonov commented May 6, 2021

(If somebody wants to try it: https://www.powershellgallery.com/packages/PSReflect-Functions/1.1/Content/Examples%5CGet-ExtendedAttribute.ps1)

@iSazonov
Contributor

iSazonov commented May 6, 2021

It is mentioned here that EA exists for OS/2 compatibility. So the question is: does it make sense to consider this API in the modern world?

@heinrich-ulbricht
Author

heinrich-ulbricht commented May 6, 2021

@iSazonov This article shows active development regarding EAs: User Xattr Support Finally Landing For NFS In Linux 5.9:

The NFS server updates for Linux 5.9 have support for user-extended attributes on NFS. This is the functionality outlined via IETF's RFC 8276 for handling of file-system extended attributes in NFSv4

Now that there seem to be two options to implement EA-compatibility on Windows - EAs and ADS - there are different goals one might try to achieve:

  1. provide a .NET API to access ADS on Windows - not the goal of this specific ticket
  2. provide a .NET API to access EAs on any platform - although the ticket title says this is the goal, it now feels too technical, as I don't care which underlying technology will be used to store user-controlled metadata
  3. provide a means to store user-controlled file metadata on any one platform - we could achieve this by providing an API that uses ADS on Windows and EAs on Linux et al., but metadata wouldn't be preserved when moving files across file systems; so this is not enough
  4. provide a means to store user-controlled file metadata in a cross-platform and cross-file system way - THIS is the goal of this ticket

My need when opening this ticket was to put user-controlled metadata on files that is persisted when moving those files across file system boundaries (ext2, ext3, ext4, JFS, Squashfs, Yaffs2, ReiserFS, Reiser4, XFS, Btrfs, OrangeFS, Lustre, OCFS2 1.6, ZFS, and F2FS - and NTFS). And to have a .NET API to set this metadata in a cross-platform .NET application.

@heinrich-ulbricht
Author

Oh and I don't know if somebody tried to be funny here. Citation from IETF's RFC 8276:

This document describes an optional feature extending the NFSv4
protocol. This feature allows extended attributes (hereinafter also
referred to as xattrs) to be interrogated and manipulated using NFSv4
clients. Xattrs are provided by a file system to associate opaque
metadata, not interpreted by the file system, with files and
directories. Such support is present in many modern local file
systems.

(Emphasis mine.)

@mklement0

Thanks for the sleuthing, @iSazonov.

So it does sound like EAs and ADS are alternative implementations of the same basic functionality - but for compatibility with different operating systems, given that the blog post that @heinrich-ulbricht linked to states:

Alternate Data Streams were originally created to support Apple Mac Resource Forks, in files copied from Apple to NTFS and back. I’m not sure Apple even bothers with them any more, now that they’ve moved to something akin to Linux as their OS. [Indeed]

The EA tools repo you mention additionally states the following, disconcerting fact (emphasis added):

Windows does not contain any API that can be used to remove extended attributes. That means if we want to remove $EA from a file, we can:

  • Delete the file altogether.
  • Move the file to a non-NTFS volume and back again.
  • Archive the file to a zip (for instance), delete the original file, and then unpack the zip back to the original location.
  • Modify $MFT directly. This may work in certain circumstances.

Anecdotally, I can say that while I've come across ADS many times, I had never heard of EAs before - which, given the above, is perhaps not too surprising.

Also, the NFSv4-related RFC 8276 @heinrich-ulbricht links to with respect to NFS references ADS (though not by that exact name):

In the New Technology File System (NTFS), extended attributes may be stored within "file streams" [NTFS]

@mklement0

@heinrich-ulbricht:

My need when opening this ticket was to put user-controlled metadata on files that is persisted when moving those files across file system boundaries

I do not think this is feasible, as the only feature guaranteed to be available across file-systems is file content, so you'd have to store the metadata as part of the content, which in turn means that you require special tools / APIs to read the regular content - such a file won't act like a regular file anymore (and for directories you'd be out of luck anyway).

I think the best a potential cross-platform .NET API could hope for is described by what FreeBSD has done in its extattr VFS (virtual file system) feature:

As there are a plethora of file systems with differing extended attributes, availability and functionality of these functions may be limited, and they should be used with awareness of the underlying semantics of the supporting file system.

The linked man page is from 1999(!), but is apparently still current; reminiscent of the incompleteness of the NTFS EA APIs, it contains this disconcerting statement (emphasis added):

In addition, the interface does not provide a mechanism to retrieve the current set of available attributes; it has been suggested that providing a NULL attribute name should cause a list of defined attributes for the passed file or directory, but this is not currently implemented.

@iSazonov
Contributor

iSazonov commented May 6, 2021

Windows does not contain any API that can be used to remove extended attributes.

Perhaps ZwSetEaFile can do this.

Given that NTFS extended attributes (EAs) are so little known, is it worth exposing them in the .NET runtime?

@KalleOlaviNiemitalo

Because named streams on Windows can already be created, read, and deleted by passing the appropriate strings to System.IO.FileStream and such, I think there is less need for an alternative "extended file attributes" API that does the same thing; that's why it might be more useful to have the API access EAs as implemented in NTFS. It appears .NET doesn't currently have an API for enumerating named streams (FindFirstStreamW), though.
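To illustrate the point about passing "the appropriate strings" to the existing file APIs, here is a minimal sketch. It assumes Windows, an NTFS volume, and a modern .NET runtime (older .NET Framework versions are reported earlier in this thread to reject such paths). The class name, the helper `StreamPath`, the file name, and the stream name `origin-id` are made up for illustration.

```csharp
using System;
using System.IO;

public static class NamedStreamDemo
{
    // Builds the "file:stream" path syntax that NTFS uses for named streams.
    // Hypothetical helper, not an existing .NET API.
    public static string StreamPath(string filePath, string streamName) =>
        filePath + ":" + streamName;

    public static void Main()
    {
        string file = Path.Combine(Path.GetTempPath(), "ads-demo.txt");
        File.WriteAllText(file, "regular content");

        // On NTFS this writes to a named stream attached to ads-demo.txt.
        string stream = StreamPath(file, "origin-id");
        File.WriteAllText(stream, "cloud-item-42");

        Console.WriteLine(File.ReadAllText(stream)); // metadata from the named stream
        Console.WriteLine(File.ReadAllText(file));   // the main content is untouched

        File.Delete(stream); // on NTFS this should remove only the named stream
        File.Delete(file);
    }
}
```

On file systems without named streams the colon has no special meaning. On Linux, for example, the same code simply creates a second regular file whose name contains a colon, which is one reason this is not a portable substitute for extended attributes.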

@iSazonov
Contributor

iSazonov commented May 7, 2021

It appears .NET doesn't currently have an API for enumerating named streams (FindFirstStreamW), though.

Yes, and PowerShell implements this internally (GetStreams() method), so I think it makes sense to consider the API in .NET.

@heinrich-ulbricht
Author

@mklement0

I do not think this is feasible, as the only feature guaranteed to be available across file-systems is file content, so you'd have to store the metadata as part of the content, which in turn means that you require special tools / APIs to read the regular content - such a file won't act like a regular file anymore (and for directories you'd be out of luck anyway).

Hm, modifying file content does not seem like an option for arbitrary files if the application doesn't "own" the files. And you are right, there is no guarantee the EAs survive when files (or directories) are moved between file systems. But I think from an API perspective I expect to be able to CRUD EAs in .NET on different platforms. This at least would make applications possible that use EAs to store application-specific metadata. Whether EAs survive moving to another file system is out of scope for the API.

Reading the comments I see the hesitancy to pull this "old" EA feature out of the closet into a modern API. It would be good to know how widespread the use of EAs currently really is - or estimate how widespread it would be if there was an API.

Whatever the case I'm loving the discussion so far. Cross-platform, cross-file system, cross-team and cross-century. 🤩

@heinrich-ulbricht
Author

heinrich-ulbricht commented May 7, 2021

Another thought: if it won't be possible to provide either

  • a .NET API to access native EAs, or
  • a .NET adapter API to transparently access either EAs (non-NTFS) or ADS (NTFS)

then a .NET application would have to

  • check for the availability of ADS (how?), if yes use them, if no
  • use companion-binaries that are platform/file system specific and allow accessing EAs - those could be cmdline tools from the ancient past, it doesn't matter, as long as they can access EAs

The latter sounds like a new project to be had that provides an adapter API for those tools so they could be used via one API on any platform. This would be the not-so-clean but pragmatic approach. Ideally the companion-binaries are already part of the platform so we don't have to cope with licensing or security implications. And every platform specialist could plug in "their" tool.

@heinrich-ulbricht
Author

Just adding another data point for the topic of EAs on Windows: Appropriate function to set NTFS extended attributes: ZwSetEaFile or NtSetEaFile

@heinrich-ulbricht
Author

heinrich-ulbricht commented May 22, 2021

Found more info about xattr and cross-platform support over here in discussions in the borg backup repo: borgbackup/borg#1342 and borgbackup/borg#1681

And again, a rather amusing take on this topic:

Alternate Data Streams (ADS) are the Windows equivalent of resource forks (and/or extended attributes, depending on platform and the moon-phase when you're asking the question ;)

Also some code that respects different platforms: https://github.com/borgbackup/borg/blob/master/src/borg/xattr.py

@EraYaN

EraYaN commented Jan 23, 2023

Especially on Linux this is very useful to set (and remove) SELinux attributes or for example the NTACL/DOSATTRIB xattrs that samba leaves behind. So the underlying technology does matter (for that use case at least), it shouldn't just be a way to store random metadata in "some" store. The exact keys/names and tech are important so other native tools can use the same attributes.

Right now you can call the actual setxattr functions directly or use the Mono.Posix package, but it's not as nice.
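For readers wondering what "calling the actual setxattr functions directly" looks like, here is a rough Linux-only sketch using P/Invoke. The wrapper class `LinuxXattr` and its method names are made up for illustration. It assumes glibc can be resolved under the library name "libc" and that the target file system supports the attribute namespace used (`user.` here). macOS has different signatures for these functions and is not covered.

```csharp
using System;
using System.IO;
using System.Runtime.InteropServices;
using System.Text;

// Linux-only sketch: thin wrappers over setxattr(2) and getxattr(2).
public static class LinuxXattr
{
    [DllImport("libc", SetLastError = true)]
    private static extern int setxattr(string path, string name, byte[] value, nuint size, int flags);

    [DllImport("libc", SetLastError = true)]
    private static extern nint getxattr(string path, string name, byte[] value, nuint size);

    public static void Set(string path, string name, byte[] value)
    {
        // flags = 0: create the attribute or replace an existing one.
        if (setxattr(path, name, value, (nuint)value.Length, 0) != 0)
            throw new IOException($"setxattr failed, errno {Marshal.GetLastWin32Error()}");
    }

    public static byte[] Get(string path, string name)
    {
        // A size of 0 asks the kernel for the current length of the value.
        nint size = getxattr(path, name, Array.Empty<byte>(), (nuint)0);
        if (size < 0)
            throw new IOException($"getxattr failed, errno {Marshal.GetLastWin32Error()}");

        var buffer = new byte[(int)size];
        nint read = getxattr(path, name, buffer, (nuint)buffer.Length);
        if (read < 0)
            throw new IOException($"getxattr failed, errno {Marshal.GetLastWin32Error()}");

        Array.Resize(ref buffer, (int)read);
        return buffer;
    }

    public static void Main()
    {
        string file = Path.GetTempFileName();
        try
        {
            // The full name, including the namespace prefix, is passed through unchanged.
            Set(file, "user.origin-id", Encoding.UTF8.GetBytes("cloud-item-42"));
            Console.WriteLine(Encoding.UTF8.GetString(Get(file, "user.origin-id")));
        }
        catch (Exception e) when (e is IOException || e is DllNotFoundException || e is EntryPointNotFoundException)
        {
            Console.WriteLine("Extended attributes unavailable here: " + e.Message);
        }
        finally
        {
            File.Delete(file);
        }
    }
}
```

Values are byte arrays rather than strings, which keeps binary payloads such as the Samba and SELinux attributes mentioned above intact.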

@KalleOlaviNiemitalo

Especially on Linux this is very useful to set (and remove) SELinux attributes

That would be "security.selinux" -- so the .NET API should not restrict itself to the "user." namespace.

Perhaps the .NET API could take an enum ExtendedAttributeNamespace { Raw, User } parameter when setting, getting, or enumerating extended attributes. A portable application that does not want to care about operating-system-dependent namespaces would specify ExtendedAttributeNamespace.User; the runtime would then add or remove the "user." prefix on Linux, and not enumerate extended attributes in other namespaces.
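A minimal sketch of the name mapping this would imply on Linux is shown below. Everything here is hypothetical: the enum is the one suggested above, while the helper class `XattrNames` and its methods are invented for illustration and are not part of any existing .NET API.

```csharp
using System;

// Hypothetical enum from the suggestion above.
public enum ExtendedAttributeNamespace { Raw, User }

public static class XattrNames
{
    private const string LinuxUserPrefix = "user.";

    // Maps a caller-supplied name to the name handed to the OS (Linux behavior assumed).
    public static string ToNative(string name, ExtendedAttributeNamespace ns) =>
        ns == ExtendedAttributeNamespace.User ? LinuxUserPrefix + name : name;

    // Maps an OS-level name back to the caller's view.
    // Returns false when the attribute lies outside the requested namespace
    // and should therefore be skipped during enumeration.
    public static bool TryFromNative(string nativeName, ExtendedAttributeNamespace ns, out string name)
    {
        if (ns == ExtendedAttributeNamespace.Raw)
        {
            name = nativeName;
            return true;
        }
        if (nativeName.StartsWith(LinuxUserPrefix, StringComparison.Ordinal))
        {
            name = nativeName.Substring(LinuxUserPrefix.Length);
            return true;
        }
        name = string.Empty;
        return false;
    }

    public static void Main()
    {
        // Portable code asks for "origin-id"; the runtime would store "user.origin-id".
        Console.WriteLine(ToNative("origin-id", ExtendedAttributeNamespace.User));

        // "security.selinux" is only reachable through the Raw namespace.
        Console.WriteLine(TryFromNative("security.selinux", ExtendedAttributeNamespace.User, out _));
        Console.WriteLine(TryFromNative("security.selinux", ExtendedAttributeNamespace.Raw, out _));
    }
}
```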

Another option might be a public static string UserNamespace { get; } property; but that wouldn't work if an operating system has separate file systems that require different namespace prefixes on extended attributes. I suspect especially remote file systems could require that.

Windows recognizes "$Kernel." and "$Kernel.Purge." prefixes on extended attribute names. Kernel Extended Attributes

@KalleOlaviNiemitalo

On Windows, the NFS client was reported to specially handle the extended attribute names "NfsActOnLink", "NfsV3Attributes", and "NfsSymlinkTargetName": "Re: Visible symlinks under Windows" posted to samba-technical on 2008-06-23.

That suggests other file system redirectors might similarly define magic names and risk conflict with user-defined extended attributes. On the other hand, I don't know whether the NFS client even advertises FILE_SUPPORTS_EXTENDED_ATTRIBUTES in GetVolumeInformationW; if it doesn't, then I suppose it is free to use the extended attribute API for its own purposes.
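As a rough illustration of checking what a volume advertises, the sketch below queries GetVolumeInformationW through P/Invoke and tests the FILE_SUPPORTS_EXTENDED_ATTRIBUTES and FILE_NAMED_STREAMS flags. It is Windows-only; the class and method names are made up, and the flag values are copied from the Win32 headers as I understand them.

```csharp
using System;
using System.IO;
using System.Runtime.InteropServices;
using System.Text;

public static class VolumeCapabilities
{
    // File system flag values as defined in the Win32 headers.
    public const uint FILE_NAMED_STREAMS = 0x00040000;
    public const uint FILE_SUPPORTS_EXTENDED_ATTRIBUTES = 0x00800000;

    [DllImport("kernel32.dll", EntryPoint = "GetVolumeInformationW",
               CharSet = CharSet.Unicode, SetLastError = true)]
    [return: MarshalAs(UnmanagedType.Bool)]
    private static extern bool GetVolumeInformation(
        string rootPathName, StringBuilder volumeNameBuffer, int volumeNameSize,
        out uint volumeSerialNumber, out uint maximumComponentLength,
        out uint fileSystemFlags, StringBuilder fileSystemNameBuffer, int fileSystemNameSize);

    // Pure helper: tests one capability bit in the returned flags.
    public static bool Supports(uint fileSystemFlags, uint capability) =>
        (fileSystemFlags & capability) != 0;

    // Windows only. rootPath must be a volume root with a trailing backslash.
    public static uint GetFileSystemFlags(string rootPath)
    {
        // The name buffers are optional, so null with size 0 is passed for both.
        if (!GetVolumeInformation(rootPath, null, 0, out _, out _, out uint flags, null, 0))
            throw new IOException($"GetVolumeInformationW failed, error {Marshal.GetLastWin32Error()}");
        return flags;
    }

    public static void Main()
    {
        if (!RuntimeInformation.IsOSPlatform(OSPlatform.Windows))
        {
            Console.WriteLine("This check only applies to Windows.");
            return;
        }
        string root = Path.GetPathRoot(Environment.SystemDirectory);
        uint flags = GetFileSystemFlags(root);
        Console.WriteLine($"{root} EAs: {Supports(flags, FILE_SUPPORTS_EXTENDED_ATTRIBUTES)}, " +
                          $"named streams: {Supports(flags, FILE_NAMED_STREAMS)}");
    }
}
```

The same flags would also give an application a way to decide between EAs and named streams per volume, although, as noted above, a redirector that does not advertise a flag may still handle some names specially.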

If an application creates files with extended attributes or alternative data streams, and OneDrive moves them to cloud storage, are the EAs and ADSs lost?

@ProIntegritate

ProIntegritate commented Feb 15, 2023

It would be good to have access to this functionality. Just setting and reading ADS/EA would be sufficient for me, with values as byte arrays, since not all data is English ASCII strings. I don't care about deleting ADS/EA. Storing extra meta information about files is useful, as is being able to scan files for alternate data streams without messing around with the Win32 API and P/Invoke crap, something I'd prefer not to do anymore.

In fact it would be great if .NET would grow and import legacy API stuff, regardless of cross-platform support. Cross-platform functionality can be added in time, if it exists. Also, stop focusing on moving files to different file systems or cloud drives; the scope here is files stored on file systems capable of ADS/EA and nothing else. If a user or process moves a file to a file system unable to deal with ADS/EA, that is beyond what .NET should be able to solve.

@heinrich-ulbricht
Author

heinrich-ulbricht commented Mar 11, 2023

Note: came across this nice sample of creating and copying EAs on Windows, leaving it here, in case somebody wants to see that in action: https://github.com/gtworek/PSBits/tree/master/CopyEAs

@heinrich-ulbricht
Author

Another sample that reads EAs on Windows: https://gist.github.com/jborean93/50a517a8105338b28256ff0ea27ab2c8

Today I have another use case for Alternate Data Streams that needs to work cross-platform. And so this topic came back :D.
