Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Consideratrion for not using filesize when generating filekeys (or, sorta-more-permanent-links) #52

Closed
Gremious opened this issue Oct 4, 2023 · 10 comments
Assignees
Labels
enhancement New feature or request

Comments

@Gremious
Copy link

Gremious commented Oct 4, 2023

Currently, from what I can tell, filekeys are generated using salt, fspath, fsize, and inode. They are also regenerated on file edit (?).

What this means is that if a text file is edited, the filekey is regenerated, and any links to it break permanently.

Now, this may be intentional as a "if I edit this file/delete it and upload a new unrelated one with the same name, link breaks", but it would be cool if links to files remained 'more-static' until the file is deleted/moved.

Use cases:

  1. I send notes to people, kinda like a pastebin, then edit the note cause I e.g. missed a word and now have to re-send because the link they already opened is now broken. Wish I could just say "oh ops refresh I updated it"
  2. I cannot have a nice <a href="path/to/shopping.md">🛒 Shopping List </a> quicklink in my README.md
    because the file key changes every time I edit the file...

Perhaps instead of filesize, file creation timestamp or something can be used? (to prevent irrelevant re-uploads from having the same key). Or perhaps always generate a new filekey with size like normal, just don't regen if existing file is edited (idk how the code works to know if this is feasible).

@Gremious Gremious added the enhancement New feature or request label Oct 4, 2023
@9001
Copy link
Owner

9001 commented Oct 4, 2023

hey o/

yep you're right; that's how the filekeys are generated, and it was also intentional that they would be replaced if the file contents are modified, since you might forget sharing the link some time down the road and then edit the file, adding something that was not intended to be public. Not saying this was necessarily the best choice however :>

but I'm wondering if filekeys may be just an inconvenience for this purpose, and you'd be better off with a separate volume that just has directory listing disabled, and not enabling filekeys for that volume? maybe with a bit more unique filenames for any files you don't want bruteforced?

thinking about it more -- since part of the filekeys are the salt, they wouldn't be completely pointless if we were to remove the filesize and lastmodified... but it would remove the benefit of automatically generating a new key if the file is deleted and re-uploaded, like you mentioned. It should be possible to add this as an option however, if this is preferable in some scenarios.

the alternative filekey type would probably be just salt and filename, since not all filesystems support the file-creation timestamp, and because it would also recreate the filekey if you were to copy the same files into a public folder twice.

and i guess another alternative would be to generate a truly random key when a file is indexed, storing it in the db, and delete it from the db when the file is deleted -- but that would result in more database activity when listing files, reducing performance, so I'd like to avoid that if possible. Unless there's a reason to consider this of course!

so maybe just add an option to make the filekey only care about filename and salt, and default to the old filekey type since that's safer?

@kkovaletp
Copy link

I'd like to add my thoughts to this conversation. Please track them as alternative point of view:

you might forget sharing the link some time down the road and then edit the file, adding something that was not intended to be public

Regenerating filekeys to solve this problem looks to me as not the best solution: making a choice on the system background instead of the user is always a bad idea, as it creates surprisingly unexpectable behavior of the system for the user: if he\she might forget about the sharing, the same could happen in opposite. Yes, it is safer, but much better to ask the user at the file update time whether the sharing has to be removed or not. There are cases of non-interactive file updates (by some software on the host system) - here I'd say that the user has to own all the responsibility of sharing such files, so if the decision was made by the user to share, the system has to respect that and follow it.

the alternative filekey type would probably be just salt and filename, since not all filesystems support the file-creation timestamp

When I design some software solution, I try to follow a general rule to not re-invent a bicycle and not re-implement the logic or the implementation of dependencies or integrations while it doesn't bring significant restrictions or limitations into the designed solution. In this particular case, a file is an object of a filesystem. All filesystems, used in commonly used OSs are well-tested and maintained, which means that all solutions, built on top of them should rely on the FS logic and use its interfaces. So, as FS addresses a file by the file descriptor, file path (full or relative), we need to follow the same logic and use these keys to build a filekey. It is better to rely on the file descriptor, as it is unique for each file on the volume even if it is updated, moved, or renamed.

@Gremious
Copy link
Author

Gremious commented Oct 5, 2023

but I'm wondering if filekeys may be just an inconvenience for this purpose, and you'd be better off with a separate volume that just has directory listing disabled, and not enabling filekeys for that volume?

Unfortuantely for my use case, I'd want directory listing enabled, because it's my notes folder that I like to browse occationally, and share selectively. This is probably applicable to many types of folders for e.g. "assets" or documents.


It is better to rely on the file descriptor, as it is unique for each file on the volume even if it is updated, moved, or renamed.

so maybe just add an option to make the filekey only care about filename and salt, and default to the old filekey type since that's safer?

I do like that, the idea of having a (per-volume?) choice of "secure 'volitile' keys" (same as is now, default, regens when file changes) and "perma-link keys" which basically stick to a file and don't change, ether with file descriptors or just salt + name is not bad I say. There's also the option of path + name.

Assmuing it works like I think it does, the thought of file descriptors vs name vs path/name is basically "do we want the file to keep being shared if you move it between folders or rename it".

I think if I link to a picture and then decide to organize it into an album, it'd be nice if the embed still worked. You can always 'break' a link by moving the file into a private volume that doesn't have G permissions, but I totally see the idea of "renaming a file 'deletes' it" being very convenient. So between:

descriptor - you cannot hide the file once shared, bar for actually moving/deleteing it (?), links always reliably alive
path/name - moving or renaming the file breaks links immediately, but editing does not
just name - moving ('organizing') the file does not break links, but renaming it does

I like salt + name cause I can keep the file in the voIume I expect it to be, and just conveniently "regenerate" a link by renaming it if something weird happens. But it still allows for organizing my notes and keeping links alive with people I trust.

now, whether "something weird happening" is a real usecase...

@9001
Copy link
Owner

9001 commented Oct 5, 2023

Thanks for your input kkovaletp :>

just to explain the issue for anyone else who might be interested -- we're talking about the per-file access-keys / "passwords" (automatically generated) which can be enabled as an optional feature per-volume. While the g (get) access permission disables directory listings and makes it impossible to see what files exist in a folder, the filekeys feature further enhances this by making URLs to files 404 unless you append the correct filekey (passphrase) to the URL, making it harder to bruteforce-guess filenames in a folder. A practical example is available. Trusted users with the r (read) access permission are able to list directory contents, and also see the valid filekeys of that volume, for sharing with someone who only has the g (get) permission.

much better to ask the user at the file update time whether the sharing has to be removed or not

this speaks in favor of tracking the filekeys in the database, with the resulting performance impact that would have, and also a fair amount of work to extend the APIs with functionality to selectively keep or update an existing filekey... so I'd still prefer to avoid this if possible.

It is better to rely on the file descriptor, as it is unique for each file on the volume even if it is updated, moved, or renamed

true, this would have been a good approach, but there are two reasons we'll have to pick something else for our purpose:

  • ntfs unfortunately uses FileID instead of inode numbers, which are less trivially accessible, causing too much of a performance hit --

  • and more importantly, copyparty's textfile editor does not update files in-place, but it instead deletes the file with the old contents and then creates a new file with the updated text. This is because copyparty also supports deduplicating files by hardlinking them, which would cause any implace file edits to also affect the deduplicated copies.

path/name - moving or renaming the file breaks links immediately, but editing does not
just name - moving ('organizing') the file does not break links, but renaming it does

oh neat, I didn't consider using just the filename. This might be fine now that the salt is 18 bytes large, rather than the measly 4 bytes until v1.7.6, but then it would be a great idea to add a warning on startup if you're using an insufficiently large seed to make that a safe setup.

So that sounds like a plan -- keep the current filekey calculation default, but add one or two alternatives -- one would be using just the filename and salt, and possibly a second which uses the full path instead 👍

@kkovaletp
Copy link

Thanks a lot for the clear and detailed description. Now it is more clear to me.

However, from the user's perspective, the logic of re-generation of the filekey in the background every time the file is modified looks not intuitive: if I, as an owner of a file share, configure guest access and provide some URL to the file to someone, I don't expect it to break after the file modification - other file sharing software does not behave like this.

  • copyparty's textfile editor does not update files in-place, but it instead deletes the file with the old contents and then creates a new file with the updated text

And this is 1 more surprise for me. I didn't use that editor before and I don't rely on file descriptors or creation time now for files on my shared volume, but I see some cases, where such unexpected file re-creation would have a negative impact on other software or data consistency policy. For me (and I guess, for most users) editing doesn't mean recreation. But this is not the topic of this thread anyway...

@Gremious
Copy link
Author

Gremious commented Oct 5, 2023

So that sounds like a plan -- keep the current filekey calculation default, but add one or two alternatives -- one would be using just the filename and salt, and possibly a second which uses the full path instead 👍

I like that :)

A small note - if you use just salt+name for the key, would any two files with the same name have the same key? I don't know if it's used as a seed for an rng generator, and then every file pulls a new key from it, or if it's just hashed or something.

cause so if somebody uploads an empty password.txt anywhere
sees the key for it
and now goes around trying to access that file in random folders
bad times

@9001
Copy link
Owner

9001 commented Oct 5, 2023

yeeeah that's exactly what would happen :p
there really is no winning here huh...
so maybe we should only provide filesystem-path as an option, and not offer using just the filename at all to avoid any accidents like that? or could there still be a usecase for doing just the filename, maybe with an appropriate warning in the --help?

@Gremious
Copy link
Author

Gremious commented Oct 5, 2023

oh eheh

Yeah, I think that at least using path+name is better. It breaks when you move the file, sure, but at least it doesn't break if you edit a typo or your shopping list, so that's already an improvement. And all password.txt's won't have the same key.

As for name-only - considering how easy it was to figure out to do that, and considering that ppl tend to miss help warnings - probably good idea to hold off on that until somebody goes "hey I really really want this and I have a good case for it pls"?

@9001 9001 closed this as completed in 0dc3c23 Oct 6, 2023
@9001
Copy link
Owner

9001 commented Oct 6, 2023

this has been added to v1.9.8; all of the following are equivalent:

  • -v /mnt/nas:nas:g:c,fk,fka
  • -v /mnt/nas:nas:g:c,fka
  • -v /mnt/nas:nas:g:c,fk=4,fka
  • -v /mnt/nas:nas:g:c,fka=4

the default filekey generator uses the old approach (salt+filepath+filesize+inode), and there's a new warning on startup if the salt is considered too weak to be safe.

seems to work, but let me know if something doesn't work as expected 👍

@Gremious
Copy link
Author

Gremious commented Oct 7, 2023

Just tried it, all works great - thank you so much !!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

3 participants