Dereference symlinks #1887

Open
mfripp opened this issue Jun 19, 2014 · 8 comments

Comments

@mfripp

mfripp commented Jun 19, 2014

There are three potential ways to handle symlinks in the client file system: (1) dereference the symlink and treat the referenced file or folder as if it were a normal object at the location of the symlink; (2) make a copy of the symlink as some sort of symlink object on the server, then reproduce that as a symlink on other clients; or (3) completely ignore the symlink on the client side.

Option (1) would only require support on the client side. This is the approach taken by Dropbox, and it is fairly easy for users to understand. It can create an infinite loop (for example, when a symlink points to one of its own parent directories), but that could be addressed by limiting the number of symlinks that the client will traverse. The limit could be hard-coded or a user setting (e.g., defaulting to 2). A more advanced option would be to traverse symlinks until the same symlink is encountered again, and then stop.
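As a rough illustration of the simple depth-limit variant, here is a minimal Python sketch. It is not taken from the ownCloud client; the function name `walk_with_symlink_limit` and the parameter `max_links` are hypothetical, and the default of 2 only mirrors the example value suggested above.

```python
import os

def walk_with_symlink_limit(root, max_links=2):
    """Yield file paths under root, following at most max_links symlinks
    on any single branch (hypothetical helper, for illustration only)."""
    stack = [(root, 0)]
    while stack:
        directory, links_followed = stack.pop()
        with os.scandir(directory) as entries:
            for entry in entries:
                # Count this entry as one more hop if it is itself a symlink.
                hops = links_followed + (1 if entry.is_symlink() else 0)
                if hops > max_links:
                    continue  # limit reached: do not follow this symlink
                if entry.is_dir(follow_symlinks=True):
                    stack.append((entry.path, hops))
                elif entry.is_file(follow_symlinks=True):
                    yield entry.path
                # Broken symlinks are neither file nor directory and are skipped.
```

With a limit like this, a self-referencing symlink is uploaded a bounded number of times instead of forever, at the cost of some duplicated data.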

Option (2) would require changes to the core. It may be difficult or impossible to implement, as WebDAV does not support symlinks, and different operating systems have different notions of a symlink (or multiple types of symlink within the same operating system). Symlinks could also become broken or misdirected if they refer to folders that are not included in the sync.

Option (3) is the approach currently taken by ownCloud. However, this forgoes significant utility that could be added fairly easily.

This is a request to implement option (1) -- dereference symlinks in the client and transfer the referenced files and folders to the server, with appropriate limits on recursion.

A discussion of this issue occurred in #665. Early in the discussion, the request appeared to be for option (2), so the moderator suggested creating a request in owncloud core. Later in #665, discussion moved toward option (1), which does not require changes to owncloud core.

Subsequent to #665, a feature request was placed in owncloud core at owncloud/core#6771. That discussion reached a consensus on using option (1). The moderator suggested, "Since this is something for the client side, please submit your request in the mirall repo. I know that the one about symlinks [#665] was closed, maybe do specify in the subject that links should be dereferenced." This request implements that suggestion.

This feature (dereferencing symlinks) would also solve the problems mentioned in #1299.

A competing request for option (2) is given in #1440. However, I do not think that is likely to be feasible, as discussed above.

@guruz
Contributor

guruz commented Sep 1, 2014

Related #1440

@mfripp
Author

mfripp commented Oct 10, 2014

For what it's worth, I was just experimenting with Dropbox, and found that it uses the "advanced" version of Option (1) that I described above. It dereferences symlinks until it finds one that refers to a parent of the current directory; then it stores that symlink as an empty directory. I don't know how it tests this, but one simple way would be to use stat() (not lstat()) to get the inode number of each directory that is traversed, then if the inode number repeats, store an empty directory and stop scanning that branch.
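The stat()-based idea could look roughly like the following Python sketch. This is only a guess at one workable approach, not Dropbox's actual implementation. Two assumptions go beyond the description above: directories are identified by the pair (device, inode) rather than the inode number alone, since inode numbers are only unique within one file system, and a repeat is checked only against the ancestors of the current branch. The function name `scan` and the "empty-dir" label are hypothetical.

```python
import os
import stat

def scan(root):
    """Return (relative_path, kind) pairs, dereferencing symlinks and storing
    an 'empty-dir' stub when a directory is one of its own ancestors."""
    results = []

    def visit(directory, rel, ancestors):
        for name in sorted(os.listdir(directory)):
            path = os.path.join(directory, name)
            rel_path = os.path.join(rel, name) if rel else name
            try:
                info = os.stat(path)  # follows symlinks, unlike os.lstat
            except OSError:
                continue  # broken symlink or unreadable entry
            if stat.S_ISDIR(info.st_mode):
                key = (info.st_dev, info.st_ino)
                if key in ancestors:
                    # Loop detected: record a stub and stop scanning this branch.
                    results.append((rel_path, "empty-dir"))
                    continue
                results.append((rel_path, "dir"))
                visit(path, rel_path, ancestors | {key})
            elif stat.S_ISREG(info.st_mode):
                results.append((rel_path, "file"))

    root_info = os.stat(root)
    visit(root, "", frozenset({(root_info.st_dev, root_info.st_ino)}))
    return results
```

Tracking only the ancestors of the current branch means that two symlinks pointing at the same directory from different places are both dereferenced; only a genuine cycle is cut off.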

This approach can be slightly quirky, e.g., if someone stores a file in the empty directory on another machine, then that file will eventually get copied into a higher level directory on the original machine, and will then get re-uploaded from there. But that's not too problematic, especially since users will generally shy away from creating these loops in the first place.

@oktayacikalin

Option 2 is exactly what devs want. Store the symlinks as they are. If necessary create some symlink compat layer folder on the local machine. But often it's just to help accessing files and folders. The major players like OSX and Linux handle them fine. Even Windows has some kind of symbolic links. It should be a matter of the client whether to restore them or not.

@brevilo

brevilo commented Dec 22, 2015

Totally agree with oktayacikalin. Still can't use ownCloud for full syncs because of this...

@jerrac

jerrac commented Oct 16, 2016

Is there any movement on this issue?

I would like to see option 1 implemented.

My specific use case is that I need to store some large directories on a normal hard drive, and some files I want quick access to on my ssd. So I want my owncloud root to live on my ssd, and then use symlinks to link to any directories that are too large to fit on my ssd.

To preserve backwards compatibility, I'd suggest that an option "follow symlinks" be added to the client settings, disabled by default. Then users like me could enable it.
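A minimal sketch of how such an opt-in setting might gate the behavior, written in Python for illustration only. `SyncSettings`, `follow_symlinks`, and `should_sync` are hypothetical names, not part of the ownCloud client.

```python
import os
from dataclasses import dataclass

@dataclass
class SyncSettings:
    # Hypothetical option; off by default so existing setups keep ignoring symlinks.
    follow_symlinks: bool = False

def should_sync(path, settings):
    """Decide whether a local path would be picked up for upload."""
    if os.path.islink(path):
        # os.path.exists follows the link, so broken symlinks are never synced.
        return settings.follow_symlinks and os.path.exists(path)
    return os.path.exists(path)
```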

@fthommen

fthommen commented Feb 7, 2017

I'd like to see option 2 (sync symlinks as OS-specific symlinks). Variant 1 (dereference) doesn't make any sense to me. It would lead to data duplication, and to data inconsistency if the symlinked file changes after the symlink has been synced. All OSes I know support a way to link one file to another. The client should convert the "symlink object" to whatever the current OS uses as a link.

@phil-davis
Contributor

For symlinks that point within the folder tree that the client is syncing, there is no nasty problem. With a suitable protocol/API to/from the server, the "metadata" about the symlink can be stored on the server and:

  1. The server web interface can show the file in both places that it exists,
  2. Another client can sync down the symlink, and the client (on Linux, OSX or Windows) can do what is needed on that OS to make the symlink, because the target file is part of the sync anyway. Maybe it can also make symlinks on iOS and Android; I don't know about that.
    There is a problem if someone is doing selective sync: they might sync the folder with the soft-link in it, but not the folder with the real file.
    Behavior needs to be considered for the difference between hard and soft links. For hard links, the actual file only gets deleted when the last hard link is gone, and it would be somewhat trickier for the client to sort out hard links and only sync 1 copy of the file. I suspect that currently if a single physical file is hard-linked into 2 folders, then the file is copied up to the server twice.
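For the hard-link case, a client could in principle find paths that share one physical file by comparing device and inode numbers. The Python sketch below shows that idea only; `group_hard_links` is a hypothetical helper, and nothing here reflects how the current client actually scans files.

```python
import os
import stat
from collections import defaultdict

def group_hard_links(root):
    """Return lists of paths under root that are hard links to the same file."""
    groups = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            info = os.lstat(path)  # do not follow symlinks here
            if stat.S_ISREG(info.st_mode):
                # Hard links to one file share the same (device, inode) pair.
                groups[(info.st_dev, info.st_ino)].append(path)
    return [sorted(paths) for paths in groups.values() if len(paths) > 1]
```

Each returned group is a candidate for uploading the data once and recording the extra paths as metadata.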

Actually the difficult case is when there are soft-links in the sync tree to a file that is out of the sync tree. If you sync just the soft-link to the server, then another client that syncs down the soft-link will have a soft-link to somewhere and no actual file data - kind of useless! So you would need to sync up the soft-link (remembering its reference - relative or absolute to somewhere) and sync the real file data (dumping it somewhere or other on the server). Then when another client2 syncs this down, it gets the soft-link (good and easy), then the file data is sent to the client. What does the client do with the file data?
a) If the relative or absolute location exists in the client file system, maybe put the file there, where the soft-link would naturally point to. But maybe that location (drive H:, /mnt/myexternaldisk ...) just happens to exist by luck to match what was on client1, and in 5 minutes the user on client2 will remove the USB stick they happened to have in their computer. Or maybe client2 will not really want ownCloud to fill up some "unexpected" location in their file system.
b) If the location does not exist (e.g. client1 was *nix and client2 is Windows, so some absolute path simply has no chance of being valid), then where does ownCloud dump the file contents? It could choose somewhere (some "temp" folder) and make the soft-link point there. Then if the file is later edited on client2, does it update the symlink "metadata" on the server, which will then be synced back to client1, so that the file gets moved somewhere else on client1???

Anything pointing inside the ownCloud sync tree is a (somewhat) tractable problem, because the folder/file tree hierarchy is at least somewhat consistent to implement on all client OSes.

Anything pointing outside the ownCloud sync tree is undefined, because different client OSes have different ways of making a file system tree from the available devices.
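The inside/outside distinction can be checked on the client by resolving the link and comparing it with the sync root. The following Python sketch is illustrative only; `classify_symlink` and its return labels are hypothetical.

```python
import os

def classify_symlink(link_path, sync_root):
    """Return 'inside', 'outside', or 'broken' for a symlink's resolved target."""
    if not os.path.islink(link_path):
        raise ValueError(f"{link_path!r} is not a symlink")
    target = os.path.realpath(link_path)  # resolve every link in the chain
    root = os.path.realpath(sync_root)
    if not os.path.exists(target):
        return "broken"
    try:
        inside = os.path.commonpath([target, root]) == root
    except ValueError:
        inside = False  # e.g. paths on different Windows drives
    return "inside" if inside else "outside"
```

A client could then treat "inside" links as metadata and decide separately what to do with "outside" and "broken" ones.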
