Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Mkdir -p are not correctly reported on Linux #60

Closed
nono opened this issue Oct 24, 2018 · 15 comments · Fixed by #127
Closed

Mkdir -p are not correctly reported on Linux #60

nono opened this issue Oct 24, 2018 · 15 comments · Fixed by #127

Comments

@nono
Copy link
Contributor

nono commented Oct 24, 2018

Description

When creating recursively directories with mkdir -p, the watcher only reports the first directory.

Steps to Reproduce

  1. Install the CLI from Add a CLI #59
  2. Run nsfw /tmp
  3. Execute mkdir -p /tmp/foo/bar/baz

Expected behavior:

The watcher says that directories foo, foo/bar and foo/bar/baz have been created:

Created: /tmp/foo
Created: /tmp/foo/bar
Created: /tmp/foo/bar/baz

Actual behavior:

The watcher only says that the directory foo has been created:

Created: /tmp/foo

Additional Information

As far as I know, inotify is not recursive, and it's nsfw code that creates sub-watchers for directories inside the watched directory. When inotify sends the event for foo, the directories foo/bar and foo/bar/baz have already been created, so even if the watcher is able to find these directories to tell inotify to watch them, it fails to also send an event for them.

@implausible
Copy link
Contributor

That's pretty interesting. So it's difficult to say if there is a good solution to this problem.

I think you already have the right impression, but it's like the directory foo gets created, and before NSFW even has time to react to create the inotify watch on foo, I imagine that bar and baz are also created.

After the mkdir -p completes, do file events get reported from bar and baz?

@nono
Copy link
Contributor Author

nono commented Oct 25, 2018

Yes, events are reported from bar and baz after that. For example, if I create a foo/bar/baz/qux file, I rename baz to courge, and then delete foo, I got:

Created: /home/nono/tmp/foo
Created: /home/nono/tmp/foo/bar/baz/qux
Modified: /home/nono/tmp/foo/bar/baz/qux
Renamed: /home/nono/tmp/foo/bar/baz → courge
Deleted: /home/nono/tmp/foo/bar/courge/qux
Deleted: /home/nono/tmp/foo/bar/courge
Deleted: /home/nono/tmp/foo/bar
Deleted: /home/nono/tmp/foo

@implausible
Copy link
Contributor

So while it doesn't report the creation of bar and baz, is that necessarily important? I imagine as a package that utilizes NSFW, when I get the event for foo's creation I will probably scan that directory and do whatever I'm supposed to do at that time. Getting the individual reports on the way to baz is probably less important overall. Is that reasonable to expect?

@nono
Copy link
Contributor Author

nono commented Oct 26, 2018

Well, often, it is not the directories but the files that are interesting. And it can happen with files too. For example:

$ mkdir -p foo/bar
$ echo baz > foo/bar/baz
$ mv foo /path/to/watched/dir/

will report only an event for foo. In that case, you can scan that directory and see the new file, but having a reliable way to do that can be tricky. It's a race between NSFW adding its inotify on sub-directories and the scan, and some files can be missed by a race condition.

The first example I can think of is if the added directory contains two sub-directories, one with a lot of things inside and the other is just empty. If NSFW starts with the big sub-directory and the user scan starts the empty directory, the inotify watch can be setup after the scan, and during the time, it a file is created in this empty directory, it will be missed.

It's an edge case, but it looks like these edge cases happen more often that I'd like when I'm working with file system watchers.

So, in my humble opinion, it would be better if NSFW emits the events in this case. But I also understand that you may want to do that as it increases what NSFW does, and it has a cost to maintain this code. Maybe a middleground can be to add documentation for the edge cases like this.

@antshuwen
Copy link

Well, often, it is not the directories but the files that are interesting. And it can happen with files too. For example:

$ mkdir -p foo/bar
$ echo baz > foo/bar/baz
$ mv foo /path/to/watched/dir/

will report only an event for foo. In that case, you can scan that directory and see the new file, but having a reliable way to do that can be tricky. It's a race between NSFW adding its inotify on sub-directories and the scan, and some files can be missed by a race condition.

The first example I can think of is if the added directory contains two sub-directories, one with a lot of things inside and the other is just empty. If NSFW starts with the big sub-directory and the user scan starts the empty directory, the inotify watch can be setup after the scan, and during the time, it a file is created in this empty directory, it will be missed.

It's an edge case, but it looks like these edge cases happen more often that I'd like when I'm working with file system watchers.

So, in my humble opinion, it would be better if NSFW emits the events in this case. But I also understand that you may want to do that as it increases what NSFW does, and it has a cost to maintain this code. Maybe a middleground can be to add documentation for the edge cases like this.

We have been troubled by the same bug. How do you work around in the end?

@amiramw
Copy link

amiramw commented Oct 18, 2019

We have another use case. We watch the creation of files with specific extensions. Once those files are created as part of yeoman generator together with their parent folder, most of the times those file creation events are missing.

@tsmaeder
Copy link
Contributor

To me it seems that the right thing to do would be this, whenever a directory "created" event is detected:

  1. Check whether an inotify watch for this directory has already been created (in step 3 below) if yes, return
  2. Create an inotify watch for the dir
  3. Send a "created event" to the client for the dir
  4. iterate over the contents of the directory and recursively call this function

my assumption is that after creating the watch in step 2, we would get an inotify event for any entry created after that line. So iterating the directory afterwards, we would be sure to catch all entries created before that line. We might get a couple of entries created later, but that is not relevant, since we check whether we've seen the entry in line 1.

WDYT?

@tsmaeder
Copy link
Contributor

@marechal-p any thoughts?

@paul-marechal
Copy link
Contributor

@tsmaeder emitting virtual events while iterating over the file tree seems good to me. I've been trying to reproduce on Windows to see if that too would need a fix but I am having issues running nsfw at all, I don't get any events for some reason...

@tsmaeder
Copy link
Contributor

@marechal-p it's a linux-specific problem. Windows & Mac file watcher API's are much saner that Linux.

@paul-marechal
Copy link
Contributor

paul-marechal commented Feb 11, 2021

I looked some more into this issue, and there is in fact a "blind spot" where NSFW will just completely miss some folders.

I was able to reproduce it on the following branch: master...theia-ide:mp/linux-tree-race

In the reproduction commit I modified the C++ sources to create a folder in what I believe is the blind spot, but that folder could in fact be created by any other process. I added a test that checks that events for other folders are correctly emitted, and that this extra folder created during the blind spot is never handled.

So there are two problems:

  1. There is a blind spot when creating the recursive tree.
  2. Once a given tree is setup, adding a folder containing its own hierarchy won't fire created events for every new file added, most likely because NSFW will see the new folder being added but by the time NSFW reacts and starts building the subtree all the files are already on the file system, so NSFW will just collect them with scandir and not emit any events while doing so.

@paul-marechal
Copy link
Contributor

paul-marechal commented Feb 11, 2021

TL;DR: any process that quickly creates folders on the fs will participate in a race with NSFW trying to keep up, and with lucky timing a folder can be created in NSFW's blind spot and never be handled.

From the top of my head, programs like git checkout/clone or mkdir -p are such programs that will quickly create folders.

@tsmaeder
Copy link
Contributor

I think my solution would still close that window of vulnerability.

@paul-marechal
Copy link
Contributor

@tsmaeder are you working on an implementation for it?

@tsmaeder
Copy link
Contributor

No time for fun stuff currently 🤷.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants