-
Notifications
You must be signed in to change notification settings - Fork 13k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fs::remove_dir_all rarely succeeds for large directories on windows #29497
Comments
I think this is just race conditions at work. When you delete a file another program might get a notification and check out what happened and that might hold a temporary lock on something. Or it might be that the previous deletions weren't actually done by the time you tried to delete the directory. |
Hm, https://github.com/CppCon/CppCon2015/blob/master/Tutorials/Racing%20the%20Filesystem/Racing%20the%20Filesystem%20-%20Niall%20Douglas%20-%20CppCon%202015.pdf slide "3. Deleting a directory tree" may be related. I accidentally watched this presentation yesterday. If I understood correctly, |
@petrochenkov Great link - that seems like it's exactly the problem I'm experiencing. It would be great to see some of these portable race-free idioms implemented in std. |
I'd be interested to dig a bit more into this to see what's going on. According to the Those slides are certainly an interesting read though! I'd be a little wary of putting it into the standard library as the "transactional semantics" are a little weird. For example if you successfully rename a file to a temporary directory and then fail to delete it, what happens? It may be the case that the best implementation for this function on Windows is to just use SHFileOperation, and that may end up just taking care of these sorts of issues on its own. |
I will give fixing this a try. On Windows XP we could use Current problems with
And what I hope will fix it:
I think we know deleting will fail when opening the file with |
O, and I walked against this bug when rustbuild (is that the current name) failed after make clean because of this. |
@pitdicker thanks for taking this on! We may want to discuss this a bit before moving much more on a solution. One question is how principled should something like I personally feel that Curious what others think, though? cc @rust-lang/libs |
@alexcrichton Hmm, I would prefer It's inevitable that people will build their own abstractions over the top of these primitive operations, and I'd very much like to avoid these kind of bugs where it's sufficiently rare that it's almost impossible to track down the true cause of the problem. |
This wouldn't really work because files don't necessarily get deleted right away. You can't just move the file somewhere else and then delete it because there isn't always a reasonable place on the same volume to move the file to. Sitting around and indefinitely waiting for the file to vanish isn't ideal either. I'd personally just use |
What if the file is opened with |
No share flags means nobody else can open it to read/write/delete it. However another handle can still be opened without read/write/delete permissions. Also what happens if you fail to open the file with no share flags? Does it error or fallback to the normal deletion method that provides no guarantees? |
Great to see so many comments. Just what I hoped for! @alexcrichton I completely agree to keep @Diggsey I would also like cross-platform consistency, but sometimes Windows is just to different... I think the standard library should at least expose primitives that only need one system call. Otherwise we lose performance and maybe get more vulnerable to race conditions. Maybe we can add a @retep998 I have not yet written the code, so I don't know if this will really solve the problem... But if this doesn't work out I am all for I would have to test what happens with What is difficult is where to move a file / dir to. I think the parent of the dir we are removing is good enough. It should be on the same volume, and I think we also have write permission to it because otherwise deleting a dir is not possible (wild speculation here, it is late :)) |
I feel ok about going to extra effort to make |
@pitdicker perhaps you could draw up a list of the possible implementation strategies for these functions? That'd also help in terms of weighing the pros/cons of each strategy and we could get a good idea of which terminology is associated with what. So far it sounds like |
To be fair The steps I proposed are more ugly. They use the same simpler but more low-level APIs as the rest of the standard library. But there is a little window where files may temporarily show up in the parent directory of the one we removed. But these files would otherwise be the ones that cause On Windows deleting a file is not guaranteed to be complete before the function returns. Instead a file is scheduled for deletion, when no one else has it open anymore. When Unix on the other hand deletes the file, it looks really deleted. But deletion of the inode is scheduled until everyone is done with it.
Using this flag should be identical to using But I am not sure of anything yet. Somehow I just can't get it to work the renaming part yet, so all this is theory, not tested :(. |
Having looked at the IFileOperation, I agree with @pitdicker that it doesn't seem like the right kind of API for low-level filesystem operations: it's very much oriented towards interactive use. (The .net framework doesn't use it either, but then I'm starting to think that maybe just using the current implementation and retrying on failure is the best option. http://stackoverflow.com/a/1703799 (The second code example has the advantage that it only retries more than once if some progress is made, and retries more often, for more deeply nested directories) |
YES! I've got the manual method working. Now to clean up my messy code, and do a lot of testing... |
@Diggsey Thanks for looking up how .NET handles this! And I like the simplicity of just retrying on failure. But I don't like that it needs a little luck. So I prefer moving files the instant before they are deleted. |
Has a fix already been applied to Also, I don't think that retrying to delete a file a few times sound like a good solution to put into |
I just ran into a related problem myself, where after calling remove_dir_all on a directory which had just a single empty sub-directory, it completed successfully, but then trying to re-create this directory hierarchy immediately failed with Access Denied. The problem is basically not just with remove_dir_all, but with any attempt to delete a directory - it might not be executed immediately, so any action done right after it, which assumes it doesn't exist, might fail. But after going to the issue referenced right above this comment (995 in rustup) I found out about this new "remove_dir_all" crate from @Aaronepower (hosting just this single function), which uses a trick - it moves the directory first, then deletes the moved directory (as documented extensively there). Could its implementation replace the one in std::fs? |
I think there are (at least) three issues here:
I think the first can be mitigated slightly by ignoring any I'm uncertain if the second can be sensibly solved by Rust's std. At least not with the current API. It's just a matter of waiting. For example, the process may need to finish running and then the The third can be solved by either unsetting readonly attributes or using the |
Just my opinion, but retrying seems fine to me, this is already what some APIs do, such as symlinking in std on Windows, if it receives a certain error from Windows it just tries again with some other flags. The same could work here. I think the main thing to be aware of with retrying though is for it to still feel responsive and design it so that someone can't cause a denial-of-service attack by getting this API to repeatedly try to delete a running exe (such as itself). I think you'd want something like having just a single timeout for a fixed amount of time, only trigger timeout countdown if we've failed, and to return You could maybe also probably be smarter about it and do something like collect all entries that fail, and try them a second time after you've deleted everything else, so that it's more likely that the OS has had enough time to finish up whatever it was doing. |
Add dependency `remove_dir_all` Note: On windows `std::fs::remove_dir_all` doesn't really work (rust-lang/rust#29497).
Add dependency `remove_dir_all` Note: On windows `std::fs::remove_dir_all` doesn't really work (rust-lang/rust#29497).
* cmd* macros: Add ability to specify argument collection with `@<expression>` * Add fs utilities Add dependency `remove_dir_all` Note: On windows `std::fs::remove_dir_all` doesn't really work (rust-lang/rust#29497). * Allow errors to be turned into cargo warnings * Add git utilities Publicly export the `which` crate Remove `log` dependency of `cmd*` macros. * Add initial cmake tools Add dependency `cmake` Add initial subset of cmake file API Add function to extract defined variables in cmake script * Add cmake-file-API cache object * Refactor `bindgen`, add `from_cmake` contructor * cmake: Add toolchains object * Add type alias `NativeCommandArgs` * Make `LinkArgsBuilder` methods usable * Construct `LinkArgsBuilder`, `CInclArgs` from cmake build info * Add `CfgArgs::try_from_json` * Don't detect bindgen linker from cache Fix kconfig json parse bug * Fix clippy warnings * Fix `git::Repository::apply` Add `git::Repository::apply_once` Add `git::Repository::is_applied` Publicly export `anyhow` (hidden from docs) because some macros need it. Fix `cmd_spawn` Add `cmd` variation that returns `Result<ExitStatus>` * Improve cmake-file-api error messages Check the cmake version for support of a specific file-api object kind.
It looks like one of the commits that addressed the |
@ChrisDenton Looks like @pwinckles The loop there may reduce the incidence, but as it is reentering into the entire thing its going to repeat IO for the depth of the point where the error occurred, though it will at least bail out fast. I wouldn't want to use that version on rustups deletion of docs trees, for instance; though I am still interested in getting a reliable version into the core. |
@rbtcollins Btw, the code currently in the standard library was written with some amount of haste so isn't necessarily the best example. I do plan to get back to it but lately I've been distracted with other issues. |
@ChrisDenton thats super interesting - and not visible on the MSDN doc pages :/ I meant no negativity about the current code - it is great the CVE was solved, and there time mattered a lot... and this bug is really (to my mind) about figuring out what is sensible for the stdlib - tradeoffs between complexity, generality, least surprise etc abound. |
The newer information classes are listed (e.g.
Sorry! I didn't take what you said as negativity. I was meaning that more as a warning that, while I'm happy it fixes the CVE, I would still like to see the code cleaned up and improved. |
@ChrisDenton so FileDispositionInfoEx notes that deleting a file with posix semantics - " When FILE_DISPOSITION_POSIX_SEMANTICS is set, the link is removed from the visible namespace as soon as the POSIX delete handle has been closed, but the file's data streams remain accessible by" - does this mean that the directory delete will recieve ERROR_DIR_NOT_EMPTY while those processes with handles open continue to consume the file? Or is it genuinely unlinked from the directory and thus the directory is deletable? |
@rbtcollins The Other file handles to the same file can remain open. They have no effect at all on the POSIX delete (unless of course they lock the file). |
Thank you @ChrisDenton . I'm having a little trouble figuring out the version support for this comment of yours: |
The documentation is pretty limited atm so the best bet is the C/C++ headers which indeed indicate that it's available on 1809 and later (aka I'd suggest trying the |
I've been trying to track this one down for a while with little success. So far I've been able to confirm that no external programs are holding long-standing locks on any of the files in the directory, and I've confirmed that it happens with trivial rust programs such as the following:
I've also confirmed that deleting the folder from explorer (equivalent to using SHFileOperation), or using
rmdir
on the command line will always succeed.Currently my best guess is that either windows or other programs are holding transient locks on some of the files in the directory, possibly as part of the directory indexing windows performs to enable efficient searching.
Several factors led me to think this:
fs::remove_dir_all
will pretty much always succeed on the second or third invocations.Maybe there should be some sort of automated retry system, such that temporary locks do not hinder progress?
edit:
The errors returned seem somewhat random: sometimes it gives a "directory is not empty" error, while sometimes it will show up as an "access denied" error.
The text was updated successfully, but these errors were encountered: