Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
1 parent 5cc9db7 commit 5fc2859
Showing 5 changed files with 296 additions and 77 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,21 @@ | ||
The MIT License (MIT) | ||
|
||
Copyright (c) 2019 Jesse Grosjean | ||
|
||
Permission is hereby granted, free of charge, to any person obtaining a copy | ||
of this software and associated documentation files (the "Software"), to deal | ||
in the Software without restriction, including without limitation the rights | ||
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell | ||
copies of the Software, and to permit persons to whom the Software is | ||
furnished to do so, subject to the following conditions: | ||
|
||
The above copyright notice and this permission notice shall be included in | ||
all copies or substantial portions of the Software. | ||
|
||
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR | ||
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, | ||
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE | ||
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER | ||
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, | ||
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN | ||
THE SOFTWARE. |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,8 +1,76 @@ | ||
jwalk | ||
======= | ||
Fast recursive directory iterator. | ||
Fast recursive directory walk. | ||
|
||
- Walk is performed in parallel using rayon | ||
- Results are streamed in sorted order | ||
|
||
This is a work in progress and not recommended for use yet. | ||
This crate is inspired by both [`walkdir`](https://crates.io/crates/walkdir) | ||
and [`ignore`](https://crates.io/crates/ignore). It attempts to combine the | ||
parallelism of `ignore` with the streaming, iterator-based API of `walkdir`. | ||
|
||
# Example | ||
|
||
Recursively iterate over the "foo" directory, sorting by name: | ||
|
||
```no_run | ||
# use std::io::Error; | ||
use jwalk::{Sort, WalkDir}; | ||
# fn try_main() -> Result<(), Error> { | ||
for entry in WalkDir::new("foo").sort(Some(Sort::Name)) { | ||
println!("{}", entry?.path().display()); | ||
} | ||
# Ok(()) | ||
# } | ||
``` | ||
|
||
# Why would you use this crate? | ||
|
||
Performance is the main reason. The following benchmarks walk the Linux source | ||
code under various conditions. You can run these benchmarks yourself using | ||
`cargo bench`. | ||
|
||
Note in particular that this crate is fast when you want streamed, sorted | ||
results. Also note that even when used in single-threaded mode this crate is | ||
very close to `walkdir` in performance. | ||
|
||
This crate's parallelism happens at `fs::read_dir` granularity. If you are | ||
walking many files in a single directory, it won't help. On the other hand, if | ||
you are walking a hierarchy with many folders and many files, then it can | ||
help a lot. | ||
|
||
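The `fs::read_dir` granularity described above can be illustrated with a small standard-library sketch. It is not this crate's implementation: the function name `walk_parallel` is hypothetical, and it spawns one scoped thread per subdirectory instead of using a rayon thread pool. The point is only that one directory read is the unit of work, so a single flat directory gains nothing while a deep hierarchy fans out.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};
use std::thread;

/// Hypothetical helper: reads one directory as a single unit of work,
/// then walks each subdirectory on its own scoped thread.
/// Returns the paths of all non-directory entries (in no particular order).
fn walk_parallel(dir: &Path) -> io::Result<Vec<PathBuf>> {
    let mut files = Vec::new();
    let mut subdirs = Vec::new();

    // One `read_dir` call: this part is always sequential.
    for entry in fs::read_dir(dir)? {
        let entry = entry?;
        // `file_type` does not follow symlinks, so symlinked
        // directories are reported as files and not descended into.
        if entry.file_type()?.is_dir() {
            subdirs.push(entry.path());
        } else {
            files.push(entry.path());
        }
    }

    // Parallelism only appears when there are subdirectories to fan out to.
    let nested: Vec<io::Result<Vec<PathBuf>>> = thread::scope(|scope| {
        let handles: Vec<_> = subdirs
            .iter()
            .map(|sub| scope.spawn(move || walk_parallel(sub)))
            .collect();
        handles
            .into_iter()
            .map(|handle| handle.join().expect("walker thread panicked"))
            .collect()
    });

    for result in nested {
        files.extend(result?);
    }
    Ok(files)
}
```

Calling `walk_parallel(Path::new("foo"))` on a directory with no subdirectories runs entirely on the calling thread, which mirrors the "many files in a single directory" case. A thread per subdirectory is unbounded, so a real implementation would use a pool.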
Also note that even though the `ignore` crate has similar performance to | ||
this crate, it has much worse latency when you want sorted results. This | ||
crate will start streaming sorted results right away, while with `ignore` | ||
you'll need to wait until the entire walk finishes before you can sort and | ||
start processing the results in sorted order. | ||
|
||
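To make the latency point concrete, here is a minimal single-threaded sketch of streaming sorted output, using only the standard library. The `SortedWalk` type and `sorted_children` helper are hypothetical names, not part of this crate's API. Each directory's entries are sorted as that directory is read, so the first results are available after reading only the directories on the path to them, with no global sort at the end.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical depth-first walker that yields paths in sorted order.
struct SortedWalk {
    // Pending paths; the next path to yield is on top of the stack.
    stack: Vec<PathBuf>,
}

impl SortedWalk {
    fn new(root: impl Into<PathBuf>) -> Self {
        SortedWalk { stack: vec![root.into()] }
    }
}

/// Reads one directory and returns its children sorted by path.
fn sorted_children(dir: &Path) -> io::Result<Vec<PathBuf>> {
    let mut children = fs::read_dir(dir)?
        .map(|entry| entry.map(|e| e.path()))
        .collect::<io::Result<Vec<PathBuf>>>()?;
    children.sort();
    Ok(children)
}

impl Iterator for SortedWalk {
    type Item = io::Result<PathBuf>;

    fn next(&mut self) -> Option<Self::Item> {
        let path = self.stack.pop()?;
        // `symlink_metadata` does not follow symlinks, avoiding cycles.
        let is_dir = match fs::symlink_metadata(&path) {
            Ok(meta) => meta.is_dir(),
            Err(err) => return Some(Err(err)),
        };
        if is_dir {
            match sorted_children(&path) {
                // Push in reverse so the smallest child is popped first.
                Ok(children) => self.stack.extend(children.into_iter().rev()),
                Err(err) => return Some(Err(err)),
            }
        }
        Some(Ok(path))
    }
}
```

With this shape, `SortedWalk::new("foo").take(100)` reads only as many directories as are needed to produce the first 100 paths, which is the behavior the "first 100" benchmark row measures for this crate.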
| Crate | Options | Time | | ||
|---------|--------------------------------|-----------| | ||
| jwalk | unsorted, parallel | 60.811 ms | | ||
| jwalk | sorted, parallel | 61.445 ms | | ||
| jwalk | sorted, parallel, metadata | 100.95 ms | | ||
| jwalk | unsorted, parallel (2 threads) | 99.998 ms | | ||
| jwalk | unsorted, serial | 168.68 ms | | ||
| jwalk | sorted, parallel, first 100 | 9.9794 ms | | ||
| ignore | unsorted, parallel | 74.251 ms | | ||
| ignore | sorted, parallel | 99.336 ms | | ||
| ignore | sorted, parallel, metadata | 134.26 ms | | ||
| walkdir | unsorted | 162.09 ms | | ||
| walkdir | sorted | 200.09 ms | | ||
| walkdir | sorted, metadata | 422.74 ms | | ||
|
||
# Why wouldn't you use this crate? | ||
|
||
Directory traversal is already pretty fast with existing, more popular | ||
crates. `walkdir` in particular is very good if you need a straightforward | ||
single-threaded solution. | ||
|
||
This crate processes each `fs::read_dir` as a single unit, reading all | ||
entries and converting them into its own `DirEntry` representation. This | ||
representation is fairly lightweight, but if you have an extremely wide or | ||
deep directory structure, holding too many `DirEntry`s in memory at once | ||
might cause problems. The concern here is memory, not open file | ||
descriptors: this crate keeps only one open file descriptor per rayon | ||
thread. |
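
The memory trade-off above can be sketched as follows. This is an illustration under assumptions, not this crate's code: `LightEntry` and `read_dir_unit` are hypothetical names, and the real `DirEntry` may hold different fields. The whole directory is read into a `Vec` in one go, so memory grows with the number of entries, while the directory handle is released as soon as the read finishes.

```rust
use std::ffi::OsString;
use std::fs::{self, FileType};
use std::io;
use std::path::Path;

/// Hypothetical lightweight entry; not this crate's actual `DirEntry`.
#[derive(Debug)]
struct LightEntry {
    file_name: OsString,
    file_type: FileType,
}

/// Reads an entire directory as one unit and returns owned entries.
fn read_dir_unit(dir: &Path) -> io::Result<Vec<LightEntry>> {
    let mut entries = Vec::new();
    for entry in fs::read_dir(dir)? {
        let entry = entry?;
        entries.push(LightEntry {
            file_name: entry.file_name(),
            file_type: entry.file_type()?,
        });
    }
    // The `ReadDir` iterator is dropped once the loop ends, so no
    // directory handle stays open while the entries are held in memory.
    Ok(entries)
}
```

A directory with millions of entries would produce a `Vec` with millions of `LightEntry` values before the caller sees the first one, which is the situation the paragraph above warns about.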