Commit
tests and cleanup
jessegrosjean committed Feb 8, 2019
1 parent 5cc9db7 commit 5fc2859
Showing 5 changed files with 296 additions and 77 deletions.
21 changes: 21 additions & 0 deletions LICENSE
@@ -0,0 +1,21 @@
The MIT License (MIT)

Copyright (c) 2019 Jesse Grosjean

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in
all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
THE SOFTWARE.
72 changes: 70 additions & 2 deletions README.md
@@ -1,8 +1,76 @@
jwalk
=======
-Fast recursive directory iterator.
Fast recursive directory walk.

- Walk is performed in parallel using rayon
- Results are streamed in sorted order

-This is a work in progress and not recommended for use yet.
This crate is inspired by both [`walkdir`](https://crates.io/crates/walkdir)
and [`ignore`](https://crates.io/crates/ignore). It attempts to combine the
parallelism of `ignore` with the streaming iterator based api of `walkdir`.

# Example

Recursively iterate over the "foo" directory sorting by name:

```no_run
# use std::io::Error;
use jwalk::{Sort, WalkDir};
# fn try_main() -> Result<(), Error> {
for entry in WalkDir::new("foo").sort(Some(Sort::Name)) {
    println!("{}", entry?.path().display());
}
# Ok(())
# }
```

# Why would you use this crate?

Performance is the main reason. The following benchmarks walk Linux's source
code under various conditions. You can run these benchmarks yourself using
`cargo bench`.

Note in particular that this crate is fast when you want streamed sorted
results. Also note that even when used in single thread mode this crate is
very close to `walkdir` in performance.

This crate's parallelism happens at `fs::read_dir` granularity. If you are
walking many files in a single directory it won't help. On the other hand if
you are walking a hierarchy with many folders and many files then it can
help a lot.
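
The granularity point above can be illustrated with a minimal sketch. This is
not this crate's implementation: it uses only the standard library, spawns one
scoped thread per directory instead of using a rayon pool, and the names
`read_one_dir` and `walk_by_level` are hypothetical. It only shows that one
`fs::read_dir` call is the smallest unit of work that can run in parallel.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};
use std::thread;

/// Reads a single directory. This is the unit of work: it cannot be split
/// further, so one huge flat directory gains nothing from more threads.
fn read_one_dir(dir: &Path) -> io::Result<(Vec<PathBuf>, Vec<PathBuf>)> {
    let mut files = Vec::new();
    let mut dirs = Vec::new();
    for entry in fs::read_dir(dir)? {
        let entry = entry?;
        // `file_type` does not follow symlinks, so linked directories are
        // treated as plain entries here.
        if entry.file_type()?.is_dir() {
            dirs.push(entry.path());
        } else {
            files.push(entry.path());
        }
    }
    Ok((files, dirs))
}

/// Walks `root` level by level, reading all directories of one level
/// concurrently. Returns the non-directory paths found and the number of
/// `read_dir` units that were processed.
fn walk_by_level(root: &Path) -> io::Result<(Vec<PathBuf>, usize)> {
    let mut all_files = Vec::new();
    let mut units = 0;
    let mut frontier = vec![root.to_path_buf()];
    while !frontier.is_empty() {
        units += frontier.len();
        // Assumption for brevity: one thread per directory. A real
        // implementation would hand these units to a bounded thread pool.
        let results: Vec<io::Result<(Vec<PathBuf>, Vec<PathBuf>)>> = thread::scope(|s| {
            let handles: Vec<_> = frontier
                .iter()
                .map(|dir| s.spawn(move || read_one_dir(dir)))
                .collect();
            handles
                .into_iter()
                .map(|h| h.join().expect("reader thread panicked"))
                .collect()
        });
        let mut next = Vec::new();
        for result in results {
            let (files, dirs) = result?;
            all_files.extend(files);
            next.extend(dirs);
        }
        frontier = next;
    }
    Ok((all_files, units))
}
```

A flat directory produces a single unit no matter how many files it holds,
while a tree with many folders produces one unit per folder, which is where
parallel reads can help.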

Also note that even though the `ignore` crate has similar performance to
this crate, it has much worse latency when you want sorted results. This
crate will start streaming sorted results right away, while with `ignore`
you'll need to wait until the entire walk finishes before you can sort and
start processing the results in sorted order.
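
The latency difference described above comes from sorting per directory rather
than over the whole result set. The following single-threaded sketch, using
only the standard library, shows the idea under that assumption. It is not
this crate's implementation, and `sorted_children` and `SortedWalk` are
hypothetical names. Each directory is read and sorted as one unit and kept on
a stack, so the first sorted entry is available after a single `fs::read_dir`.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};
use std::vec;

/// Reads one directory and returns its entries sorted by path. For siblings
/// this is equivalent to sorting by file name.
fn sorted_children(dir: &Path) -> io::Result<Vec<PathBuf>> {
    let mut paths = fs::read_dir(dir)?
        .map(|entry| entry.map(|e| e.path()))
        .collect::<io::Result<Vec<_>>>()?;
    paths.sort();
    Ok(paths)
}

/// Depth-first walk that yields entries in sorted order as soon as the
/// directory containing them has been read.
struct SortedWalk {
    // One iterator per directory currently being descended into.
    stack: Vec<vec::IntoIter<PathBuf>>,
}

impl SortedWalk {
    fn new(root: &Path) -> io::Result<SortedWalk> {
        Ok(SortedWalk {
            stack: vec![sorted_children(root)?.into_iter()],
        })
    }
}

impl Iterator for SortedWalk {
    type Item = io::Result<PathBuf>;

    fn next(&mut self) -> Option<Self::Item> {
        loop {
            // An empty stack means the walk is finished.
            let top = self.stack.last_mut()?;
            match top.next() {
                // This directory is exhausted, resume its parent.
                None => {
                    self.stack.pop();
                }
                Some(path) => {
                    // `symlink_metadata` does not follow symlinks.
                    let is_dir = match fs::symlink_metadata(&path) {
                        Ok(meta) => meta.is_dir(),
                        Err(err) => return Some(Err(err)),
                    };
                    if is_dir {
                        // Simplification: if the directory cannot be read,
                        // the error is reported in place of the directory.
                        match sorted_children(&path) {
                            Ok(children) => self.stack.push(children.into_iter()),
                            Err(err) => return Some(Err(err)),
                        }
                    }
                    return Some(Ok(path));
                }
            }
        }
    }
}
```

Calling `SortedWalk::new(root)?.next()` returns the first sorted entry without
touching the rest of the tree. Collecting every path first and sorting
afterwards would need the complete walk before producing anything.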

| Crate | Options | Time |
|---------|--------------------------------|-----------|
| jwalk | unsorted, parallel | 60.811 ms |
| jwalk | sorted, parallel | 61.445 ms |
| jwalk | sorted, parallel, metadata | 100.95 ms |
| jwalk | unsorted, parallel (2 threads) | 99.998 ms |
| jwalk | unsorted, serial | 168.68 ms |
| jwalk | sorted, parallel, first 100 | 9.9794 ms |
| ignore | unsorted, parallel | 74.251 ms |
| ignore | sorted, parallel | 99.336 ms |
| ignore | sorted, parallel, metadata | 134.26 ms |
| walkdir | unsorted | 162.09 ms |
| walkdir | sorted | 200.09 ms |
| walkdir | sorted, metadata | 422.74 ms |

# Why wouldn't you use this crate?

Directory traversal is already pretty fast with existing, more popular
crates. `walkdir` in particular is very good if you need a straightforward
single-threaded solution.

This crate processes each `fs::read_dir` as a single unit, reading all
entries and converting them into its own `DirEntry` representation. This
representation is fairly lightweight, but with an extremely wide or deep
directory structure, holding too many `DirEntry`s in memory at once might
cause problems. The concern here is memory, not open file descriptors. This
crate only keeps one open file descriptor per rayon thread.
12 changes: 9 additions & 3 deletions src/core/iterators.rs
@@ -55,15 +55,17 @@ impl Iterator for ReadDirIter {
 ///
 /// Flattens a ReadDirIter into an iterator over individual Result<DirEntry>.
 pub struct DirEntryIter {
-    read_dir_iter: Peekable<ReadDirIter>,
     read_dir_iter_stack: Vec<vec::IntoIter<Result<DirEntry>>>,
+    read_dir_iter: Peekable<ReadDirIter>,
+    root_entry_result: Option<Result<DirEntry>>,
 }

 impl DirEntryIter {
-    pub fn new(read_dir_iter: ReadDirIter) -> DirEntryIter {
+    pub fn new(read_dir_iter: ReadDirIter, root_entry_result: Result<DirEntry>) -> DirEntryIter {
         DirEntryIter {
             read_dir_iter: read_dir_iter.peekable(),
             read_dir_iter_stack: Vec::new(),
+            root_entry_result: Some(root_entry_result),
         }
     }

@@ -81,6 +83,10 @@ impl DirEntryIter {
 impl Iterator for DirEntryIter {
     type Item = Result<DirEntry>;
     fn next(&mut self) -> Option<Self::Item> {
+        if let Some(root_entry_result) = self.root_entry_result.take() {
+            return Some(root_entry_result);
+        }
+
         loop {
             if self.read_dir_iter_stack.is_empty() {
                 if self.read_dir_iter.peek().is_some() {
@@ -98,7 +104,7 @@ impl Iterator for DirEntryIter {
                     Err(err) => return Some(Err(err)),
                 };

-                if dir_entry.children_spec().is_some() {
+                if dir_entry.expects_children() {
                     dir_entry.set_children_error(self.push_next_read_dir_iter());
                 }
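
The `root_entry_result` change in `next` above yields a stored first item
exactly once before falling through to the normal loop. A minimal generic
sketch of that pattern follows. `RootFirst` is a hypothetical name used for
illustration and is not a type from this crate. `Option::take` moves the
value out and leaves `None` behind, so the early return can only fire on the
first call.

```rust
/// Iterator adapter that yields one stored item first, then delegates to the
/// wrapped iterator.
struct RootFirst<I: Iterator> {
    root: Option<I::Item>,
    rest: I,
}

impl<I: Iterator> RootFirst<I> {
    fn new(root: I::Item, rest: I) -> RootFirst<I> {
        RootFirst {
            root: Some(root),
            rest,
        }
    }
}

impl<I: Iterator> Iterator for RootFirst<I> {
    type Item = I::Item;

    fn next(&mut self) -> Option<Self::Item> {
        // `take` replaces the stored value with `None`, so this branch runs
        // only on the first call.
        if let Some(root) = self.root.take() {
            return Some(root);
        }
        self.rest.next()
    }
}
```

For example, `RootFirst::new("root", vec!["a", "b"].into_iter())` yields
`"root"`, then `"a"`, then `"b"`.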
