README.md includes SkipThis example near top

karrick · Aug 14, 2020 · f64d915 · f64d915
1 parent 9ffb4b8
commit f64d915
Showing 1 changed file with 129 additions and 32 deletions.
diff --git a/README.md b/README.md
@@ -26,6 +26,12 @@ provided callback function.
     dirname := "some/directory/root"
     err := godirwalk.Walk(dirname, &godirwalk.Options{
         Callback: func(osPathname string, de *godirwalk.Dirent) error {
+            // Following string operation is not most performant way
+            // of doing this, but common enough to warrant a simple
+            // example here:
+            if strings.Contains(osPathname, ".git") {
+                return godirwalk.SkipThis
+            }
             fmt.Printf("%s %s\n", de.ModeType(), osPathname)
             return nil
         },
@@ -47,8 +53,8 @@ Here's why I use `godirwalk` in preference to `filepath.Walk`,
 
 When compared against `filepath.Walk` in benchmarks, it has been
 observed to run between five and ten times the speed on darwin, at
-speeds comparable to the that of the unix `find` utility; about twice
-the speed on linux; and about four times the speed on Windows.
+speeds comparable to the that of the unix `find` utility; and about
+twice the speed on linux; and about four times the speed on Windows.
 
 How does it obtain this performance boost? It does less work to give
 you nearly the same output. This library calls the same `syscall`
@@ -60,11 +66,11 @@ file system entry data from the operating system.
 
 While traversing a file system directory tree, `filepath.Walk` obtains
 the list of immediate descendants of a directory, and throws away the
-file system node type information provided by the operating system
-that comes with the node's name. Then, immediately prior to invoking
-the callback function, `filepath.Walk` invokes `os.Stat` for each
-node, and passes the returned `os.FileInfo` information to the
-callback.
+node type information for the file system entry that is provided by
+the operating system that comes with the node's name. Then,
+immediately prior to invoking the callback function, `filepath.Walk`
+invokes `os.Stat` for each node, and passes the returned `os.FileInfo`
+information to the callback.
 
 While the `os.FileInfo` information provided by `os.Stat` is extremely
 helpful--and even includes the `os.FileMode` data--providing it
@@ -141,37 +147,126 @@ The takeaway is that behavior is different based on which platform
 until it is fixed in the standard library, it presents a compatibility
 problem.
 
-This library correctly identifies symbolic links that point to
-directories and will only follow them when `FollowSymbolicLinks` is
-set to true. Behavior on Windows and other operating systems is
-identical.
+This library fixes the above problem such that it will never follow
+logical file sytem loops on either unix or Windows. Furthermore, it
+will only follow symbolic links when `FollowSymbolicLinks` is set to
+true. Behavior on Windows and other operating systems is identical.
 
 ### It's more easy to use than `filepath.Walk`
 
+While this library strives to mimic the behavior of the incredibly
+well-written `filepath.Walk` standard library, there are places where
+it deviates a bit in order to provide a more easy or intuitive caller
+interface.
+
+#### Callback interface does not send you an error to check
+
 Since this library does not invoke `os.Stat` on every file system node
 it encounters, there is no possible error event for the callback
 function to filter on. The third argument in the `filepath.WalkFunc`
 function signature to pass the error from `os.Stat` to the callback
 function is no longer necessary, and thus eliminated from signature of
 the callback function from this library.
 
-Also, `filepath.Walk` invokes the callback function with a solidus
-delimited pathname regardless of the os-specific path separator. This
-library invokes the callback function with the os-specific pathname
-separator, obviating a call to `filepath.Clean` in the callback
-function for each node prior to actually using the provided pathname.
+Furthermore, this slight interface difference between
+`filepath.WalkFunc` and this library's `WalkFunc` eliminates the
+boilerplate code that callback handlers must write when they use
+`filepath.Walk`. Rather than every callback function needing to check
+the error value passed into it and branch accordingly, users of this
+library do not even have an error value to check immediately upon
+entry into the callback function. This is an improvement both in
+runtime performance and code clarity.
+
+#### Callback function is invoked with OS specific file system path separator
+
+On every OS platform `filepath.Walk` invokes the callback function
+with a solidus (`/`) delimited pathname. By contrast this library
+invokes the callback with the os-specific pathname separator,
+obviating a call to `filepath.Clean` in the callback function for each
+node prior to actually using the provided pathname.
 
 In other words, even on Windows, `filepath.Walk` will invoke the
 callback with `some/path/to/foo.txt`, requiring well written clients
 to perform pathname normalization for every file prior to working with
-the specified file. In truth, many clients developed on unix and not
-tested on Windows neglect this subtlety, and will result in software
-bugs when running on Windows. This library would invoke the callback
-function with `some\path\to\foo.txt` for the same file when running on
-Windows, eliminating the need to normalize the pathname by the client,
-and lessen the likelyhood that a client will work on unix but not on
+the specified file. This is a hidden boilerplate requirement to create
+truly os agnostic callback functions. In truth, many clients developed
+on unix and not tested on Windows neglect this subtlety, and will
+result in software bugs when someone tries to run that software on
 Windows.
 
+This library invokes the callback function with `some\path\to\foo.txt`
+for the same file when running on Windows, eliminating the need to
+normalize the pathname by the client, and lessen the likelyhood that a
+client will work on unix but not on Windows.
+
+This enhancement eliminates necessity for some more boilerplate code
+in callback functions while improving the runtime performance of this
+library.
+
+#### `godirwalk.SkipThis` is more intuitive to use than `filepath.SkipDir`
+
+One arguably confusing aspect of the `filepath.WalkFunc` interface
+that this library must emulate is how a caller tells the `Walk`
+function to skip file system entries. With both `filepath.Walk` and
+this library's `Walk`, when a callback function wants to skip a
+directory and not descend into its children, it returns
+`filepath.SkipDir`. If the callback function returns
+`filepath.SkipDir` for a non-directory, `filepath.Walk` and this
+library will stop processing any more entries in the current
+directory. This is not necessarily what most developers want or
+expect. If you want to simply skip a particular non-directory entry
+but continue processing entries in the directory, the callback
+function must return nil.
+
+The implications of this interface design is when you want to walk a
+file system hierarchy and skip an entry, you have to return a
+different value based on what type of file system entry that node
+is. To skip an entry, if the entry is a directory, you must return
+`filepath.SkipDir`, and if entry is not a directory, you must return
+`nil`. This is an unfortunate hurdle I have observed many developers
+struggling with, simply because it is not an intuitive interface.
+
+Here is an example callback function that adheres to
+`filepath.WalkFunc` interface to have it skip any file system entry
+whose full pathname includes a particular substring, `optSkip`. Note
+that this library still supports identical behavior of `filepath.Walk`
+when the callback function returns `filepath.SkipDir`.
+
+```Go
+    func callback1(osPathname string, de *godirwalk.Dirent) error {
+        if optSkip != "" && strings.Contains(osPathname, optSkip) {
+            if b, err := de.IsDirOrSymlinkToDir(); b == true && err == nil {
+                return filepath.SkipDir
+            }
+            return nil
+        }
+        // Process file like normal...
+        return nil
+    }
+```
+
+This library attempts to eliminate some of that logic boilerplate
+required in callback functions by providing a new token error value,
+`SkipThis`, which a callback function may return to skip the current
+file system entry regardless of what type of entry it is. If the
+current entry is a directory, its children will not be enumerated,
+exactly as if the callback had returned `filepath.SkipDir`. If the
+current entry is a non-directory, the next file system entry in the
+current directory will be enumerated, exactly as if the callback
+returned `nil`. The following example callback function has identical
+behavior as the previous, but has less boilerplate, and admittedly
+logic that I find more simple to follow.
+
+```Go
+    func callback2(osPathname string, de *godirwalk.Dirent) error {
+        if optSkip != "" && strings.Contains(osPathname, optSkip) {
+            return godirwalk.SkipThis
+        }
+        // Process file like normal...
+        return nil
+    }
+```
+
 ### It's more flexible than `filepath.Walk`
 
 #### Configurable Handling of Symbolic Links
@@ -182,7 +277,7 @@ does. However, it does invoke the callback function with each node it
 finds, including symbolic links. If a particular use case exists to
 follow symbolic links when traversing a directory tree, this library
 can be invoked in manner to do so, by setting the
-`FollowSymbolicLinks` parameter to true.
+`FollowSymbolicLinks` config parameter to `true`.
 
 #### Configurable Sorting of Directory Children
 
@@ -191,11 +286,12 @@ descendants of a directory prior to visiting each node, just like
 `filepath.Walk` does. This is usually the desired behavior. However,
 this does come at slight performance and memory penalties required to
 sort the names when a directory node has many entries. Additionally if
-caller specifies `Unsorted` enumeration, reading directories is lazily
-performed as the caller consumes entries. If a particular use case
-exists that does not require sorting the directory's immediate
-descendants prior to visiting its nodes, this library will skip the
-sorting step when the `Unsorted` parameter is set to true.
+caller specifies `Unsorted` enumeration in the configuration
+parameter, reading directories is lazily performed as the caller
+consumes entries. If a particular use case exists that does not
+require sorting the directory's immediate descendants prior to
+visiting its nodes, this library will skip the sorting step when the
+`Unsorted` parameter is set to `true`.
 
 Here's an interesting read of the potential hazzards of traversing a
 file system hierarchy in a non-deterministic order. If you know the
@@ -208,10 +304,11 @@ setting this option.
 #### Configurable Post Children Callback
 
 This library provides upstream code with the ability to specify a
-callback to be invoked for each directory after its children are
-processed. This has been used to recursively delete empty directories
-after traversing the file system in a more efficient manner. See the
-`examples/clean-empties` directory for an example of this usage.
+callback function to be invoked for each directory after its children
+are processed. This has been used to recursively delete empty
+directories after traversing the file system in a more efficient
+manner. See the `examples/clean-empties` directory for an example of
+this usage.
 
 #### Configurable Error Callback