Improve docs for flake8-use-pathlib rules (#14741)

Flag the performance impact more clearly, add more links, and clarify the rule about the `glob` module.
astral-sh · Dec 3, 2024 · 70bd106
1 parent bf0fd04
commit 70bd106
Showing 7 changed files with 214 additions and 121 deletions.
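
The comparison table in the `glob_rule.rs` docs describes behavioral differences between the `glob` module and `Path.glob()`. The following is a minimal sketch of two of them (hidden files and `**` recursion), not part of the commit itself. The file names and the `names` helper are made up for illustration.

```python
import glob
import os
import tempfile
from pathlib import Path


def names(paths):
    """Return sorted base names so str and Path results can be compared."""
    return sorted(os.path.basename(os.fspath(p)) for p in paths)


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "sub").mkdir()
    # Hypothetical files: one visible, one hidden, one in a subdirectory.
    for rel in ("requirements.txt", ".hidden.txt", "sub/requirements-dev.txt"):
        (root / rel).touch()

    # glob.glob() returns a list of strings; "*" does not match dotfiles.
    flat_glob = names(glob.glob(os.path.join(tmp, "*.txt")))

    # Path.glob() yields Path objects lazily; "*" also matches dotfiles.
    flat_path = names(root.glob("*.txt"))

    # With the glob module, "**" only recurses when recursive=True is passed.
    rec_glob = names(glob.glob(os.path.join(tmp, "**", "*.txt"), recursive=True))

    # With Path.glob(), "**" recurses without any extra flag.
    rec_path = names(root.glob("**/*.txt"))
```

Because the two APIs can return different sets of files for the same pattern, a migration from `glob.glob()` to `Path.glob()` is worth checking against directories that contain dotfiles.
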
diff --git a/crates/ruff_linter/src/rules/flake8_use_pathlib/rules/glob_rule.rs b/crates/ruff_linter/src/rules/flake8_use_pathlib/rules/glob_rule.rs
@@ -2,7 +2,7 @@ use ruff_diagnostics::Violation;
 use ruff_macros::{derive_message_formats, ViolationMetadata};
 
 /// ## What it does
-/// Checks for the use of `glob` and `iglob`.
+/// Checks for the use of `glob.glob()` and `glob.iglob()`.
 ///
 /// ## Why is this bad?
 /// `pathlib` offers a high-level API for path manipulation, as compared to
@@ -12,35 +12,43 @@ use ruff_macros::{derive_message_formats, ViolationMetadata};
 /// improve readability over their low-level counterparts (e.g.,
 /// `glob.glob()`).
 ///
-/// Note that `glob.glob` and `Path.glob` are not exact equivalents:
+/// Note that `glob.glob()` and `Path.glob()` are not exact equivalents:
 ///
-/// |                   | `glob`                                                                                                                                                             | `Path.glob`                                                                                                                                |
-/// |-------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------|--------------------------------------------------------------------------------------------------------------------------------------------|
-/// | Hidden files      | Excludes hidden files by default. From Python 3.11 onwards, the `include_hidden` keyword can be used to include hidden directories.                                   | Includes hidden files by default.                                                                                                          |
-/// | Iterator          | `iglob` returns an iterator. Under the hood, `glob` simply converts the iterator to a list.                                                                        | `Path.glob` returns an iterator.                                                                                                           |
-/// | Working directory | `glob` takes a `root_dir` keyword to set the current working directory.                                                                                            | `Path.rglob` can be used to return the relative path.                                                                                      |
-/// | Globstar (`**`)   | `glob` requires the `recursive` flag to be set to `True` for the `**` pattern to match any files and zero or more directories, subdirectories, and symbolic links. | The `**` pattern in `Path.glob` means "this directory and all subdirectories, recursively". In other words, it enables recursive globbing. |
+/// |                   | `glob`-module functions                                                                                                                              | `Path.glob()`                                                                                                                                |
+/// |-------------------|------------------------------------------------------------------------------------------------------------------------------------------------------|----------------------------------------------------------------------------------------------------------------------------------------------|
+/// | Hidden files      | Hidden files are excluded by default. On Python 3.11+, the `include_hidden` keyword can be used to include hidden directories.                       | Includes hidden files by default.                                                                                                            |
+/// | Eagerness         | `glob.iglob()` returns a lazy iterator. Under the hood, `glob.glob()` simply converts the iterator to a list.                                        | `Path.glob()` returns a lazy iterator.                                                                                                       |
+/// | Working directory | `glob.glob()` and `glob.iglob()` take a `root_dir` keyword to set the current working directory.                                                     | `Path.rglob()` can be used to return the relative path.                                                                                      |
+/// | Globstar (`**`)   | The `recursive` flag must be set to `True` for the `**` pattern to match any files and zero or more directories, subdirectories, and symbolic links. | The `**` pattern in `Path.glob()` means "this directory and all subdirectories, recursively". In other words, it enables recursive globbing. |
 ///
 /// ## Example
 /// ```python
 /// import glob
 /// import os
 ///
-/// glob.glob(os.path.join(path, "requirements*.txt"))
+/// glob.glob(os.path.join("my_path", "requirements*.txt"))
 /// ```
 ///
 /// Use instead:
 /// ```python
 /// from pathlib import Path
 ///
-/// Path(path).glob("requirements*.txt")
+/// Path("my_path").glob("requirements*.txt")
 /// ```
 ///
+/// ## Known issues
+/// While using `pathlib` can improve the readability and type safety of your code,
+/// it can be less performant than the lower-level alternatives that work directly with strings,
+/// especially on older versions of Python.
+///
 /// ## References
 /// - [Python documentation: `Path.glob`](https://docs.python.org/3/library/pathlib.html#pathlib.Path.glob)
 /// - [Python documentation: `Path.rglob`](https://docs.python.org/3/library/pathlib.html#pathlib.Path.rglob)
 /// - [Python documentation: `glob.glob`](https://docs.python.org/3/library/glob.html#glob.glob)
 /// - [Python documentation: `glob.iglob`](https://docs.python.org/3/library/glob.html#glob.iglob)
+/// - [PEP 428 – The pathlib module – object-oriented filesystem paths](https://peps.python.org/pep-0428/)
+/// - [Why you should be using pathlib](https://treyhunner.com/2018/12/why-you-should-be-using-pathlib/)
+/// - [No really, pathlib is great](https://treyhunner.com/2019/01/no-really-pathlib-is-great/)
 #[derive(ViolationMetadata)]
 pub(crate) struct Glob {
     pub function: String,

diff --git a/crates/ruff_linter/src/rules/flake8_use_pathlib/rules/os_path_getatime.rs b/crates/ruff_linter/src/rules/flake8_use_pathlib/rules/os_path_getatime.rs
@@ -6,15 +6,12 @@ use ruff_macros::{derive_message_formats, ViolationMetadata};
 ///
 /// ## Why is this bad?
 /// `pathlib` offers a high-level API for path manipulation, as compared to
-/// the lower-level API offered by `os`.
+/// the lower-level API offered by `os.path`.
 ///
 /// When possible, using `Path` object methods such as `Path.stat()` can
-/// improve readability over the `os` module's counterparts (e.g.,
+/// improve readability over the `os.path` module's counterparts (e.g.,
 /// `os.path.getatime()`).
 ///
-/// Note that `os` functions may be preferable if performance is a concern,
-/// e.g., in hot loops.
-///
 /// ## Examples
 /// ```python
 /// import os
@@ -29,6 +26,11 @@ use ruff_macros::{derive_message_formats, ViolationMetadata};
 /// Path(__file__).stat().st_atime
 /// ```
 ///
+/// ## Known issues
+/// While using `pathlib` can improve the readability and type safety of your code,
+/// it can be less performant than the lower-level alternatives that work directly with strings,
+/// especially on older versions of Python.
+///
 /// ## References
 /// - [Python documentation: `Path.stat`](https://docs.python.org/3/library/pathlib.html#pathlib.Path.stat)
 /// - [Python documentation: `os.path.getatime`](https://docs.python.org/3/library/os.path.html#os.path.getatime)

diff --git a/crates/ruff_linter/src/rules/flake8_use_pathlib/rules/os_path_getctime.rs b/crates/ruff_linter/src/rules/flake8_use_pathlib/rules/os_path_getctime.rs
@@ -6,15 +6,12 @@ use ruff_macros::{derive_message_formats, ViolationMetadata};
 ///
 /// ## Why is this bad?
 /// `pathlib` offers a high-level API for path manipulation, as compared to
-/// the lower-level API offered by `os`.
+/// the lower-level API offered by `os.path`.
 ///
 /// When possible, using `Path` object methods such as `Path.stat()` can
-/// improve readability over the `os` module's counterparts (e.g.,
+/// improve readability over the `os.path` module's counterparts (e.g.,
 /// `os.path.getctime()`).
 ///
-/// Note that `os` functions may be preferable if performance is a concern,
-/// e.g., in hot loops.
-///
 /// ## Examples
 /// ```python
 /// import os
@@ -29,6 +26,11 @@ use ruff_macros::{derive_message_formats, ViolationMetadata};
 /// Path(__file__).stat().st_ctime
 /// ```
 ///
+/// ## Known issues
+/// While using `pathlib` can improve the readability and type safety of your code,
+/// it can be less performant than the lower-level alternatives that work directly with strings,
+/// especially on older versions of Python.
+///
 /// ## References
 /// - [Python documentation: `Path.stat`](https://docs.python.org/3/library/pathlib.html#pathlib.Path.stat)
 /// - [Python documentation: `os.path.getctime`](https://docs.python.org/3/library/os.path.html#os.path.getctime)

diff --git a/crates/ruff_linter/src/rules/flake8_use_pathlib/rules/os_path_getmtime.rs b/crates/ruff_linter/src/rules/flake8_use_pathlib/rules/os_path_getmtime.rs
@@ -6,15 +6,12 @@ use ruff_macros::{derive_message_formats, ViolationMetadata};
 ///
 /// ## Why is this bad?
 /// `pathlib` offers a high-level API for path manipulation, as compared to
-/// the lower-level API offered by `os`.
+/// the lower-level API offered by `os.path`.
 ///
 /// When possible, using `Path` object methods such as `Path.stat()` can
-/// improve readability over the `os` module's counterparts (e.g.,
+/// improve readability over the `os.path` module's counterparts (e.g.,
 /// `os.path.getmtime()`).
 ///
-/// Note that `os` functions may be preferable if performance is a concern,
-/// e.g., in hot loops.
-///
 /// ## Examples
 /// ```python
 /// import os
@@ -29,6 +26,11 @@ use ruff_macros::{derive_message_formats, ViolationMetadata};
 /// Path(__file__).stat().st_mtime
 /// ```
 ///
+/// ## Known issues
+/// While using `pathlib` can improve the readability and type safety of your code,
+/// it can be less performant than the lower-level alternatives that work directly with strings,
+/// especially on older versions of Python.
+///
 /// ## References
 /// - [Python documentation: `Path.stat`](https://docs.python.org/3/library/pathlib.html#pathlib.Path.stat)
 /// - [Python documentation: `os.path.getmtime`](https://docs.python.org/3/library/os.path.html#os.path.getmtime)

diff --git a/crates/ruff_linter/src/rules/flake8_use_pathlib/rules/os_path_getsize.rs b/crates/ruff_linter/src/rules/flake8_use_pathlib/rules/os_path_getsize.rs
@@ -6,15 +6,12 @@ use ruff_macros::{derive_message_formats, ViolationMetadata};
 ///
 /// ## Why is this bad?
 /// `pathlib` offers a high-level API for path manipulation, as compared to
-/// the lower-level API offered by `os`.
+/// the lower-level API offered by `os.path`.
 ///
 /// When possible, using `Path` object methods such as `Path.stat()` can
-/// improve readability over the `os` module's counterparts (e.g.,
+/// improve readability over the `os.path` module's counterparts (e.g.,
 /// `os.path.getsize()`).
 ///
-/// Note that `os` functions may be preferable if performance is a concern,
-/// e.g., in hot loops.
-///
 /// ## Examples
 /// ```python
 /// import os
@@ -29,6 +26,11 @@ use ruff_macros::{derive_message_formats, ViolationMetadata};
 /// Path(__file__).stat().st_size
 /// ```
 ///
+/// ## Known issues
+/// While using `pathlib` can improve the readability and type safety of your code,
+/// it can be less performant than the lower-level alternatives that work directly with strings,
+/// especially on older versions of Python.
+///
 /// ## References
 /// - [Python documentation: `Path.stat`](https://docs.python.org/3/library/pathlib.html#pathlib.Path.stat)
 /// - [Python documentation: `os.path.getsize`](https://docs.python.org/3/library/os.path.html#os.path.getsize)
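
The four `os.path.get*` rules above all point to `Path.stat()` as the replacement. As a rough sketch of that mapping (the temporary file and variable names are illustrative assumptions, not taken from the rule docs), a single `stat()` result exposes each value as an attribute:

```python
import os
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical 10-byte file used only for this comparison.
    target = Path(tmp) / "example.bin"
    target.write_bytes(b"x" * 10)

    # One stat() call returns size and timestamps together.
    info = target.stat()

    size_from_os = os.path.getsize(target)
    size_from_pathlib = info.st_size

    mtime_from_os = os.path.getmtime(target)
    mtime_from_pathlib = info.st_mtime

    # st_atime and st_ctime correspond to os.path.getatime() and
    # os.path.getctime() in the same way.
    atime_from_pathlib = info.st_atime
    ctime_from_pathlib = info.st_ctime
```

When several of these values are needed for the same file, reusing one `stat()` result avoids repeating the system call, which may partly offset the object-creation overhead mentioned under "Known issues".
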

diff --git a/crates/ruff_linter/src/rules/flake8_use_pathlib/rules/os_sep_split.rs b/crates/ruff_linter/src/rules/flake8_use_pathlib/rules/os_sep_split.rs
@@ -42,6 +42,16 @@ use crate::checkers::ast::Checker;
 /// if any(part in blocklist for part in Path("my/file/path").parts):
 ///     ...
 /// ```
+///
+/// ## Known issues
+/// While using `pathlib` can improve the readability and type safety of your code,
+/// it can be less performant than working directly with strings,
+/// especially on older versions of Python.
+///
+/// ## References
+/// - [PEP 428 – The pathlib module – object-oriented filesystem paths](https://peps.python.org/pep-0428/)
+/// - [Why you should be using pathlib](https://treyhunner.com/2018/12/why-you-should-be-using-pathlib/)
+/// - [No really, pathlib is great](https://treyhunner.com/2019/01/no-really-pathlib-is-great/)
 #[derive(ViolationMetadata)]
 pub(crate) struct OsSepSplit;
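
For context on the `os_sep_split.rs` rule, here is a small sketch of how splitting a path string on the separator differs from reading `.parts`. It uses `PurePosixPath` so the result does not depend on the host platform; that choice and the sample paths are assumptions made for illustration, whereas the rule docs use `Path`.

```python
from pathlib import PurePosixPath

# Plain relative path: both approaches give the same components.
relative = "my/file/path"
relative_split = relative.split("/")
relative_parts = PurePosixPath(relative).parts

# Absolute path: split() yields a leading empty string,
# while .parts keeps the root as its own component.
absolute = "/my/file/path"
absolute_split = absolute.split("/")
absolute_parts = PurePosixPath(absolute).parts

# Redundant separators: split() yields empty strings,
# while .parts normalizes them away.
messy = "my//file/"
messy_split = messy.split("/")
messy_parts = PurePosixPath(messy).parts
```
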