`df`: Adds support for mount path prefix matching and input path #3161

crazystylus · 2022-02-19T11:38:38Z

Adds support for mount path prefix matching and input path canonicalization.
Fixes #3065

- Sorts mount paths in order of decreasing string length
- Canonicalizes all paths and clears invalid paths
- Checks whether a mount path is a prefix of the input path

jfinkels · 2022-02-19T15:19:41Z

src/uu/df/src/df.rs

+    let mut mounts = read_fs_list();
+
+    // Sort mounts in desc ordered lexicographically
+    mounts.sort_by(|a, b| b.mount_dir.cmp(&a.mount_dir));


Why sort here? On my system (Ubuntu), the output of df is not lexicographically ordered:

$ df --output=source | tail -n +2 | head -n4
udev
tmpfs
/dev/mapper/vgubuntu-root
tmpfs

See also my comment here: #3086 (comment)

My plan is to sort in order of decreasing mount path length, so that when I do prefix matching, the longest matching mount path is selected first and the remaining mount paths are ignored. For example, if we have mount paths [/, /root] and the input is /root/docs, then we only need to output /root, and / must be skipped. Sorting ensures this.

Sorting in reverse (descending) lexicographical order also works here, since the longest matching path still comes first, but let me change it to a simple sort by length.
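For illustration, the sort-then-first-match idea described above could look roughly like the sketch below. This is not the code from the PR: the function name `longest_mount_sorted` is hypothetical, and it assumes mount directories are available as plain `PathBuf` values. It uses `Path::starts_with`, which compares whole path components, so /root is not treated as a prefix of /rootfs.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical helper: returns the longest mount directory that is a
/// component-wise prefix of `input`.
/// `mount_dirs` is sorted in place, longest path first, so the first hit wins.
fn longest_mount_sorted(mount_dirs: &mut [PathBuf], input: &Path) -> Option<PathBuf> {
    // Longest path string first; the sort is stable, so ties keep their order.
    mount_dirs.sort_by(|a, b| b.as_os_str().len().cmp(&a.as_os_str().len()));
    // `Path::starts_with` matches whole components, not raw string prefixes.
    mount_dirs.iter().find(|m| input.starts_with(m)).cloned()
}
```

Note that the sort reorders the mount list itself, which matters if the same list is later printed in mount-table order.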

crazystylus · 2022-02-19T18:20:15Z

Changed the sorting of mount_paths to decreasing length instead of reverse lexicographical order.
Added comments explaining the sorting and the path prefix matching.

jfinkels

This will cause a regression in df as it is today, due to the sorting. Unfortunately there is no unit test to cover this case, but the output of df should match the order of files in /etc/mtab, as I mentioned in this comment: #3086 (comment)

Also, the various mutations of paths and the use of the empty string as a way of filtering out paths are tough to follow. Is it possible to move some of this logic into the mount_info_lt() function, which returns true exactly when one MountInfo object should be kept and the other dropped?

crazystylus · 2022-02-21T15:59:12Z

> This will cause a regression in df as it is today, due to the sorting. Unfortunately there is no unit test to cover this case, but the output of df should match the order of files in /etc/mtab, as I mentioned in this comment: #3086 (comment)
>
> Also, the various mutations of paths and the use of the empty string as a way of filtering out paths are tough to follow. Is it possible to move some of this logic into the mount_info_lt() function, which returns true exactly when one MountInfo object should be kept and the other dropped?

Moving the logic into the mount_info_lt() function is not practical, as it would complicate things.
Can you try running these two commands and compare the output?
`df /boot / /boot/efi` and `df /boot/efi /boot /`
I think df orders its output according to the input paths. Only when no path list is supplied will the output be in the same order as /etc/mtab.

crazystylus · 2022-02-21T18:16:18Z

I have made the changes below to clean up the code and meet the requirements:

- Removed sorting of mount_paths
- Moved the canonicalization logic so that no mutability is required
- Moved the path matching logic from is_included() to filter_mount_list()
- Changed loops to iterators to clean up the code wherever possible
- For each input_path, the code selects the longest matching mount_path; de-duplication is handled by is_best()
- Removed string clearing and mutability
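As a rough sketch of the iterator-based selection listed above (again not the PR's actual code), the longest matching mount can be picked per input path without sorting or mutating the mount list. The `Mount` struct and the `best_mount` function are hypothetical stand-ins; the real `MountInfo` type has more fields.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical, pared-down stand-in for a mount table entry.
#[derive(Debug, Clone, PartialEq)]
struct Mount {
    dev_name: String,
    mount_dir: PathBuf,
}

/// Picks the mount whose `mount_dir` is the longest component-wise prefix
/// of `input`. The table is neither sorted nor mutated.
fn best_mount<'a>(mounts: &'a [Mount], input: &Path) -> Option<&'a Mount> {
    mounts
        .iter()
        // Keep only mounts that contain the input path.
        .filter(|m| input.starts_with(&m.mount_dir))
        // Among those, the longest mount directory is the most specific one.
        .max_by_key(|m| m.mount_dir.as_os_str().len())
}
```

Because the slice is only borrowed, the mount-table order stays available for the case where df is run without arguments.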

jfinkels · 2022-02-21T18:46:20Z

> Can you try running these two commands and compare the output?
> `df /boot / /boot/efi` and `df /boot/efi /boot /`
> I think df orders its output according to the input paths. Only when no path list is supplied will the output be in the same order as /etc/mtab.

Oh you are exactly right:

$ df --output=target /boot / /boot/efi
Mounted on
/boot
/
/boot/efi
$ df --output=target / /boot /boot/efi
Mounted on
/
/boot
/boot/efi

Sorry for my misunderstanding earlier. In that case, can you add a test for this in tests/by-util/test_df.rs?

crazystylus · 2022-02-24T16:30:15Z

I have verified that df from the main branch falsely passes the GNU test df/df-symlink.sh: the test passes `.` as input, and df produces no output for it.

I have confirmed that this test case will only start passing properly once --output=target is implemented.

crazystylus · 2022-02-24T16:37:26Z

Hi @jfinkels ,
I realized there is one more issue. We should select the best mount point per input_path, and if an input path is repeated n times, the output should also contain n entries.
Try this to verify:

$ df --output=target /boot /boot /boot
Mounted on
/boot
/boot
/boot

crazystylus · 2022-02-24T19:40:32Z

I have checked GNU coreutils df code, and adjusted the helper functions according to it.

The mount list is filtered only if there are no input_paths, and then all of its entries are emitted.
Otherwise, each input_path is canonicalized and then mapped to a mount entry from the unfiltered list.

Ref: https://github.com/coreutils/coreutils/blob/master/src/df.c#L1830-L1840

Both test cases have been added
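To make the two branches described in this comment concrete, here is a minimal sketch under stated assumptions: mounts are represented only by their directory, the inputs are already canonicalized, and `keep` is a hypothetical predicate standing in for whatever filtering the no-argument listing applies. The name `rows_to_print` is also made up for this example.

```rust
use std::path::{Path, PathBuf};

/// Rows to print, identified here only by mount directory.
/// `all_mounts` is in mount-table order; `inputs` are assumed to be
/// canonicalized already; `keep` is a hypothetical filter predicate.
fn rows_to_print(
    all_mounts: &[PathBuf],
    inputs: &[PathBuf],
    keep: impl Fn(&Path) -> bool,
) -> Vec<PathBuf> {
    if inputs.is_empty() {
        // No operands: print the filtered list, preserving table order.
        return all_mounts
            .iter()
            .filter(|m| keep(m.as_path()))
            .cloned()
            .collect();
    }
    // One row per operand, in operand order, taken from the unfiltered list.
    inputs
        .iter()
        .filter_map(|input| {
            all_mounts
                .iter()
                .filter(|m| input.starts_with(m))
                .max_by_key(|m| m.as_os_str().len())
                .cloned()
        })
        .collect()
}
```

With this shape, a repeated operand naturally yields a repeated row, and the row order follows the command line rather than the mount table.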

jfinkels · 2022-02-25T02:37:44Z

> I have checked GNU coreutils df code, and adjusted the helper functions according to it.

It's best not to view the source code of GNU coreutils in order to avoid infringing their copyright.

> 1. Mount list is only filtered if there are no input_paths and all entries are emitted
> 2. Else each input_path is canonicalized and then mapped with a mount_entry from unfiltered_list

There have been a lot of changes in this pull request and it's hard for me to interpret. Could you give concrete examples of how the behavior of df is changing after these two changes?

> Both test cases have been added

I only see one new test case, asserting that if two files are specified in the input then two identical rows appear in the output table.

crazystylus · 2022-02-25T15:22:25Z

> > I have checked GNU coreutils df code, and adjusted the helper functions according to it.
>
> It's best not to view the source code of GNU coreutils in order to avoid infringing their copyright.
>
> > 1. Mount list is only filtered if there are no input_paths and all entries are emitted
> > 2. Else each input_path is canonicalized and then mapped with a mount_entry from unfiltered_list
>
> There have been a lot of changes in this pull request and it's hard for me to interpret. Could you give concrete examples of how the behavior of df is changing after these two changes?
>
> > Both test cases have been added
>
> I only see one new test case, asserting that if two files are specified in the input then two identical rows appear in the output table.

I had to remove the second test case because it can never pass in the CI pipeline: only one mount point, /, is available there for testing.

The following are the changes made by this PR:

- All input paths are canonicalized, so relative paths and symlinks now work.
  E.g. `df .`, `df ../` and `df symlink` will work successfully.
- Mount points are matched to each input_path by prefix matching, and the longest match is selected.
  E.g. `df /root/....../long_path/....` will point to the correct mount point.
- Mount points are printed in the order of the input paths.
  E.g. `df / /boot /boot/efi` and `df /boot/efi /boot /` will produce differently ordered output.
- If an input path is repeated, its matched entry is repeated as well.
  E.g. `df / / /` will print 3 entries.
- If no paths are passed as arguments to df, the filtered mount list is printed, preserving the mtab order.
  E.g. `df`
- The GNU test df/df-symlink.sh is evaluated properly. The main branch code falsely passes it by printing no output.
  df will start passing the GNU test df/df-symlink.sh once both this PR and #3176 (df: implement the --output command-line argument) are merged into the main branch.
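The canonicalization step from the list above can be sketched with the standard library's `std::fs::canonicalize`, which resolves relative components and symlinks and fails for paths that do not exist. The helper name `canonicalize_inputs` is hypothetical, and silently dropping unresolvable operands is an assumption made for this example, not necessarily what the PR does.

```rust
use std::fs;
use std::path::{Path, PathBuf};

/// Hypothetical helper: canonicalizes every operand and drops the ones that
/// cannot be resolved (for example, paths that do not exist).
/// Order and repetitions of the surviving operands are preserved.
fn canonicalize_inputs<I, S>(inputs: I) -> Vec<PathBuf>
where
    I: IntoIterator<Item = S>,
    S: AsRef<Path>,
{
    inputs
        .into_iter()
        // `fs::canonicalize` returns Err for unresolvable paths; skip those.
        .filter_map(|p| fs::canonicalize(p).ok())
        .collect()
}
```

After this step, an operand such as `.` or a symlink becomes an absolute path that can be compared against mount directories by prefix.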

jfinkels

Need to add unknown words to the spell-check:ignore line at the head of the file.

It's hard to say if anything is missing here without tests (which are understandably difficult to create); in the future, pull request #3167 demonstrates how we can at least test that these internal helper functions are consistent.

jfinkels · 2022-02-26T18:05:32Z

src/uu/df/src/df.rs

            result.push(mi);
        }
    }
    result
 }

+/// Assign 1 `MountInfo` entry to each path
+/// `lofs` enries are skipped and dummy mount points are skipped


What does "lofs" mean?

Here lofs refers to the Solaris-style loopback file system, which is not present in Linux. It is present in Solaris and FreeBSD and is similar to a symlink.

crazystylus · 2022-02-27T07:09:48Z

I had to remove the other test case as well, because it only works on Linux and fails on Windows. Hence I think unit tests are the best approach.

tertsdiepraam · 2022-02-27T09:14:56Z

Is that test supposed to only work on unix? Because if so, you could just put #[cfg(unix)] on it. It'd also be great if you added the explanation about lofs as a comment in the code.
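For readers unfamiliar with the attribute, here is a small self-contained sketch of gating a test with `#[cfg(unix)]`. It deliberately avoids the project's real test harness: `count_rows` is a hypothetical helper, and the test runs against a canned string standing in for the output of `df --output=target / / /`.

```rust
/// Hypothetical helper: counts the rows of `output` that consist solely of `target`.
fn count_rows(output: &str, target: &str) -> usize {
    output.lines().filter(|line| line.trim() == target).count()
}

// Only compiled on Unix-like targets, so Windows builds skip this test entirely.
#[cfg(unix)]
#[test]
fn repeated_operand_repeats_row() {
    // Canned text standing in for the output of `df --output=target / / /`.
    let output = "Mounted on\n/\n/\n/\n";
    assert_eq!(count_rows(output, "/"), 3);
}
```

The attribute removes the item at compile time on other platforms, so there is no runtime check and nothing to skip when the test suite runs on Windows.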

crazystylus · 2022-02-28T17:23:56Z

For some reason dd is failing on Windows, but the lofs explanation has been added as a comment, and I have added #[cfg(unix)] to the test case.

crazystylus · 2022-03-11T01:23:37Z

Please let me know if any other changes are required in this PR for getting it merged.

crazystylus added 2 commits February 19, 2022 16:59

Adds support for mount path prefix matching and input path

99793f1

canonicalization - Sorts mount paths in reverse lexicographical order - Canonicalize all paths and clear invalid paths - Checking of mount path prefix matches input path

Clippy: use of unwrap_or followed by a call to new fixed

5b79c80

jfinkels reviewed Feb 19, 2022

Changed sorting from reverse lexicographical to decreasing order of s…

90ee693

…tring length. Added comments for explaining sorting and path prefix matching

crazystylus force-pushed the df-failed-to-print-fs-info branch from 6a549ee to 90ee693 Compare February 19, 2022 18:24

jfinkels previously requested changes Feb 19, 2022

sylvestre added the awaiting response label Feb 20, 2022

Removed sorting of mount_paths and optimized prefix matching.

e70c2af

1. Removed sorting of mount paths 2. Implemented prefix matching using iterators 3. Removed un-needed mut from previous commit

crazystylus requested a review from jfinkels February 21, 2022 18:02

Merge branch 'uutils:main' into df-failed-to-print-fs-info

e66a0e5

Fixed the program flow as per GNU df implementation

a74f74b

1. First all input paths are canonicalized 2. If no valid input_paths remain, print filtered_mount_list 3. Else print mount entry for each valid input path

crazystylus force-pushed the df-failed-to-print-fs-info branch from f32ae26 to a74f74b Compare February 24, 2022 19:40

crazystylus added 3 commits February 25, 2022 01:37

Fixed Clippy:needless_late_init and added 2 test-cases

b64c043

1. Fixed Clippy:needless_late_init 2. Added testcase: check if mount_points are printed in the order of input_paths 3. Added testcase: check if input_path is repeased, is the mount point also repeased

Removed inp_path order test case as it fails in CI due to missing /boot

09d607a

Merge branch 'uutils:main' into df-failed-to-print-fs-info

80bd35f

jfinkels approved these changes Feb 26, 2022

crazystylus added 2 commits February 27, 2022 00:28

spell-check:ignore lofs

f35164d

Merge branch 'df-failed-to-print-fs-info' of github.com:crazystylus/c…

f9aa870

…oreutils into df-failed-to-print-fs-info

df: cleanup and refactor

52a8e2c

crazystylus force-pushed the df-failed-to-print-fs-info branch from b92fa2c to 52a8e2c Compare February 27, 2022 09:37

jfinkels mentioned this pull request Mar 2, 2022

df statx #3203

Open

sylvestre and others added 3 commits March 4, 2022 20:36

Merge branch 'main' into df-failed-to-print-fs-info

0d2209e

Merge branch 'main' into df-failed-to-print-fs-info

100f8bb

Merge branch 'main' into df-failed-to-print-fs-info

a202982

sylvestre merged commit 5c5f4ca into uutils:main Mar 11, 2022

crazystylus deleted the df-failed-to-print-fs-info branch March 12, 2022 04:25

jfinkels mentioned this pull request Mar 27, 2022

GNU tests factor regressions #3171

Closed
