Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

syncoid: Fix multiple regressions caused by #818 #931

Open
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

Deltik
Copy link
Contributor

@Deltik Deltik commented Jun 13, 2024

Motivation and Context

#818 only added a functional test for the new feature of sorting snapshots by the createtxg property, so it and the existing tests did not catch the following regressions:

Slower listing of snapshots' properties

From @phreaker0:

i just noticed a huge performance regression with this PR. Because now all properties are fetched instead of only the needed ones ( all vs guid,creation), I get why because if specifying createtxg it will fail for versions where it isn't supported but I guess we need to check for createtxg support once.

From one of my datasets:

$ time zfs get -Hpd 1 -t snapshot all '...' > /dev/null

real 0m2.642s user 0m0.053s sys 0m2.584s

$ time zfs get -Hpd 1 -t snapshot guid,creation '...' > /dev/null

real 0m0.804s user 0m0.029s sys 0m0.759s

getsnaps fallback fails due to the wrong argument being passed in

From @Deltik:

An oversight during the merging of subroutine getsnapsfallback into getsnaps caused the recursive call to prepend $sshcmd to $rhost again, leading to this error:

# syncoid --debug [email protected]:rpool/before syncoid-test-11/after
… [TRUNCATED] …
WARNING: snapshot listing failed, trying fallback command
DEBUG: getting list of snapshots on rpool/before using ssh      ssh      -S   /tmp/syncoid-root19216812226-1718239169-761069-2198 [email protected]  zfs get -Hpd 1   all ''"'"'rpool/before'"'"'' |...
[email protected]: Command not found.
CRITICAL ERROR: snapshots couldn't be listed for rpool/before (exit code 256) at /usr/sbin/syncoid line 1900.

getsnaps parsing error in fallback mode due to non-snapshots appearing in the list

From @Deltik:

If that error were fixed, the fallback command would still fail because non-snapshot datasets would be included in the output, leading to this parsing error:

# syncoid --debug [email protected]:rpool/before syncoid-test-11/after
… [TRUNCATED] …
WARNING: snapshot listing failed, trying fallback command
DEBUG: getting list of snapshots on rpool/before using ssh      -S   /tmp/syncoid-root19216812226-1718239740-763496-6973 [email protected]  zfs get -Hpd 1 all ''"'"'rpool/before'"'"'' |...
CRITICAL ERROR: Unexpected dataset format in rpool/before       type    filesystem      - at /usr/sbin/syncoid line 1914.

Description

These three regressions are fixed in three separate commits:

perf(syncoid): Regression: Bad performance due to zfs get all

Return to using zfs get guid,creation,createtxg to avoid the performance overhead of zfs get all.

There is a per-host global cache to remember:

  • if zfs get supports the type filter argument, -t snapshot and
  • which ZFS properties are available.

The feature check introduced by subroutine check_zfs_get_features eliminates the need for a getsnaps fallback recursion, as there is no longer any guesswork into what features zfs get supports.

Fixes: #818 (comment)

fix(syncoid): Regression: Fallback recursion $rhost mutated already

An oversight during the merging of subroutine getsnapsfallback into getsnaps (e301b5b1) caused the recursive call to prepend $sshcmd to $rhost again, leading to this error:

# syncoid --debug [email protected]:rpool/before syncoid-test-11/after
… [TRUNCATED] …
WARNING: snapshot listing failed, trying fallback command
DEBUG: getting list of snapshots on rpool/before using ssh      ssh      -S   /tmp/syncoid-root19216812226-1718239169-761069-2198 [email protected]  zfs get -Hpd 1   all ''"'"'rpool/before'"'"'' |...
[email protected]: Command not found.
CRITICAL ERROR: snapshots couldn't be listed for rpool/before (exit code 256) at /usr/sbin/syncoid line 1900.

Fixes: #818 (comment)

fix(syncoid): Regression: zfs get parser not compatible with fallback

In the fallback mode, non-snapshot datasets would be included in the output, leading to this parsing error:

# syncoid --debug [email protected]:rpool/before syncoid-test-11/after
… [TRUNCATED] …
WARNING: snapshot listing failed, trying fallback command
DEBUG: getting list of snapshots on rpool/before using ssh      -S   /tmp/syncoid-root19216812226-1718239740-763496-6973 [email protected]  zfs get -Hpd 1 all ''"'"'rpool/before'"'"'' |...
CRITICAL ERROR: Unexpected dataset format in rpool/before       type    filesystem      - at /usr/sbin/syncoid line 1914.

Fixes: #818 (comment)

How Has This Been Tested?

I haven't managed to figure out an easy way to test syncoid's new reduced feature set detection features in subroutine check_zfs_get_features, so I performed manual acceptance tests with a FreeBSD 8.3 machine, which is preloaded with an old version of ZFS that would trigger what was formerly known as getsnapsfallback because it does not support zfs get -t snapshot.

I created some dummy snapshots:

freebsd# zfs list -rtall
NAME                                                               USED  AVAIL  REFER  MOUNTPOINT
rpool                                                              174K   644M    32K  /rpool
rpool/before                                                        31K   644M    31K  /rpool/before
rpool/before@this-snapshot-should-make-it-into-the-after-dataset      0      -    31K  -
rpool/before@oldest-snapshot                                          0      -    31K  -
rpool/before@another-snapshot-does-not-matter                         0      -    31K  -

And then was able to syncoid them out:

root@workstation:~# syncoid --debug [email protected]:rpool/before syncoid-test-11/after
DEBUG: SSHCMD: ssh     
Warning: Permanently added '192.168.122.26' (RSA) to the list of known hosts.
DEBUG: checking availability of lzop on source...
DEBUG: checking availability of lzop on target...
DEBUG: checking availability of lzop on local machine...
WARNING: lzop not available on source ssh:-S /tmp/syncoid-root19216812226-1718253948-807483-4632 [email protected] sync will continue without compression.
DEBUG: checking availability of mbuffer on source...
WARNING: mbuffer not available on source ssh:-S /tmp/syncoid-root19216812226-1718253948-807483-4632 [email protected] - sync will continue without source buffering.
DEBUG: checking availability of mbuffer on target...
DEBUG: checking availability of pv on local machine...
DEBUG: checking availability of zfs resume feature on source...
DEBUG: checking availability of zfs resume feature on target...
WARNING: ZFS resume feature not available on source machine - sync will continue without resume support.
DEBUG: syncing source rpool/before to target syncoid-test-11/after.
DEBUG: getting current value of syncoid:sync on rpool/before...
DEBUG: ssh      -S /tmp/syncoid-root19216812226-1718253948-807483-4632 [email protected]  zfs get -H syncoid:sync ''"'"'rpool/before'"'"''
DEBUG: checking to see if syncoid-test-11/after on  is already in zfs receive using  ps -Ao args= ...
DEBUG: checking to see if target filesystem exists using "  zfs get -H name 'syncoid-test-11/after' 2>&1 |"...
DEBUG: Checking `zfs get` features on host "[email protected]"...
DEBUG: Host "[email protected]" has `zfs get -t`?: 0
DEBUG: Host "[email protected]" ZFS properties: guid creation createtxg
DEBUG: getting list of snapshots on rpool/before using ssh      -S /tmp/syncoid-root19216812226-1718253948-807483-4632 [email protected]  zfs get -Hpd 1  guid,creation,createtxg,type ''"'"'rpool/before'"'"'' |...
DEBUG: creating sync snapshot using "ssh      -S /tmp/syncoid-root19216812226-1718253948-807483-4632 [email protected]  zfs snapshot ''"'"'rpool/before'"'"''@syncoid_workstation.wizard2.net_2024-06-12:23:45:49-GMT-05:00
"...
DEBUG: target syncoid-test-11/after does not exist.  Finding oldest available snapshot on source rpool/before ...
DEBUG: getting estimated transfer size from source -S /tmp/syncoid-root19216812226-1718253948-807483-4632 [email protected] using "ssh      -S /tmp/syncoid-root19216812226-1718253948-807483-4632 [email protected]  zfs send  -nvP ''"'"'rpool/before@this-snapshot-should-make-it-into-the-after-dataset'"'"'' 2>&1 |"...
DEBUG: sendsize = 15872
INFO: Sending oldest full snapshot [email protected]:rpool/before@this-snapshot-should-make-it-into-the-after-dataset to new target filesystem syncoid-test-11/after (~ 15 KB):
DEBUG: sync size: ~15 KB
DEBUG: ssh      -S /tmp/syncoid-root19216812226-1718253948-807483-4632 [email protected] ' zfs send  '"'"'rpool/before'"'"'@'"'"'this-snapshot-should-make-it-into-the-after-dataset'"'"'' | mbuffer  -q -s 128k -m 16M | pv -p -t -e -r -b -s 15872 |  zfs receive  -F 'syncoid-test-11/after' 2>&1
DEBUG: checking to see if syncoid-test-11/after on  is already in zfs receive using  ps -Ao args= ...
46.6KiB 0:00:00 [3.98MiB/s] [=============================================================================================================================================================================================================================================] 300%            
DEBUG: getting estimated transfer size from source -S /tmp/syncoid-root19216812226-1718253948-807483-4632 [email protected] using "ssh      -S /tmp/syncoid-root19216812226-1718253948-807483-4632 [email protected]  zfs send  -nvP -I ''"'"'rpool/before@this-snapshot-should-make-it-into-the-after-dataset'"'"'' ''"'"'rpool/before@syncoid_workstation.wizard2.net_2024-06-12:23:45:49-GMT-05:00'"'"'' 2>&1 |"...
DEBUG: sendsize = 0
INFO: Sending incremental [email protected]:rpool/before@this-snapshot-should-make-it-into-the-after-dataset ... syncoid_workstation.wizard2.net_2024-06-12:23:45:49-GMT-05:00 to syncoid-test-11/after (~ 4 KB):
DEBUG: sync size: ~4 KB
DEBUG: ssh      -S /tmp/syncoid-root19216812226-1718253948-807483-4632 [email protected] ' zfs send  -I '"'"'rpool/before'"'"'@'"'"'this-snapshot-should-make-it-into-the-after-dataset'"'"' '"'"'rpool/before'"'"'@'"'"'syncoid_workstation.wizard2.net_2024-06-12:23:45:49-GMT-05:00'"'"'' | mbuffer  -q -s 128k -m 16M | pv -p -t -e -r -b -s 4096 |  zfs receive  -F 'syncoid-test-11/after' 2>&1
DEBUG: checking to see if syncoid-test-11/after on  is already in zfs receive using  ps -Ao args= ...
2.74KiB 0:00:00 [15.3KiB/s] [================================================================================================================================================================>                                                                            ]  68%            
root@workstation:~# zfs list -rtall syncoid-test-11/after
NAME                                                                              USED  AVAIL  REFER  MOUNTPOINT
syncoid-test-11/after                                                              23K  23.8M    23K  none
syncoid-test-11/after@this-snapshot-should-make-it-into-the-after-dataset           0B      -    23K  -
syncoid-test-11/after@oldest-snapshot                                               0B      -    23K  -
syncoid-test-11/after@another-snapshot-does-not-matter                              0B      -    23K  -
syncoid-test-11/after@syncoid_workstation.wizard2.net_2024-06-12:23:45:49-GMT-05:00     0B      -    23K  -

To test the supported properties detection feature, I added a fake requested property and observed that it was not detected as a supported property:

diff --git a/syncoid b/syncoid
--- a/syncoid	(revision 8a023d6d7bacb78959e4a5dcab0a1cbf68e84d69)
+++ b/syncoid	(date 1718254080862)
@@ -1421,7 +1421,7 @@
 
 	writelog('DEBUG', "Host \"$host\" has `zfs get -t`?: $rhost_zfs_get_features{$rhost}->{supports_type_filter}");
 
-	my @properties_to_check = ('guid', 'creation', 'createtxg');
+	my @properties_to_check = ('guid', 'creation', 'createtxg', 'notarealprop');
 	foreach my $prop (@properties_to_check) {
 		my $check_prop_cmd = "$rhost $mysudocmd $zfscmd get -H $prop ''";
 		open my $fh_p, "$check_prop_cmd 2>&1 |";

Types of Changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Performance enhancement (non-breaking change which improves efficiency)
  • Code cleanup (non-breaking change which makes code smaller or more readable)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Documentation (a change to man pages or other documentation)

Deltik added 3 commits June 12, 2024 20:02
An oversight during the merging of subroutine `getsnapsfallback` into
`getsnaps`
(jimsalterjrs@e301b5b1)
caused the recursive call to prepend `$sshcmd` to `$rhost` again,
leading to this error:

```shell
# syncoid --debug [email protected]:rpool/before syncoid-test-11/after
… [TRUNCATED] …
WARNING: snapshot listing failed, trying fallback command
DEBUG: getting list of snapshots on rpool/before using ssh      ssh      -S   /tmp/syncoid-root19216812226-1718239169-761069-2198 [email protected]  zfs get -Hpd 1   all ''"'"'rpool/before'"'"'' |...
[email protected]: Command not found.
CRITICAL ERROR: snapshots couldn't be listed for rpool/before (exit code 256) at /usr/sbin/syncoid line 1900.
```

Fixes:
jimsalterjrs#818 (comment)
In the fallback mode, non-snapshot datasets would be included in the
output, leading to this parsing error:

```shell
# syncoid --debug [email protected]:rpool/before syncoid-test-11/after
… [TRUNCATED] …
WARNING: snapshot listing failed, trying fallback command
DEBUG: getting list of snapshots on rpool/before using ssh      -S   /tmp/syncoid-root19216812226-1718239740-763496-6973 [email protected]  zfs get -Hpd 1 all ''"'"'rpool/before'"'"'' |...
CRITICAL ERROR: Unexpected dataset format in rpool/before       type    filesystem      - at /usr/sbin/syncoid line 1914.
```

Fixes:
jimsalterjrs#818 (comment)
Return to using `zfs get guid,creation,createtxg` to avoid the
performance overhead of `zfs get all`.

There is a per-host global cache to remember:
* if `zfs get` supports the type filter argument, `-t snapshot` and
* which ZFS properties are available.

The feature check introduced by subroutine `check_zfs_get_features`
eliminates the need for a `getsnaps` fallback recursion, as there is no
longer any guesswork into what features `zfs get` supports.

Fixes:
jimsalterjrs#818 (comment)
@Deltik Deltik changed the title syncoid: Fix multiple regressions caused by https://github.com/jimsalterjrs/sanoid/pull/818 syncoid: Fix multiple regressions caused by #818 Jun 13, 2024
@Deltik Deltik force-pushed the hotfix/regression/getsnaps branch from 8a023d6 to 191ddfe Compare June 15, 2024 03:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant