When a zpool is imported after a crash under heavy write/mutation load, zpool import latencies can be high, running to tens of minutes or even hours in extreme cases. One cause is that zil_replay serially replays journaled writes from the ZFS intent log (ZIL) into the pool before the pool can be brought online. Serial zil_replay latency scales linearly with the number of entries in the ZIL, but some operations are more costly to replay than others. In particular, TX_WRITE and TX_LINK entries are more costly because they can trigger read-modify-write behavior during replay, which adds IOs to pooled storage on the replay path on top of the reads of the ZIL itself. The cost is exacerbated when high-latency devices such as HDDs or S3 are used as vdevs in the pool. The goal of this feature is to reduce the tail latency of zpool import by reducing the latency of zil_replay in these cases.
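To make the scaling argument concrete, here is a rough back-of-envelope model, not a measurement. The function name, the idea of a fixed "RMW fraction", and every number below are assumptions chosen purely for illustration; real replay cost depends on the workload and the pool.

```python
def estimate_serial_replay_seconds(entries, rmw_fraction, log_read_s, pool_read_s):
    """Rough serial replay time (hypothetical model).

    Every entry pays one log read; the fraction of entries that needs
    read-modify-write additionally pays one synchronous pool read.
    """
    rmw_entries = entries * rmw_fraction
    return entries * log_read_s + rmw_entries * pool_read_s


# Hypothetical inputs: 1M log entries, half of them TX_WRITE/TX_LINK-style
# entries that miss the cache, 0.1 ms per log read.
slow_pool = estimate_serial_replay_seconds(1_000_000, 0.5, 0.0001, 0.010)   # 10 ms pool reads
fast_pool = estimate_serial_replay_seconds(1_000_000, 0.5, 0.0001, 0.0001)  # 0.1 ms pool reads
print(f"slow pool: {slow_pool:.0f} s, fast pool: {fast_pool:.0f} s")
```

Under these assumed numbers, the pool reads dominate the total on the slow device, which is the effect described above.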
I will open a candidate pull request with a way to reduce zil_replay times. The candidate solution makes a first pass through the ZIL and issues arc_read requests to prime the ARC for all ZIL entries that can trigger slow read-modify-write cycles during replay. After the ARC-priming pass, zil_replay is called and completes much more quickly because the read-modify-write cycles become ARC hits rather than serial IOs to pooled storage. In tests of extreme cases with very large ZILs containing hardlink operations and small, recordsize-unaligned writes, this priming reduces zil_replay times by more than 20x, from hours to single-digit minutes.
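The prime-then-replay idea can be sketched as a toy model with a simulated clock. This is not the candidate patch and does not call any OpenZFS code: the log format, the `prime` and `replay` helpers, the queue depth, and the 10 ms pool read latency are all hypothetical, and a Python set stands in for the ARC.

```python
import math

POOL_READ_S = 0.010  # hypothetical latency of one pool read
RMW_TYPES = ("TX_WRITE", "TX_LINK")  # entry types that may need read-modify-write


def replay(log, cache):
    """Serial replay: each RMW entry that misses the cache costs one pool read."""
    elapsed = 0.0
    for entry in log:
        if entry["type"] in RMW_TYPES and entry["block"] not in cache:
            elapsed += POOL_READ_S
            cache.add(entry["block"])
    return elapsed


def prime(log, cache, queue_depth):
    """First pass: read every RMW block, queue_depth reads in flight at a time.

    The concurrency is modeled arithmetically; no real IO is issued.
    """
    missing = {e["block"] for e in log if e["type"] in RMW_TYPES} - cache
    cache.update(missing)
    return math.ceil(len(missing) / queue_depth) * POOL_READ_S


# Hypothetical log: 9000 entries over distinct blocks, two thirds of them RMW-prone.
log = [{"type": ("TX_WRITE", "TX_LINK", "TX_CREATE")[i % 3], "block": i}
       for i in range(9000)]

cold = replay(log, set())  # replay with an empty cache

cache = set()
primed = prime(log, cache, queue_depth=64) + replay(log, cache)

print(f"cold replay: {cold:.2f} s, prime + replay: {primed:.2f} s")
```

The gain in this model comes entirely from overlapping the reads in the priming pass; the replay itself stays serial, it just stops waiting on pool storage. The ratio it prints depends on the assumed queue depth and says nothing about the 20x figure measured above.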
How will this feature improve OpenZFS?
Reducing pool import times improves OpenZFS by reducing downtime following file server crashes when the zil must be replayed to bring a pool back online.
Additional context
I'm not sure if the candidate solution is the optimal solution, but I do think the goal of reducing zpool import times is really important for high-availability use cases for OpenZFS.