Reduce zil replay time to improve zpool import latencies #17043

Open
markroper opened this issue Feb 11, 2025 · 0 comments
Labels
Type: Feature Feature request or new feature

Comments

@markroper
Contributor

Summary

When importing a zpool following a crash under heavy write/mutation load, zpool import latencies can be high, running to tens of minutes or even hours in extreme cases. One cause is that zil_replay serially replays journaled writes from the ZFS intent log (ZIL) into the pool before the pool can be brought online. Serial zil_replay latency scales linearly with the number of entries in the ZIL, but some operations are more costly to replay than others. In particular, TX_WRITE and TX_LINK entries are more costly because they can trigger read-modify-write behavior during replay, which adds IOs to pooled storage on the replay path on top of the reads of the ZIL itself. The cost is exacerbated when high-latency devices are used as vdevs in the pool, for example HDDs or S3. The goal of this feature is to reduce the tail latency of zpool import by reducing the latency of zil_replay in these cases.
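To make the scaling argument concrete, here is a toy cost model in Python. It is only a sketch: the `Entry` type, the `serial_replay_seconds` helper, and the latency figures are illustrative assumptions, not OpenZFS code or measurements. It also assumes the worst case in which every TX_WRITE or TX_LINK entry misses the cache and waits on one pool read.

```python
from dataclasses import dataclass

# Record kinds the issue names as costly to replay. Treating every one of them
# as a cache miss is a worst-case assumption made for this sketch.
RMW_KINDS = {"TX_WRITE", "TX_LINK"}


@dataclass(frozen=True)
class Entry:
    kind: str  # e.g. "TX_WRITE", "TX_LINK", "TX_CREATE"


def serial_replay_seconds(entries, zil_read_s, pool_read_s):
    """Estimate serial replay time: one log read per entry, plus one blocking
    pool read for each entry that needs read-modify-write."""
    total = 0.0
    for entry in entries:
        total += zil_read_s
        if entry.kind in RMW_KINDS:
            total += pool_read_s
    return total


# Hypothetical latencies: 0.1 ms per log entry read, 10 ms per HDD pool read.
log = [Entry("TX_WRITE")] * 1000 + [Entry("TX_CREATE")] * 1000
estimate = serial_replay_seconds(log, zil_read_s=0.0001, pool_read_s=0.010)
```

Under these assumed numbers the read-modify-write entries account for nearly all of the estimate, and doubling the log doubles the total, which is the linear behavior described above.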

I will open a candidate pull request with a way to reduce zil_replay times. The candidate solution makes a first pass through the ZIL and issues arc_read requests to prime the ARC for all ZIL entries that can trigger slow read-modify-write cycles during replay. After the ARC-priming pass, zil_replay is called and completes much more quickly because the read-modify-write cycles become ARC hits rather than serial IOs to pooled storage. In testing extreme cases of very large ZILs containing hardlink operations and small, recordsize-unaligned write IOs, this priming reduces zil_replay times by more than 20x, from hours to single-digit minutes.
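The two-pass idea can be sketched with a small simulation. This is not the candidate patch and does not use any OpenZFS API: `ToyPool`, `prime`, and `replay` are hypothetical names, a Python set stands in for the ARC, and a thread pool stands in for asynchronous read requests. It only counts how many reads the serial replay loop would have to wait for with and without priming.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

# Record kinds assumed to need read-modify-write during replay.
RMW_KINDS = {"TX_WRITE", "TX_LINK"}


class ToyPool:
    """Stand-in for pooled storage with a read cache playing the role of the ARC."""

    def __init__(self):
        self.cache = set()
        self.blocking_reads = 0  # reads the serial replay loop had to wait for
        self._lock = threading.Lock()

    def prefetch(self, block):
        # Priming pass: load the block into the cache off the replay path.
        with self._lock:
            self.cache.add(block)

    def read_for_replay(self, block):
        # Replay path: a miss means a blocking read from pooled storage.
        with self._lock:
            if block not in self.cache:
                self.blocking_reads += 1
                self.cache.add(block)


def prime(pool, entries, workers=8):
    """First pass: find blocks that replay will need and fetch them concurrently."""
    blocks = {block for kind, block in entries if kind in RMW_KINDS}
    with ThreadPoolExecutor(max_workers=workers) as executor:
        list(executor.map(pool.prefetch, blocks))


def replay(pool, entries):
    """Second pass: serial replay; read-modify-write entries read their block first."""
    for kind, block in entries:
        if kind in RMW_KINDS:
            pool.read_for_replay(block)


# 200 writes touching 50 distinct blocks, plus entries that need no pool read.
entries = [("TX_WRITE", i % 50) for i in range(200)] + [("TX_CREATE", None)] * 10

cold = ToyPool()
replay(cold, entries)   # every first touch of a block blocks the replay loop

warm = ToyPool()
prime(warm, entries)    # reads are issued up front and can overlap
replay(warm, entries)   # replay now only sees cache hits
```

In this sketch the unprimed replay waits on one read per distinct block, while the primed replay waits on none. The real benefit depends on how much read concurrency the underlying vdevs can absorb and on whether the primed blocks stay cached until replay reaches them.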

How will this feature improve OpenZFS?

Reducing pool import times improves OpenZFS by reducing downtime following file server crashes when the zil must be replayed to bring a pool back online.

Additional context

I'm not sure if the candidate solution is the optimal solution, but I do think the goal of reducing zpool import times is really important for high-availability use cases for OpenZFS.

@markroper markroper added the Type: Feature Feature request or new feature label Feb 11, 2025