
Add early rejection of IOs if too many Downstairs are inactive #1565

Open

mkeeter wants to merge 4 commits into main from mkeeter/early-io-rejection

Conversation

mkeeter (Contributor) commented Nov 19, 2024

This aborts the IO before passing it to the Downstairs, so it's not assigned a JobId or put into the ActiveJobs map. The most noticeable change is that writes are now fast-err'd instead of fast-acked if > 1 Downstairs is inactive.
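For intuition, here is a minimal, self-contained sketch of that check. The function name `reject_write_early`, the `DsState` variants listed, and the demo in `main` are illustrative placeholders, not the actual Crucible code, which performs the equivalent test inside the guest-facing submission path:

```rust
// Illustrative sketch only; not the actual Crucible implementation.
#[derive(Debug, Clone, Copy, PartialEq)]
enum DsState {
    Active,
    LiveRepair,
    Offline,
    Faulted,
}

/// Returns true if a guest write should be rejected up front, i.e. before a
/// JobId is assigned or the job is inserted into the ActiveJobs map.
fn reject_write_early(states: &[DsState]) -> bool {
    let accepting = states
        .iter()
        .filter(|s| matches!(s, DsState::Active | DsState::LiveRepair))
        .count();
    // "> 1 Downstairs inactive" means fewer than two can accept the write.
    accepting < 2
}

fn main() {
    // Two Downstairs unavailable: the write is fast-errored instead of fast-acked.
    assert!(reject_write_early(&[
        DsState::Active,
        DsState::Faulted,
        DsState::Offline,
    ]));
    // Only one unavailable: the write proceeds (and is fast-acked) as before.
    assert!(!reject_write_early(&[
        DsState::Active,
        DsState::LiveRepair,
        DsState::Faulted,
    ]));
}
```

Because the check happens before a JobId is allocated, a rejected IO never appears in ActiveJobs and so never enters replay or live-repair bookkeeping.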

leftwo (Contributor) commented Nov 19, 2024

Some random musings around this.

If we send the write to one downstairs, it could land on disk, but the IO will sit waiting for another downstairs to write it before we could ACK back to the guest.

Any flush is going to stop all future IOs, as they will all build up behind it. This is actually a reason to fast fail writes/flushes as I believe that would enable reads to still work, right?

What should we do with in-flight IOs? Do we now want to discard them as well? If we do fail outstanding IOs, outstanding writes we would have already fast-acked, but flushes will now go to failed.

By doing this, are we cutting off a chance for a replay to catch things up?

mkeeter (Contributor, Author) commented Nov 20, 2024

> If we send the write to one downstairs, it could land on disk, but the IO will sit waiting for another downstairs to write it before we could ACK back to the guest.

No, writes are either fast-acked to the guest or (after changes made in this PR) fast-errored. In the latter case, they aren't sent to any Downstairs.

> Any flush is going to stop all future IOs, as they will all build up behind it. This is actually a reason to fast fail writes/flushes as I believe that would enable reads to still work, right?

Yes, with caveats – if the Guest is waiting for a flush, then sending a read, it would previously stall if 2x Downstairs are offline. Now, the flush will return an error immediately, and the read can proceed (or not, depending on how the Guest handles error codes).

> What should we do with in-flight IOs? Do we now want to discard them as well? If we do fail outstanding IOs, outstanding writes we would have already fast-acked, but flushes will now go to failed.

This PR hasn't changed any behavior here. If a Downstairs goes to the Faulted state, all of its IOs are skipped, which may cause them to be acked to the Guest.

> By doing this, are we cutting off a chance for a replay to catch things up?

No, any IO rejected here is never added to the active jobs list, so it's never sent to any downstairs. It's rejected before being assigned a JobId.

leftwo (Contributor) commented Nov 21, 2024

>> If we send the write to one downstairs, it could land on disk, but the IO will sit waiting for another downstairs to write it before we could ACK back to the guest.

> No, writes are either fast-acked to the guest or (after changes made in this PR) fast-errored. In the latter case, they aren't sent to any Downstairs.

Ah yes, fast-ack will ack while it thinks things are good. If one downstairs is online, it could get the data while the other two would not, but we have already acked it back to the guest.

>> Any flush is going to stop all future IOs, as they will all build up behind it. This is actually a reason to fast fail writes/flushes as I believe that would enable reads to still work, right?

> Yes, with caveats – if the Guest is waiting for a flush, then sending a read, it would previously stall if 2x Downstairs are offline. Now, the flush will return an error immediately, and the read can proceed (or not, depending on how the Guest handles error codes).

>> What should we do with in-flight IOs? Do we now want to discard them as well? If we do fail outstanding IOs, outstanding writes we would have already fast-acked, but flushes will now go to failed.

> This PR hasn't changed any behavior here. If a Downstairs goes to the Faulted state, all of its IOs are skipped, which may cause them to be acked to the Guest.

The transition to Faulted here is what I think I needed to see: that will skip IO to any downstairs in the Faulted state.

>> By doing this, are we cutting off a chance for a replay to catch things up?

> No, any IO rejected here is never added to the active jobs list, so it's never sent to any downstairs. It's rejected before being assigned a JobId.

So, you're right that a replay does not happen here. I'm thinking about the Offline state vs. Faulted. Once we have Faulted, it's only LiveRepair that can bring a downstairs back.

self.clients
    .iter()
    .filter(|c| {
        // Only Downstairs in these states are treated as able to accept new IO.
        matches!(c.state(), DsState::Active | DsState::LiveRepair)
    })
Contributor:

I think we need to allow additional states here. Offline, for instance, is one where we have not yet decided what the situation is; it could just be a downstairs rebooting, and soon enough (fingers crossed) it will come back.

There may be other transitory states to consider as well.

mkeeter (Contributor, Author), Nov 22, 2024:

Good point, I'll revisit this. In parallel, I've been working on simplifying the state machine (#1568 and #1570), so it might make sense to bring them in first.

Contributor:

If this goes in before #1568/#1570, then we want the updated list of DsStates.

mkeeter (Contributor, Author):

It's going to be much easier to rebase this onto #1577 than vice versa, so I think we should get those three merged before thinking about this further!

mkeeter (Contributor, Author):

At long last, #1577 is merged and this is rebased. I've moved this match into a standalone function (DownstairsClient::is_accepting_io) and added DsState::Connecting { mode: ConnectionMode::Offline, .. } as a valid case.
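For reference, a rough sketch of what such a predicate might look like; the `DsState` and `ConnectionMode` definitions below are paraphrased rather than the exact types from #1577, and the surrounding struct is a placeholder:

```rust
// Sketch only: the enums here are paraphrased, not the definitions from #1577.
#[derive(Debug, Clone, Copy, PartialEq)]
enum ConnectionMode {
    Offline,
    Faulted,
}

#[derive(Debug, Clone, Copy, PartialEq)]
enum DsState {
    Active,
    LiveRepair,
    Connecting { mode: ConnectionMode },
}

struct DownstairsClient {
    state: DsState,
}

impl DownstairsClient {
    /// True if new IO may still be directed at (or queued for) this client.
    /// An Offline client counts: it may just be rebooting, and its queued IO
    /// can be replayed once it reconnects.
    fn is_accepting_io(&self) -> bool {
        matches!(
            self.state,
            DsState::Active
                | DsState::LiveRepair
                | DsState::Connecting { mode: ConnectionMode::Offline, .. }
        )
    }
}

fn main() {
    let active = DownstairsClient { state: DsState::Active };
    let repairing = DownstairsClient { state: DsState::LiveRepair };
    let offline = DownstairsClient {
        state: DsState::Connecting { mode: ConnectionMode::Offline },
    };
    let faulted = DownstairsClient {
        state: DsState::Connecting { mode: ConnectionMode::Faulted },
    };
    // Active, LiveRepair, and Offline all still count toward accepting IO;
    // a Faulted client does not (it needs LiveRepair to come back).
    assert!(active.is_accepting_io());
    assert!(repairing.is_accepting_io());
    assert!(offline.is_accepting_io());
    assert!(!faulted.is_accepting_io());
}
```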

faithanalog (Contributor) left a comment:

I believe this change does what it intends to do, and I am pretty sure it's the right thing to do broadly, but I'm deferring to Alan's comments on the details.

mkeeter force-pushed the mkeeter/early-io-rejection branch from 775749f to 62229ec on January 13, 2025 at 16:13
leftwo (Contributor) left a comment:

Will this block internal flushes for a read-only upstairs?

async fn read_with_one_fault() {
    let mut harness = TestHarness::new().await;

    // Use a write to fault DS0 (XXX why do read errors not fault a DS?)
Contributor:

With a write failure, we don't know the state (from an upstairs POV) of what actually made it onto disk. As such, we can't know that this downstairs is the same as the other two downstairs, so we have to fault the downstairs with the failing IO.

For a read error, we know there is an error, but we don't know that this downstairs is different from the other downstairs, at least not without more information from the upstairs point of view. If we get a read error, that is "good" in the sense that we are not returning bad data to the guest.

mkeeter (Contributor, Author) commented Jan 13, 2025

> Will this block internal flushes for a read-only upstairs?

@leftwo I don't think so – internal flushes call Upstairs::submit_flush directly, which doesn't check for active clients (flushes initiated by the guest come in through BlockOp::Flush, which does check).
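A small sketch of that distinction, under simplified assumptions: `BlockOp::Flush` and `submit_flush` are the names used above, while the struct fields, the two-Downstairs threshold, and the error type are placeholders:

```rust
// Sketch only: simplified placeholders, not the Crucible source.
#[derive(Debug)]
struct IoRejected;

enum BlockOp {
    Flush,
    // reads, writes, etc. elided
}

struct Upstairs {
    // Placeholder for "how many Downstairs can currently accept IO".
    downstairs_accepting_io: usize,
}

impl Upstairs {
    // Guest-initiated operations pass through the early-rejection check.
    fn apply_guest_op(&mut self, op: BlockOp) -> Result<(), IoRejected> {
        match op {
            BlockOp::Flush => {
                if self.downstairs_accepting_io < 2 {
                    // Fail fast: the guest sees an error and no job is queued.
                    return Err(IoRejected);
                }
                self.submit_flush();
                Ok(())
            }
        }
    }

    // Internal flushes (e.g. the periodic flush of a read-only Upstairs)
    // call this directly and bypass the check above.
    fn submit_flush(&mut self) {
        // In the real code: assign a JobId, enqueue the flush, and so on.
    }
}

fn main() {
    let mut up = Upstairs { downstairs_accepting_io: 1 };
    // The guest-visible flush is rejected early...
    assert!(up.apply_guest_op(BlockOp::Flush).is_err());
    // ...but an internal flush can still be issued directly.
    up.submit_flush();
}
```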

mkeeter force-pushed the mkeeter/early-io-rejection branch 3 times, most recently from 33c34e9 to dfc7697, on January 14, 2025 at 17:09
mkeeter force-pushed the mkeeter/early-io-rejection branch from dfc7697 to 745927e on January 16, 2025 at 17:09
mkeeter force-pushed the mkeeter/early-io-rejection branch from 745927e to dd1cf13 on January 23, 2025 at 18:31