Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
We see this error: (da4:mps0:0:3:0): SCSI sense: HARDWARE FAILURE asc:3e,3 (Logical unit failed self-test) for drives that have failed. Our vendor tells us there's no recovery from that state, though we can still grab logs from the drives and run their diagnostics. Drives in this state need to bascially be remanufactured because some part of them has failed. The prior default behavior is to retry, and retrying takes a long time to work out. Instead, short-circuit the retries and fail right away. I selected ENXIO because no I/O to LBAs is possible for drives in this state (both my experience and per vendor). Some googling suggests that other vendors behave identically, but it was inconclusive. Should this be too pessimistic, we can adjust in the future. Also, this is with some aging drives in our fleet, and if we have more than one drive in this state, our systems take so long to get to mountroot that the watchdog fires sometimes. Adding this patch makes them boot reliably again. MFC After: 1 week Sponsored by: Netflix Reviewed by: mav Differential Revision: https://reviews.freebsd.org/D48505 (cherry picked from commit a8b49e7)
- Loading branch information