cam: Add 3e/3 as a fatal code

We see this error:

(da4:mps0:0:3:0): SCSI sense: HARDWARE FAILURE asc:3e,3 (Logical unit failed self-test)

for drives that have failed. Our vendor tells us there's no recovery from that state, though we can still grab logs from the drives and run their diagnostics. Drives in this state basically need to be remanufactured because some part of them has failed.

The prior default behavior was to retry, and the retries take a long time to run their course. Instead, short-circuit the retries and fail right away. I selected ENXIO because no I/O to LBAs is possible for drives in this state (both in my experience and per the vendor). Some googling suggests that other vendors behave identically, but it was inconclusive. Should this be too pessimistic, we can adjust in the future.

Also, this is with some aging drives in our fleet. If more than one drive is in this state, our systems take so long to get to mountroot that the watchdog sometimes fires. Adding this patch makes them boot reliably again.

MFC After: 1 week
Sponsored by: Netflix
Reviewed by: mav
Differential Revision: https://reviews.freebsd.org/D48505

(cherry picked from commit a8b49e7)
freebsd · Jan 24, 2025 · 98983f3
1 parent 13639d5
Showing 1 changed file with 1 addition and 1 deletion.
diff --git a/sys/cam/scsi/scsi_all.c b/sys/cam/scsi/scsi_all.c
@@ -2249,7 +2249,7 @@ static struct asc_table_entry asc_table[] = {
 	{ SST(0x3E, 0x02, SS_RDEF,
 	    "Timeout on logical unit") },
 	/* DTLPWROMAEBKVF */
-	{ SST(0x3E, 0x03, SS_RDEF,	/* XXX TBD */
+	{ SST(0x3E, 0x03, SS_FATAL | ENXIO,
 	    "Logical unit failed self-test") },
 	/* DTLPWROMAEBKVF */
 	{ SST(0x3E, 0x04, SS_RDEF,	/* XXX TBD */
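
The effect of switching the 3e/3 entry from a retrying action to a fatal one can be illustrated with a small standalone model. This is only a sketch under stated assumptions: the `DEMO_*` flags, `struct demo_asc_entry`, `demo_lookup()` and `demo_decide()` are hypothetical stand-ins invented for illustration, not the real CAM definitions (the actual `SS_RDEF` and `SS_FATAL` macros live in sys/cam/scsi/scsi_all.h and carry more flags than shown here).

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified action encoding; NOT the real CAM values. */
#define DEMO_ERRNO_MASK	0x00ffu	/* low bits carry the errno to return */
#define DEMO_RETRY	0x0100u	/* retry while the retry budget lasts */
#define DEMO_FAIL	0x0200u	/* fail immediately, no retries */

#define DEMO_RDEF	(DEMO_RETRY | EIO)	/* stand-in for a retrying default */
#define DEMO_FATAL	DEMO_FAIL		/* stand-in for a fatal action */

struct demo_asc_entry {
	uint8_t		asc;
	uint8_t		ascq;
	uint32_t	action;
	const char	*desc;
};

/* Two entries mirroring the post-patch state of the table in the diff. */
static const struct demo_asc_entry demo_table[] = {
	{ 0x3E, 0x02, DEMO_RDEF, "Timeout on logical unit" },
	{ 0x3E, 0x03, DEMO_FATAL | ENXIO, "Logical unit failed self-test" },
};

/* Linear search for an asc/ascq pair; returns NULL when not found. */
static const struct demo_asc_entry *
demo_lookup(uint8_t asc, uint8_t ascq)
{
	size_t i;

	for (i = 0; i < sizeof(demo_table) / sizeof(demo_table[0]); i++) {
		if (demo_table[i].asc == asc && demo_table[i].ascq == ascq)
			return (&demo_table[i]);
	}
	return (NULL);
}

/*
 * Returns 0 when the caller should retry the command, otherwise the
 * errno with which the I/O should be failed.
 */
static int
demo_decide(uint8_t asc, uint8_t ascq, int retries_left)
{
	const struct demo_asc_entry *e = demo_lookup(asc, ascq);

	if (e == NULL)
		return (EIO);	/* assumption: unknown codes fail with EIO */
	if ((e->action & DEMO_RETRY) != 0 && retries_left > 0)
		return (0);
	return ((int)(e->action & DEMO_ERRNO_MASK));
}
```

In this model, `demo_decide(0x3E, 0x03, 4)` returns `ENXIO` immediately even though retries remain, while `demo_decide(0x3E, 0x02, 4)` returns 0 and only falls back to `EIO` once the retry budget is exhausted. That is the short-circuit the commit message describes.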