cam: Add 3e/3 as a fatal code
We see this error:

(da4:mps0:0:3:0): SCSI sense: HARDWARE FAILURE asc:3e,3 (Logical unit failed self-test)

for drives that have failed. Our vendor tells us there's no recovery
from that state, though we can still grab logs from the drives and run
their diagnostics. Drives in this state basically need to be
remanufactured because some part of them has failed. The prior default
behavior was to retry, and the retries take a long time to run their
course. Instead, short-circuit the retries and fail right away. I selected
ENXIO because no I/O to LBAs is possible for drives in this state (both
my experience and per vendor). Some googling suggests that other vendors
behave identically, but it was inconclusive. Should this be too
pessimistic, we can adjust in the future. Also, this is with some aging
drives in our fleet, and if we have more than one drive in this state,
our systems take so long to get to mountroot that the watchdog fires
sometimes. Adding this patch makes them boot reliably again.

MFC After:		1 week
Sponsored by:		Netflix
Reviewed by:		mav
Differential Revision:	https://reviews.freebsd.org/D48505

(cherry picked from commit a8b49e7)
bsdimp committed Jan 24, 2025
1 parent 13639d5 commit 98983f3
Showing 1 changed file with 1 addition and 1 deletion.
2 changes: 1 addition & 1 deletion sys/cam/scsi/scsi_all.c
@@ -2249,7 +2249,7 @@ static struct asc_table_entry asc_table[] = {
{ SST(0x3E, 0x02, SS_RDEF,
"Timeout on logical unit") },
/* DTLPWROMAEBKVF */
-{ SST(0x3E, 0x03, SS_RDEF, /* XXX TBD */
+{ SST(0x3E, 0x03, SS_FATAL | ENXIO,
"Logical unit failed self-test") },
/* DTLPWROMAEBKVF */
{ SST(0x3E, 0x04, SS_RDEF, /* XXX TBD */
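
As a rough illustration of what moving an entry from a retry action to a fatal error means for a caller, here is a minimal, self-contained sketch. It is not the CAM implementation: the `demo_` names, the bit layout of the action word, the retry accounting, and the EIO fallback for unknown codes are all assumptions made for this example and do not mirror the real `SST`, `SS_RDEF`, or `SS_FATAL` definitions.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical encoding: upper bits select the action, low byte holds an errno. */
#define DEMO_ACT_RETRY	0x010000u
#define DEMO_ACT_FAIL	0x020000u
#define DEMO_ACT_MASK	0xff0000u
#define DEMO_ERR_MASK	0x0000ffu

struct demo_asc_entry {
	uint8_t		asc;
	uint8_t		ascq;
	uint32_t	action;
	const char	*desc;
};

/* Two illustrative entries: one retried, one failed immediately. */
static const struct demo_asc_entry demo_table[] = {
	{ 0x3E, 0x02, DEMO_ACT_RETRY | EIO,  "Timeout on logical unit" },
	{ 0x3E, 0x03, DEMO_ACT_FAIL | ENXIO, "Logical unit failed self-test" },
};

/* Linear search for an asc/ascq pair; returns NULL when there is no match. */
const struct demo_asc_entry *
demo_lookup(uint8_t asc, uint8_t ascq)
{
	size_t i;

	for (i = 0; i < sizeof(demo_table) / sizeof(demo_table[0]); i++) {
		if (demo_table[i].asc == asc && demo_table[i].ascq == ascq)
			return (&demo_table[i]);
	}
	return (NULL);
}

/*
 * Decide what to do with a sense code.  Returns 0 when the request should
 * be retried (consuming one retry), otherwise the errno to hand back.
 */
int
demo_handle_sense(uint8_t asc, uint8_t ascq, int *retries_left)
{
	const struct demo_asc_entry *e;

	assert(retries_left != NULL);
	e = demo_lookup(asc, ascq);
	if (e == NULL)
		return (EIO);	/* assumption: unknown codes fail with EIO */
	if ((e->action & DEMO_ACT_MASK) == DEMO_ACT_RETRY && *retries_left > 0) {
		(*retries_left)--;
		return (0);
	}
	/* Fatal entries, and retry entries with no retries left, end here. */
	return ((int)(e->action & DEMO_ERR_MASK));
}
```

In this sketch a 3e/3 sense code returns ENXIO on the first call and leaves the retry budget untouched, whereas 3e/2 consumes retries before it finally reports EIO. That difference is the kind of delay the commit message describes avoiding at boot.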