Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Multipath aborts discovery of SAS devices lacking SCSI_VENDOR field #56

Closed
malventano opened this issue Nov 17, 2022 · 15 comments · Fixed by #64
Closed

Multipath aborts discovery of SAS devices lacking SCSI_VENDOR field #56

malventano opened this issue Nov 17, 2022 · 15 comments · Fixed by #64

Comments

@malventano
Copy link

Some SAS devices (in this case Seagate factory recertified 'white label' drives) may come with the Vendor field blank. This causes Multipath to fail to complete the discovery of those devices. Manually adding associated wwid's to multipath.conf has no effect.

multipath -v9 showing a pair of the suspect SAS devices (note that the process ends after the size check, where normally the next entry is vendor)

Nov 12 23:36:22 | Discover device /sys/devices/pci0000:d7/0000:d7:02.0/0000:da:00.0/host19/port-19:0/expander-19:0/port-19:0:1/expander-19:1/port-19:1:0/end_device-19:1:0/target19:0:1/19:0:1:0/block/sdjp
Nov 12 23:36:22 | 65:304: dev_t not found in pathvec
Nov 12 23:36:22 | sdjp: udev property ID_WWN whitelisted
Nov 12 23:36:22 | sdjp: mask = 0x3f
Nov 12 23:36:22 | sdjp: dev_t = 65:304
Nov 12 23:36:22 | open '/sys/devices/pci0000:d7/0000:d7:02.0/0000:da:00.0/host19/port-19:0/expander-19:0/port-19:0:1/expander-19:1/port-19:1:0/end_device-19:1:0/target19:0:1/19:0:1:0/block/sdjp/size'
Nov 12 23:36:22 | sdjp: size = 31251759104
Nov 12 23:36:22 | Discover device /sys/devices/pci0000:d7/0000:d7:02.0/0000:da:00.0/host19/port-19:0/expander-19:0/port-19:0:1/expander-19:1/port-19:1:1/end_device-19:1:1/target19:0:2/19:0:2:0/block/sdjq
Nov 12 23:36:22 | 65:320: dev_t not found in pathvec
Nov 12 23:36:22 | sdjq: udev property ID_WWN whitelisted
Nov 12 23:36:22 | sdjq: mask = 0x3f
Nov 12 23:36:22 | sdjq: dev_t = 65:320
Nov 12 23:36:22 | open '/sys/devices/pci0000:d7/0000:d7:02.0/0000:da:00.0/host19/port-19:0/expander-19:0/port-19:0:1/expander-19:1/port-19:1:1/end_device-19:1:1/target19:0:2/19:0:2:0/block/sdjq/size'
Nov 12 23:36:22 | sdjq: size = 31251759104

udevadm info output appears normal vs. other SAS devices except the following two fields are missing:

SCSI_VENDOR
SCSI_VENDOR_ENC

smartctl -a -d scsi output from one of the drives:

smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.15.64-1-pve] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, [www.smartmontools.org](http://www.smartmontools.org/)

=== START OF INFORMATION SECTION ===
Vendor:
Product:              OOS16000G
Revision:             OOS1
Compliance:           SPC-5
User Capacity:        16,000,900,661,248 bytes [16.0 TB]
Logical block size:   512 bytes
Physical block size:  4096 bytes
LU is fully provisioned
Rotation Rate:        7200 rpm
Form Factor:          3.5 inches
Logical Unit id:      0x5000c500d776128b
Serial number:        00000PJM0000C1507RXB
Device type:          disk
Transport protocol:   SAS (SPL-3)
Local Time is:        Sun Nov 13 14:04:50 2022 EST
SMART support is:     Available - device has SMART capability.
SMART support is:     Enabled
@mwilck
Copy link
Contributor

mwilck commented Nov 17, 2022

Quoting from the SPC-6, §6.7.2: "The T10 VENDOR IDENTIFICATION field contains eight bytes of left-aligned ASCII data (see 4.3.1) identifying the manufacturer of the logical unit. The T10 vendor identification shall be one assigned by INCITS. A list of assigned T10 vendor identifications is in Annex G and on the T10 web site (http://www.t10.org)."

I daresay this device fails to comply to the most basic SCSI standard (assuming that a sequence of 8 spaces is not an officially assigned vendor id). That's not to say we shouldn't fix it, but it's really a weird corner case.

@malventano
Copy link
Author

Totally understood that it is a weird case, but Seagate appears to be doing this (by their own internal policy) for their white-label recertified drives. I've fed back via the reseller that their choice breaks basic functionality that relies on the Vendor field, but who knows if / when that feedback will be addressed.

...but now that drives with no Vendor ID appear to exist in the wild, it has highlighted Multipath's inability to handle such devices.

@mwilck
Copy link
Contributor

mwilck commented Nov 30, 2022

Can you please try with this patch?

0001-libmultipath-pathinfo-don-t-fail-for-devices-lacking.patch

@malventano
Copy link
Author

I returned the issue drives for other models, but I know a few others who have some (they are not currently using multipath). Will spread the word and/or try and pick up a single for testing.

@mwilck
Copy link
Contributor

mwilck commented Dec 19, 2022

@malventano, I fail to parse your response. Does this mean the patch worked for you, or that you couldn't test it because you don't use multipath?

@malventano
Copy link
Author

malventano commented Dec 19, 2022

I returned the drives that were not reporting vendor ID (swapped them for a different model), but I am working with someone else who has some and can test. Worst case I'll get my hands on one to retest. More to follow :)

@malventano
Copy link
Author

malventano commented Jan 4, 2023

Update with the patch:

[root@archlinux ~]# multipath -ll
53.526500 | /etc/multipath.conf line 24, "bindings_file" is deprecated and will be disabled in a future release
53.526543 | /etc/multipath.conf line 25, "wwids_file" is deprecated and will be disabled in a future release
53.526550 | /etc/multipath.conf line 26, "prkeys_file" is deprecated and will be disabled in a future release
53.533269 | sdb: broken device without vendor ID
53.537700 | sdd: broken device without vendor ID
53.546100 | sdb: broken device without vendor ID
53.546944 | sdd: broken device without vendor ID
mpatha (35000c500cb25a62b) dm-1 UNKNOWN,OOS18000G
size=16T features='0' hwhandler='0' wp=rw
|-+- policy='service-time 0' prio=1 status=active
| `- 2:0:0:0 sdb 8:16 active ready running
`-+- policy='service-time 0' prio=1 status=enabled
  `- 2:0:3:0 sdd 8:48 active ready running

...so the patch allowed the addition of drives lacking vendor ID (did note the 'broken device' warnings are appearing 2x though).
Unsure if we did something wrong with the build to trigger the warnings for bindings_file / wwids_file / prkeys_file, or are those actually being depreciated?

@bmarzins
Copy link
Contributor

Those files really are deprecated. So if they are in your config file, then this is expected.

@malventano
Copy link
Author

Those files really are deprecated. So if they are in your config file, then this is expected.

Are these files not the primary way that multipath tracks added devices? If they are depreciated, why are they still a part of the default config? What new method is being used to save/track previously added devices?

@mwilck
Copy link
Contributor

mwilck commented Jan 13, 2023

Are these files not the primary way that multipath tracks added devices?

Yes. What's deprecated is configuring the path to these files in multipath.conf. The paths are set at compile time now.

If they are depreciated, why are they still a part of the default config?

Not sure what you mean. multipath has long stopped shipping with a "default config". More often than not, you're well off with an empty config file. The manual page states that these settings are deprecated.

I can see that, if multipath conf is generated with multipath -t, these options would be included, causing error messages later on. This is something we should fix eventually.

...so the patch allowed the addition of drives lacking vendor ID (did note the 'broken device' warnings are appearing 2x though).

That's how it is. We're not going to implement a per-device warning counter just for these broken devices.

@mwilck
Copy link
Contributor

mwilck commented Jan 13, 2023

Correction:

I can see that, if multipath conf is generated with multipath -t, these options would be included

If you read the message it says that they "will be disabled in a future release". When that happens, we will also stop printing them with "multipath -t".

mwilck added a commit to openSUSE/multipath-tools that referenced this issue Jan 13, 2023
…ties

Some SAS devices (e.g. Seagate factory recertified 'white label' drives) may
come with the Vendor field blank. This causes Multipath to fail to
complete the discovery of those devices.

Such devices violate the SCSI Spec. From the SPC-6, §6.7.2:
"The T10 VENDOR IDENTIFICATION field contains eight bytes of left-aligned
ASCII data (see 4.3.1) identifying the manufacturer of the logical unit. The
T10 vendor identification shall be one assigned by INCITS.".

But as we don't identify WWIDs by vendor and product, we don't need to discard
these devices right away. We can go ahead fingers crossed, and hope that the
the other VPD pages for the device are correct.

We obviously can't look up reasonable device properties for such devices in
our hwtable. It would be up to the user to deal with that.

Reported by: Allyn Malventano (github.com/opensvc/issues/56)
Signed-off-by: Martin Wilck <[email protected]>
@mwilck
Copy link
Contributor

mwilck commented Jan 13, 2023

I posted the patch above to dm-devel today, "libmultipath: pathinfo: don't fail for devices lacking INQUIRY properties"

@malventano
Copy link
Author

malventano commented Jan 15, 2023

Not sure what you mean.

I was reading those warning messages:

53.526500 | /etc/multipath.conf line 24, "bindings_file" is deprecated and will be disabled in a future release
53.526543 | /etc/multipath.conf line 25, "wwids_file" is deprecated and will be disabled in a future release
53.526550 | /etc/multipath.conf line 26, "prkeys_file" is deprecated and will be disabled in a future release

...to mean that the files themselves were depreciated. If I'm reading your response correctly, it is the path configurability that has been depreciated, which I have no argument against :).

More often than not, you're well off with an empty config file.

Totally agree that an empty config is ideal save a few customization entries as applicable, but the standard practice I've seen nearly everywhere is to generate a 'default' config with multipath -t and go from there. The man suggests -T for generating a new multipath.conf - does -T currently omit those depreciated variables while -t leaves them in?

When that happens, we will also stop printing them with "multipath -t".

It may be beneficial to omit those values from -t sooner rather than later so new multipath users are not tempted to adopt a depreciated feature.

We're not going to implement a per-device warning counter just for these broken devices.

My point was that the warning is appearing twice per path in rapid succession (so 4x for a single device in this case). Is it expected to process each path 2x for a single multipath -ll command issued?

@mwilck
Copy link
Contributor

mwilck commented Jan 16, 2023

The man suggests -T for generating a new multipath.conf - does -T currently omit those depreciated variables while -t leaves them in?

No. There's no difference between '-t and -T in this respect. They differ only in the way they print device properties. Thinking about it, the correct way to "solve" this issue will be to deprecate these options for good.

My point was that the warning is appearing twice per path in rapid succession (so 4x for a single device in this case). Is it expected to process each path 2x for a single multipath -ll command issued?

Sort of. The pathinfo() function is called very frequently in the multipath code, to retrieve various aspects of a device. And the vendor is considered a very basic property that's always fetched, which causes the message every time. If you want to create a patch for this issue, go ahead. It remains a corner case from my perspective.

mwilck added a commit to openSUSE/multipath-tools that referenced this issue Jan 24, 2023
…ties

Some SAS devices (e.g. Seagate factory recertified 'white label' drives) may
come with the Vendor field blank. This causes Multipath to fail to
complete the discovery of those devices.

Such devices violate the SCSI Spec. From the SPC-6, §6.7.2:
"The T10 VENDOR IDENTIFICATION field contains eight bytes of left-aligned
ASCII data (see 4.3.1) identifying the manufacturer of the logical unit. The
T10 vendor identification shall be one assigned by INCITS.".

But as we don't identify WWIDs by vendor and product, we don't need to discard
these devices right away. We can go ahead fingers crossed, and hope that the
the other VPD pages for the device are correct.

We obviously can't look up reasonable device properties for such devices in
our hwtable. It would be up to the user to deal with that.

Reported by: Allyn Malventano (github.com/opensvc/issues/56)
Signed-off-by: Martin Wilck <[email protected]>
Reviewed-by: Benjamin Marzinski <[email protected]>
@mwilck
Copy link
Contributor

mwilck commented Jan 24, 2023

This should be fixed by 88d46ea, on https://github.com/openSUSE/multipath-tools/tree/queue

mwilck added a commit to openSUSE/multipath-tools that referenced this issue Mar 14, 2023
…ties

Some SAS devices (e.g. Seagate factory recertified 'white label' drives) may
come with the Vendor field blank. This causes Multipath to fail to
complete the discovery of those devices.

Such devices violate the SCSI Spec. From the SPC-6, §6.7.2:
"The T10 VENDOR IDENTIFICATION field contains eight bytes of left-aligned
ASCII data (see 4.3.1) identifying the manufacturer of the logical unit. The
T10 vendor identification shall be one assigned by INCITS.".

But as we don't identify WWIDs by vendor and product, we don't need to discard
these devices right away. We can go ahead fingers crossed, and hope that the
the other VPD pages for the device are correct.

We obviously can't look up reasonable device properties for such devices in
our hwtable. It would be up to the user to deal with that.

Reported by: Allyn Malventano (github.com/opensvc/issues/56)
Signed-off-by: Martin Wilck <[email protected]>
Reviewed-by: Benjamin Marzinski <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants