Question: can't get multipath to use multiple paths with ISCSI read loads #106

Open · TMLKyza opened this issue Nov 24, 2024 · 13 comments

@TMLKyza commented Nov 24, 2024

Hi, first of all thanks for making this project.
I'm a total noob when it comes to iSCSI, but I've been struggling with setting up MPIO for two weeks now and I can't really find much online.

Let's start by saying that I have 4 portals on my target (one for each physical NIC), each with 1 Gbit of bandwidth. On my initiator I run a single NIC with 1 Gbit of bandwidth. I want to balance the load across the target's 4 interfaces (as link aggregation does).

So I've set up a dataset on my ZFS box with 2 fileio backstores and I'm running targetcli with this configuration:

o- / ............................................................................................. [...]
 o- backstores .................................................................................. [...]
 | o- block ...................................................................... [Storage Objects: 0]
 | o- fileio ..................................................................... [Storage Objects: 2]
 | | o- nalixsa ......................... [/tank/iscsi-zvols/nalixsa.img (1.0TiB) write-back activated]
 | | | o- alua ....................................................................... [ALUA Groups: 1]
 | | |   o- default_tg_pt_gp ........................................... [ALUA state: Active/optimized]
 | | o- tmlkyza ......................... [/tank/iscsi-zvols/tmlkyza.img (1.0TiB) write-back activated]
 | |   o- alua ....................................................................... [ALUA Groups: 1]
 | |     o- default_tg_pt_gp ........................................... [ALUA state: Active/optimized]
 | o- pscsi ...................................................................... [Storage Objects: 0]
 | o- ramdisk .................................................................... [Storage Objects: 0]
 o- iscsi ................................................................................ [Targets: 2]
 | o- iqn.2003-01.org.linux-iscsi.lucy.x8664:sn.06c3db61faab ................................ [TPGs: 1]
 | | o- tpg1 ................................................................... [no-gen-acls, no-auth]
 | |   o- acls .............................................................................. [ACLs: 2]
 | |   | o- iqn.2016-04.com.open-iscsi:4e1baa4eee8d .................................. [Mapped LUNs: 1]
 | |   | | o- mapped_lun0 .................................................. [lun0 fileio/tmlkyza (rw)]
 | |   | o- iqn.2016-04.com.open-iscsi:9dc78830babc .................................. [Mapped LUNs: 1]
 | |   |   o- mapped_lun0 .................................................. [lun0 fileio/tmlkyza (rw)]
 | |   o- luns .............................................................................. [LUNs: 1]
 | |   | o- lun0 .................. [fileio/tmlkyza (/tank/iscsi-zvols/tmlkyza.img) (default_tg_pt_gp)]
 | |   o- portals ........................................................................ [Portals: 4]
 | |     o- 192.168.0.201:3260 ................................................................... [OK]
 | |     o- 192.168.0.202:3260 ................................................................... [OK]
 | |     o- 192.168.0.203:3260 ................................................................... [OK]
 | |     o- 192.168.0.204:3260 ................................................................... [OK]
 | o- iqn.2003-01.org.linux-iscsi.lucy.x8664:sn.105dfa14ea77 ................................ [TPGs: 1]
 |   o- tpg1 ................................................................... [no-gen-acls, no-auth]
 |     o- acls .............................................................................. [ACLs: 2]
 |     | o- iqn.1991-05.com.microsoft:desktop-abuf72c ................................ [Mapped LUNs: 1]
 |     | | o- mapped_lun1 .................................................. [lun1 fileio/nalixsa (rw)]
 |     | o- iqn.2016-04.com.open-iscsi:4e1baa4eee8d .................................. [Mapped LUNs: 1]
 |     |   o- mapped_lun1 .................................................. [lun1 fileio/nalixsa (rw)]
 |     o- luns .............................................................................. [LUNs: 1]
 |     | o- lun1 .................. [fileio/nalixsa (/tank/iscsi-zvols/nalixsa.img) (default_tg_pt_gp)]
 |     o- portals ........................................................................ [Portals: 4]
 |       o- 192.168.0.201:3260 ................................................................... [OK]
 |       o- 192.168.0.202:3260 ................................................................... [OK]
 |       o- 192.168.0.203:3260 ................................................................... [OK]
 |       o- 192.168.0.204:3260 ................................................................... [OK]
 o- loopback ............................................................................. [Targets: 0]
 o- vhost ................................................................................ [Targets: 0]
 o- xen-pvscsi ........................................................................... [Targets: 0]

I then installed multipath-tools on my machine and added all the portals:

❯ sudo iscsiadm -m node
192.168.0.201:3260,1 iqn.2003-01.org.linux-iscsi.lucy.x8664:sn.06c3db61faab
192.168.0.202:3260,1 iqn.2003-01.org.linux-iscsi.lucy.x8664:sn.06c3db61faab
192.168.0.203:3260,1 iqn.2003-01.org.linux-iscsi.lucy.x8664:sn.06c3db61faab
192.168.0.204:3260,1 iqn.2003-01.org.linux-iscsi.lucy.x8664:sn.06c3db61faab
192.168.0.204:3260,1 iqn.2003-01.org.linux-iscsi.lucy.x8664:sn.105dfa14ea77

I've also set up multipath and I can see all 4 paths:

❯ sudo multipath -ll
mpath0 (360014055292766259404b3599f2bd3a8) dm-0 LIO-ORG,tmlkyza
size=1.0T features='1 queue_if_no_path' hwhandler='1 alua' wp=rw
`-+- policy='round-robin 0' prio=50 status=active
  |- 10:0:0:0 sdd 8:48 active ready running
  |- 11:0:0:0 sdc 8:32 active ready running
  |- 8:0:0:0  sda 8:0  active ready running
  `- 9:0:0:0  sdb 8:16 active ready running

For the sake of completeness, this is my /etc/multipath.conf:

blacklist_exceptions {
	wwid 360014055292766259404b3599f2bd3a8
}

blacklist {
	wwid .*
}

multipaths {
	multipath {
		wwid "360014055292766259404b3599f2bd3a8"
		alias "mpath0"
	}
}

defaults {
	find_multipaths	"strict"
	uid_attribute	ID_SERIAL
	path_checker	"tur"
	user_friendly_names "yes"
	path_grouping_policy "multibus"
	path_selector "round-robin 0"
	#path_selector "queue-length 0"
	failback immediate
	prio "alua"
	#prio "const"
	no_path_retry 0
	rr_weight "uniform"
	features "1 queue_if_no_path"
	rr_min_io_rq 10
}

Now, tests are performed using:
sudo fio --filename=iscsiLib/test --direct=1 --rw=randwrite --bs=1m --size=1G --numjobs=10 --group_reporting --name=file1
The strange thing is that when running random reads I get 980 Mbps on only one NIC (and always the same one).
When I perform random writes, instead, I get the expected behaviour, i.e. 250 Mbps from each NIC.
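(For reference, the read test is essentially the same fio invocation with the rw mode switched to randread, something like:)

sudo fio --filename=iscsiLib/test --direct=1 --rw=randread --bs=1m --size=1G --numjobs=10 --group_reporting --name=file1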

I don't really understand why it's doing this. I want to balance the load because I bought 2.5 Gbit network equipment and I want to use MPIO to squeeze out all the bandwidth possible.

Is it a misconfiguration, or a conceptual error on my side?

Many thanks for the time!

@bmarzins (Contributor) commented:

If you run your fio test directly on top of the multipath device, do you still see the IO all going to one device?

@xosevp (Contributor) commented Nov 27, 2024

That device config is insane. The defaults for this device (multipath -t) should work flawlessly:

devices {
        device {
                vendor "(LIO-ORG|SUSE)"
                product ".*"
                path_grouping_policy "group_by_prio"
                path_checker "directio"
                hardware_handler "1 alua"
                prio "alua"
                failback "immediate"
                no_path_retry 12
                detect_checker "no"
        }       
}

If you wish, these two can stay in the defaults section:

defaults {
	find_multipaths	"strict"
	user_friendly_names "yes"
}

@TMLKyza (Author) commented Nov 27, 2024

If you run your fio test directly on top of the multipath device, do you still see the IO all going to one device?

Yes, I run the fio command in a local directory just because I've mounted it in a directory under my home.

That device config is insane. The defaults (multipath -t) should work flawlessly:

I'll give it a shot. It's actually quite hard to find good guides on this topic online; Proxmox has one and my config was (at first at least) based on it, but it doesn't look much different from what I run.
EDIT: OK, I've run a random read and a random write test, and it's still the same issue: only one link when reading and 4 NICs (3 this time, I don't know why) when writing.
[Screenshot From 2024-11-27 17-24-02]

@mwilck (Contributor) commented Nov 27, 2024

open-iSCSI will by default not set up interface binding for the default tcp transport. Therefore the kernel sends packets according to its routing table. If there are no dedicated routes for the different paths, it's quite possible that the same local NIC is used all the time even though multipath is switching paths.

See §5.1.1 of the open-iscsi README for information on how to set up interface binding.
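Roughly, and assuming the initiator's single NIC is named eth0 (adjust the names to your setup), the binding looks like this:

# create an iSCSI interface definition and bind it to the local NIC
sudo iscsiadm -m iface -I iface0 --op=new
sudo iscsiadm -m iface -I iface0 --op=update -n iface.net_ifacename -v eth0
# rediscover and log in through the bound interface
sudo iscsiadm -m discovery -t st -p 192.168.0.201 -I iface0
sudo iscsiadm -m node -T iqn.2003-01.org.linux-iscsi.lucy.x8664:sn.06c3db61faab -p 192.168.0.201:3260 -I iface0 --login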

To view the path usage on the block device level, use a tool such as iostat.

@TMLKyza (Author) commented Nov 28, 2024

I've read §5.1.1 and §5.1.3.
I've added an interface on my initiator as explained and bound it to the 4 portals by running:
sudo iscsiadm -m node --targetname iqn.2023-01.com.example:storage.target01 --portal 192.168.0.201 -I iface0 --login
with the 4 IPs 192.168.0.20[1..4].

I've run the same tests, but nothing: I still get the same behaviour.

Balancing across block devices is done correctly (because that is dealt with by ZFS, as I'm using a fileio backstore).
What I'm complaining about is that on read tests the balancing across the target's NICs is not working (see image above).

@mwilck (Contributor) commented Nov 28, 2024

Balancing across block devices is done correctly

You don't know that unless you run iostat. ZFS will balance over your block devices on top of multipath, but not over the paths.

I've added an interface on my initiator as explained and bound it to the 4 portals

I don't understand. You'll need to create two iSCSI interfaces, bound to your local NICs.

@xosevp (Contributor) commented Nov 28, 2024

That device config is insane. The defaults (multipath -t) should work flawlessly:

I'll give it a shot. It's actually quite hard to find good guides on this topic online; Proxmox has one and my config was (at first at least) based on it, but it doesn't look much different from what I run.

Configs with sane values are included by default in multipath-tools in hwtable.c, and specifically for your device (LIO-ORG) at: https://github.com/opensvc/multipath-tools/blob/master/libmultipath/hwtable.c#L1062-L1076

On the Internet there are plenty of wrong/old configs, even on vendors' websites/docs.

Hints for monitoring Ethernet devices and block devices from the shell, both from the sysstat package:
sar -n DEV 1
iostat -dmx 1

@TMLKyza (Author) commented Nov 29, 2024

You don't know that unless you run iostat. ZFS will balance over your block devices on top of multipath, but not over the paths.

Sorry, I'm a total moron and didn't understand what you were saying to me. I've now checked with iostat -dmx 1 and I can see all 4 iSCSI drives being utilized, and dm-0 (mpath0) and dm-1 (mpath0-part1) are being pushed to 100% usage.

I don't understand. You'll need to create two iSCSI interfaces, bound to your local NICs.

I must confess that this is not entirely clear to me. On my initiator I have only 1 NIC; on my target I have 4 (one for each portal). So do I need to create an iface for each portal?
I've already tested it, though:

❯ sudo iscsiadm -m session
tcp: [1] 192.168.0.204:3260,1 iqn.2003-01.org.linux-iscsi.lucy.x8664:sn.06c3db61faab (non-flash)
tcp: [2] 192.168.0.203:3260,1 iqn.2003-01.org.linux-iscsi.lucy.x8664:sn.06c3db61faab (non-flash)
tcp: [3] 192.168.0.202:3260,1 iqn.2003-01.org.linux-iscsi.lucy.x8664:sn.06c3db61faab (non-flash)
tcp: [4] 192.168.0.201:3260,1 iqn.2003-01.org.linux-iscsi.lucy.x8664:sn.06c3db61faab (non-flash)
❯ sudo iscsiadm -m iface
iface1 tcp,36:e4:f7:05:85:8f,<empty>,<empty>,<empty>
iface2 tcp,36:e4:f7:05:85:8f,<empty>,<empty>,<empty>
iface3 tcp,36:e4:f7:05:85:8f,<empty>,<empty>,<empty>
iface4 tcp,36:e4:f7:05:85:8f,<empty>,<empty>,<empty>
tcp.36:e4:f7:05:85:8f.ipv4.0 tcp,36:e4:f7:05:85:8f,192.168.0.53,default,<empty>
default tcp,<empty>,<empty>,<empty>,<empty>
iser iser,<empty>,<empty>,<empty>,<empty>
❯ sudo multipath -ll
mpath0 (360014055292766259404b3599f2bd3a8) dm-0 LIO-ORG,tmlkyza
size=1.0T features='1 queue_if_no_path' hwhandler='1 alua' wp=rw
`-+- policy='service-time 0' prio=50 status=active
  |- 8:0:0:0  sdc 8:32 active ready running
  |- 9:0:0:0  sdd 8:48 active ready running
  |- 10:0:0:0 sde 8:64 active ready running
  `- 11:0:0:0 sdf 8:80 active ready running

I've linked each interface with each portal (i.e. ifaceX to 192.168.0.20X), but I can still see the same issue: on read operations the network usage on the target side is not balanced, and only one interface is being used.

On the Internet there are plenty of wrong/old configs, even on the vendor's web/docs.

Good to know. To be fair, it's a shame that this tool doesn't get as much attention as it deserves (I'm speaking as a homelabber).

Hints to monitor ethernet devices and block devices from the shell, both from sysstat package:

I really like them, TBH. Thanks again for the heads-up!

@mwilck (Contributor) commented Nov 29, 2024

Sorry, I got this wrong. I thought you were measuring load distribution on the initiator.

I can see all 4 iSCSI drives being utilized

This means that the problem is actually not multipath but either on the iSCSI layer or (more likely) on the networking layer. I see that the 4 IP addresses of your target are all on the same subnet.

On read operations the network usage on the target side is not balanced, and only one interface is being used

"Read operations" means that the target chooses the interface on which it sends the data. So I think there's a misconfiguration on the target side. Your target IP addresses all appear to be in the same subnet. The target sends data to the IP address of your host, which is reachable through each interface. Unless you have configured the target to bind the different sockets to specific interfaces, it is free to always use the same interface.

There are different ways to work around this. I'd actually suggest that instead of 4 separate interfaces with different IP addresses, you use bonding or teaming on the target to establish "multipathing" on the IP level. As long as there is only one initiator, you may still observe that the target uses just one interface. But as soon as there are multiple initiators, you should see balancing taking place. The behavior also depends on the bonding mode and other parameters.

Another option would be to create VLANs on both the client and the target. On the target, you'd use a different VLAN ID for each NIC (say 10, 20, 30, 40), and use separate subnets for the different VLANs (say 192.168.10.0/24, 192.168.20.0/24, etc). On the client, you'd create all 4 VLANs on top of the client's single interface and add an IP address to each one.

    [initiator]                   [target]
eth0.10 -> 192.168.10.10 <-> 192.168.10.1 -> eth0.10
eth0.20 -> 192.168.20.10 <-> 192.168.20.1 -> eth1.20
eth0.30 -> 192.168.30.10 <-> 192.168.30.1 -> eth2.30
eth0.40 -> 192.168.40.10 <-> 192.168.40.1 -> eth3.40

Next, you'd set up 4 iSCSI interfaces on the client, one for each of your VLAN interfaces.
On the target you need to create appropriate portals for each VLAN.

In this configuration, the target must send data to an IP address in the VLAN, which is only reachable via the "matching" network interface. This way you'd enforce balancing while reading.
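On the client, creating one of these VLAN interfaces with iproute2 would look roughly like this (a sketch; eth0 and the addresses are just the names from the diagram above and need to match your actual setup):

# VLAN 10 on top of the client's single NIC
sudo ip link add link eth0 name eth0.10 type vlan id 10
sudo ip addr add 192.168.10.10/24 dev eth0.10
sudo ip link set eth0.10 up
# repeat for VLAN IDs 20, 30 and 40 with 192.168.20.10, 192.168.30.10 and 192.168.40.10

The switch ports in between must of course carry these VLANs as tagged traffic.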

@TMLKyza (Author) commented Nov 30, 2024

"Read operations" means that the target chooses the interface on which it sends the data.

This is what I was thinking too; however, I can't really find anything in targetcli about balancing.

From my understanding of what you wrote, the ideal setup would be to have literal tunnels for each interface, on both the target and the initiator side. Is this because whenever my target sends a packet to my initiator, having all 4 interfaces on the same network means that it can choose whichever one it wants and there is no good reason to switch over? Could this be implemented in a similar fashion to round-robin paths on the initiator side?

Unless you have configured the target to bind the different sockets to specific interfaces, it is free to always use the same interface

Shouldn't it still be the same in this case, though (assuming I knew how to do it)? I would need to have 4 IPs on my initiator too, otherwise each NIC on my target cannot distinguish between paths (i.e. 192.168.0.20X -> initiator.local).

I'd actually suggest that instead of 4 separate interfaces with different IP addresses, you use bonding or teaming on the target to establish "multipathing" on the IP level

This was my first idea even before I looked deeper into multipath; however, the general consensus seems to be that MPIO is better than bonding or LACP for iSCSI (I don't really understand why, though). Still, if you are suggesting it, I will test it out as soon as my new managed switch arrives.

@mwilck (Contributor) commented Dec 2, 2024

From my understanding of what you wrote, the ideal setup would be to have literal tunnels for each interface, on both the target and the initiator side.

That's what I meant with the VLAN setup. The word "tunnel" is maybe sort of misleading because it's often used for VPNs or encrypted connections, but yes, the idea is correct.

Could this be implemented in a similar fashion to round-robin paths on the initiator side?

The initiator cannot control the routing choices the target makes, unless it leaves the target nothing to choose from, as in the dedicated VLAN setup I laid out above.

If the initiator has just one IP address, AFAICS load balancing can't work, except perhaps if you use bonding (I'm not sure about that).

Shouldn't it still be the same in this case, though (assuming I knew how to do it)? I would need to have 4 IPs on my initiator too, otherwise each NIC on my target cannot distinguish between paths (i.e. 192.168.0.20X -> initiator.local)

If the 4 IPs are all in the same subnet, you can't be sure that this will work as you expect. Actually, I am unsure what your goal is. If there's just one interface, how could you expect a performance improvement from load balancing on the server side? At the end of the day, every packet needs to pass through that single interface on your client. That is of course also true with the VLAN setup that I proposed. But that VLAN setup would be suitable to simulate an environment where true load balancing happens.

In my experience, multiple interfaces with different IP addresses on the same subnet are almost always a bad idea.

This was my first idea even before I looked deeper into multipath; however, the general consensus seems to be that MPIO is better than bonding or LACP for iSCSI (I don't really understand why, though).

Who told you that? I generally agree, but it requires a correct networking setup, and I fear your current setup doesn't qualify as such.

I haven't experimented with bonding for some time. IIRC, with bonding, too, you need to be prepared for some surprises, unless you have a switch with proper 802.3ad support. Basically, bonding can control on which interface the target sends packets, but not on which it receives them. IOW, your read traffic may be balanced with bonding, but your write traffic may not be. The balance-alb mode provides some means to control RX traffic, but for a single client with just one IP address, it won't work.
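For illustration, an 802.3ad bond on the target could be set up with iproute2 roughly like this (a sketch; the interface names and the address are assumptions, and the switch ports must be configured for LACP as well):

# enslave the target's four NICs into one LACP bond
sudo ip link add bond0 type bond mode 802.3ad
for nic in eth0 eth1 eth2 eth3; do
    sudo ip link set "$nic" down
    sudo ip link set "$nic" master bond0
done
sudo ip link set bond0 up
# a single portal IP on the bond would replace the four per-NIC portals
sudo ip addr add 192.168.0.200/24 dev bond0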

Still, if you are suggesting it, I will test it out as soon as my new managed switch arrives.

If that switch has 802.3ad support, you may be in luck.

However, I still fail to understand what your expectation is. The single interface on your client will be the bottleneck of your setup, regardless of what you do on your server.

@TMLKyza (Author) commented Dec 5, 2024

Sorry for the huge delay; I was hoping the switches would show up in the meantime, but I don't want to leave you on read.

The main goal behind all of this is to have 4 Gbit of bandwidth on my target (across 4 interfaces) so that my initiators, which will use 2.5 Gbit interfaces, can actually benefit from them. You may think I could get the job done by adding a 2.5 Gbit NIC to the target, but I don't want to occupy any PCIe slots as I will populate them with GPUs.

The new switches will support 802.3ad so I hope I can get it to work with bonding!

IOW, your read traffic may be balanced with bonding, but your write traffic may not be

Ironically, this is what I would prefer: balanced reads and unbalanced writes.

I know this has totally spiralled off topic, but I'll keep you updated.

@mwilck (Contributor) commented Dec 5, 2024

Ironically, this is what I would prefer: balanced reads and unbalanced writes.

Then maybe bonding is just right for you.

If you want to proceed with multipath: what you said earlier is of course true, block-based multipath has some advantages over network-based load balancing. Your initial approach was just too simplistic. With iSCSI/TCP and multipath, you need to take extra measures to make sure that each block-level path uniquely maps to a network path. By default, this is not the case, in particular not if all interfaces are on the same subnet.

There are different ways to achieve this mapping from block-level paths to network paths; the VLAN suggestion above is one of them.
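One way to verify the mapping on the initiator is to compare the iSCSI sessions with the multipath paths. iscsiadm -m session -P 3 should show, for each session, the iface name, the local IP address and the attached sdX devices, which you can then match against the paths listed by multipath -ll:

sudo iscsiadm -m session -P 3
sudo multipath -ll mpath0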
