IO process is D+ when path_checker is tur and there some problem in backstores #54

Closed
lixiaokeng opened this issue Nov 9, 2022 · 16 comments · Fixed by #64
Labels
enhancement New feature or request


@lixiaokeng
Contributor

Here is a test. We use the tur path_checker, echo offline to the backstore, and then write to the multipath devices on the client.
The I/O process ends up in D+ state because the requests are queued in the kernel. We see the paths go down->up->down->up, and no_path_retry doesn't take effect because the TUR check succeeds while I/O fails.

This problem can be avoided by using the directio path_checker or by setting no_path_retry to fail. However, we want to use the tur path_checker because we have thousands of multipath devices. Do you have a good idea how to solve this?

@mwilck
Contributor

mwilck commented Nov 9, 2022

I can't reproduce.

If I do echo offline >/sys/class/block/sdb/device/state, multipathd logs show:

Nov 09 08:26:08 luzifer multipathd[2919]: sda: tur state = up
Nov 09 08:26:12 luzifer multipathd[2919]: sdb: state down, checker not called
Nov 09 08:26:12 luzifer multipathd[2919]: 36001405130d0940e1914873a58afb4ad: sdb - path offline
Nov 09 08:26:17 luzifer multipathd[2919]: sdb: state down, checker not called
Nov 09 08:26:17 luzifer multipathd[2919]: 36001405130d0940e1914873a58afb4ad: sdb - path offline
Nov 09 08:26:22 luzifer multipathd[2919]: sdb: state down, checker not called
Nov 09 08:26:22 luzifer multipathd[2919]: 36001405130d0940e1914873a58afb4ad: sdb - path offline
Nov 09 08:26:27 luzifer multipathd[2919]: sdb: state down, checker not called
Nov 09 08:26:27 luzifer multipathd[2919]: 36001405130d0940e1914873a58afb4ad: sdb - path offline
Nov 09 08:26:28 luzifer multipathd[2919]: sda: tur state = up
Nov 09 08:26:32 luzifer multipathd[2919]: sdb: state down, checker not called
Nov 09 08:26:32 luzifer multipathd[2919]: 36001405130d0940e1914873a58afb4ad: sdb - path offline
Nov 09 08:26:37 luzifer multipathd[2919]: sdb: state down, checker not called

This is how it's expected to behave.
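The "state down, checker not called" lines show that multipathd consults the kernel's device state before running the path checker, so taking the client-side device offline never reaches the TUR checker at all. A minimal sketch of such a gate, assuming only the "running" and "offline" states matter; the enum and function names are hypothetical, not libmultipath's:

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical outcome of the pre-check (not libmultipath's enum). */
enum gate_result { GATE_RUN_CHECKER, GATE_PATH_DOWN, GATE_UNKNOWN };

/* Reads a sysfs device state file such as
 * /sys/class/block/sdb/device/state and decides whether the path checker
 * should run at all. Only "running" and "offline" are handled here; other
 * kernel states (for example "blocked") are reported as GATE_UNKNOWN. */
static enum gate_result check_sysfs_state(const char *state_path)
{
    char buf[32] = "";
    FILE *f = fopen(state_path, "r");

    if (!f)
        return GATE_UNKNOWN;
    if (!fgets(buf, sizeof(buf), f)) {
        fclose(f);
        return GATE_UNKNOWN;
    }
    fclose(f);
    buf[strcspn(buf, "\n")] = '\0';   /* drop the trailing newline */

    if (strcmp(buf, "running") == 0)
        return GATE_RUN_CHECKER;      /* device usable: call tur/directio */
    if (strcmp(buf, "offline") == 0)
        return GATE_PATH_DOWN;        /* "state down, checker not called" */
    return GATE_UNKNOWN;
}
```

In the reporter's setup the client devices stay "running" because only the server-side backing disk is offline, so a gate like this lets the checker run and the result depends entirely on what the target answers.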

@lixiaokeng
Contributor Author

lixiaokeng commented Nov 9, 2022

I made a mistake in my description, sorry. I ran echo offline on the server, not on the client.
I used targetcli to create lun0, whose backing disk is sdb on the server, and then set sdb offline.

@lixiaokeng
Contributor Author

This is the log on the client:

Nov  4 17:51:27 localhost multipathd[10826]: 3600140552e30369149849b69967aef8d: performing delayed actions
Nov  4 17:51:27 localhost multipathd[10826]: 3600140552e30369149849b69967aef8d: reload [0 31457280 multipath 0 1 alua 2 1 service-time 0 1 1 8:32 1 service-time 0 1 1 8:16 1]
Nov  4 17:57:55 localhost multipathd[10826]: sdc: mark as failed
Nov  4 17:57:55 localhost multipathd[10826]: 3600140552e30369149849b69967aef8d: remaining active paths: 1
Nov  4 17:57:56 localhost multipathd[10826]: sdb: mark as failed
Nov  4 17:57:56 localhost multipathd[10826]: 3600140552e30369149849b69967aef8d: remaining active paths: 0
Nov  4 17:58:00 localhost multipathd[10826]: 3600140552e30369149849b69967aef8d: sdc - tur checker reports path is up
Nov  4 17:58:00 localhost multipathd[10826]: 8:32: reinstated
Nov  4 17:58:00 localhost multipathd[10826]: 3600140552e30369149849b69967aef8d: remaining active paths: 1
Nov  4 17:58:01 localhost multipathd[10826]: 3600140552e30369149849b69967aef8d: sdb - tur checker reports path is up
Nov  4 17:58:01 localhost multipathd[10826]: 8:16: reinstated
Nov  4 17:58:01 localhost multipathd[10826]: 3600140552e30369149849b69967aef8d: remaining active paths: 2
Nov  4 17:58:01 localhost multipathd[10826]: sdc: mark as failed
Nov  4 17:58:01 localhost multipathd[10826]: 3600140552e30369149849b69967aef8d: remaining active paths: 1
Nov  4 17:58:02 localhost multipathd[10826]: sdb: mark as failed
Nov  4 17:58:02 localhost multipathd[10826]: 3600140552e30369149849b69967aef8d: remaining active paths: 0
Nov  4 17:58:05 localhost multipathd[10826]: 3600140552e30369149849b69967aef8d: sdc - tur checker reports path is up
Nov  4 17:58:05 localhost multipathd[10826]: 8:32: reinstated
Nov  4 17:58:05 localhost multipathd[10826]: 3600140552e30369149849b69967aef8d: remaining active paths: 1
Nov  4 17:58:06 localhost multipathd[10826]: 3600140552e30369149849b69967aef8d: sdb - tur checker reports path is up
Nov  4 17:58:06 localhost multipathd[10826]: 8:16: reinstated
Nov  4 17:58:06 localhost multipathd[10826]: 3600140552e30369149849b69967aef8d: remaining active paths: 2
Nov  4 17:58:06 localhost multipathd[10826]: sdc: mark as failed
Nov  4 17:58:06 localhost multipathd[10826]: 3600140552e30369149849b69967aef8d: remaining active paths: 1
Nov  4 17:58:07 localhost multipathd[10826]: sdb: mark as failed

@mwilck
Contributor

mwilck commented Nov 15, 2022

So this is iSCSI?

@bmarzins
Contributor

I'm not sure that the solution to this problem lies in multipath. The device is presumably really returning a good status to the TUR checker, even though it can't complete I/O. Short of basically duplicating the work of the directio checker, I'm not sure what multipath could do to find out that this is the case. To work around the issue, the shaky path detection methods should be able to stop this sort of ping-ponging.

@mwilck
Contributor

mwilck commented Nov 17, 2022

I agree. The TUR checker can only work if a GOOD response from TUR actually means that the device is able to handle I/O. That doesn't seem to be the case here.
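A toy model of the ping-ponging in the client log above, under loudly stated assumptions: every TUR returns GOOD, every I/O fails, and the no_path_retry countdown only runs while no path is active after a checker pass and is reset whenever a path is reinstated. None of these names come from multipathd; the sketch only illustrates why a checker that does no I/O can keep the queue alive indefinitely, while an I/O-based checker (like directio) lets no_path_retry expire.

```c
#include <stdbool.h>

#define MAX_PATHS 8

/* Modeled LIO target whose backstore has gone offline (an assumption for
 * illustration): TEST UNIT READY still answers GOOD, every read/write fails. */
static bool tur_reports_good(void) { return true; }
static bool io_succeeds(void) { return false; }

/* A checker that actually performs I/O, like directio, sees the failure. */
static bool io_based_check(void) { return io_succeeds(); }

/* Simplified model, not multipathd's code: each checker interval runs the
 * checker on every failed path, then dispatches the queued I/O to the active
 * paths. npaths must not exceed MAX_PATHS.
 * Returns true if queueing gets disabled (the hung I/O finally fails), false
 * if the I/O is still queued after `ticks` intervals. */
static bool queueing_disabled(bool (*checker_good)(void), int npaths,
                              int no_path_retry, int ticks,
                              int *reinstatements)
{
    bool up[MAX_PATHS] = { false };  /* all paths start out failed */
    int retries_left = no_path_retry;

    *reinstatements = 0;
    for (int t = 0; t < ticks; t++) {
        int active = 0;

        /* Checker pass: reinstate every path the checker considers up. */
        for (int p = 0; p < npaths; p++) {
            if (!up[p] && checker_good()) {
                up[p] = true;
                (*reinstatements)++;
            }
            if (up[p])
                active++;
        }

        if (active > 0)
            retries_left = no_path_retry;  /* countdown reset */
        else if (--retries_left <= 0)
            return true;                   /* stop queueing, fail the I/O */

        /* I/O pass: the queued request fails on each active path, and the
         * kernel marks that path failed again. */
        for (int p = 0; p < npaths; p++)
            if (up[p] && !io_succeeds())
                up[p] = false;
    }
    return false;
}
```

With two paths, no_path_retry set to 3 and ten checker intervals, the TUR-based model reinstates both paths on every interval (20 reinstatements) and never stops queueing, matching the down/up cycle in the log; the I/O-based model never reinstates anything and gives up after three intervals.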

@mwilck
Contributor

mwilck commented Nov 17, 2022

What target is this?

@lixiaokeng
Copy link
Contributor Author

I'm using a target created with targetcli.

@mwilck
Contributor

mwilck commented Nov 18, 2022

See target_core_spc.c:1305. LIO always reports GOOD status to TUR, so you can't use the TUR checker with LIO. If you like, you can call this a deficiency of the Linux target.

@mwilck
Contributor

mwilck commented Nov 18, 2022

We should make an entry to our hwtable for this.
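A hedged sketch of what such an entry amounts to. The struct and lookup below are hypothetical stand-ins for libmultipath's hardware table, the vendor string "LIO-ORG" is assumed to be what LIO reports by default, and matching uses POSIX extended regexes via regcomp()/regexec(). The entry selects directio (the checker named in the eventual fix) and turns detect_checker off, as discussed further down in this thread.

```c
#include <regex.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, trimmed-down stand-in for a hwtable entry. Field names mirror
 * those discussed in this issue, but this is not libmultipath's struct. */
enum detect_checker { DETECT_CHECKER_UNDEF, DETECT_CHECKER_OFF, DETECT_CHECKER_ON };

struct hw_sketch {
    const char *vendor;        /* POSIX extended regex */
    const char *product;       /* POSIX extended regex */
    const char *checker_name;
    enum detect_checker detect_checker;
};

static const struct hw_sketch table[] = {
    /* LIO answers GOOD to every TUR, so use an I/O-based checker and keep
     * ALUA detection from switching back to tur. "LIO-ORG" is assumed. */
    { "LIO-ORG", ".*", "directio", DETECT_CHECKER_OFF },
};

/* Unanchored regex match, as regexec() does by default. */
static bool re_match(const char *pattern, const char *s)
{
    regex_t re;
    bool ok;

    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return false;
    ok = regexec(&re, s, 0, NULL, 0) == 0;
    regfree(&re);
    return ok;
}

/* Returns the first entry whose vendor and product patterns both match. */
static const struct hw_sketch *find_hwe(const char *vendor, const char *product)
{
    for (size_t i = 0; i < sizeof(table) / sizeof(table[0]); i++)
        if (re_match(table[i].vendor, vendor) &&
            re_match(table[i].product, product))
            return &table[i];
    return NULL;
}
```

With a product pattern of ".*", this lookup matches both "disk0" and "RBD" for an LIO-ORG vendor string. Because regexec() is unanchored here, "LIO-ORG" would also match any longer vendor string that contains it.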

@mwilck mwilck added the enhancement New feature or request label Nov 18, 2022
mwilck added a commit to openSUSE/multipath-tools that referenced this issue Mar 23, 2023
LIO always responds with GOOD status to TUR. Thus TUR is
not useful as path checker for LIO targets.

Fixes opensvc#54

Signed-off-by: Martin Wilck <[email protected]>
@mwilck
Contributor

mwilck commented Mar 23, 2023

Can you please review and test openSUSE@350af2c, which I've just pushed to https://github.com/openSUSE/multipath-tools/tree/tip ?

@lixiaokeng lixiaokeng changed the title IO process is D+ when path_checker is our and there some problem in backstores IO process is D+ when path_checker is tur and there some problem in backstores Mar 25, 2023
mwilck added a commit to openSUSE/multipath-tools that referenced this issue Mar 28, 2023
LIO always responds with GOOD status to TUR. Thus TUR is
not useful as path checker for LIO targets.

Fixes opensvc#54

Signed-off-by: Martin Wilck <[email protected]>
mwilck added a commit to openSUSE/multipath-tools that referenced this issue Mar 28, 2023
LIO always responds with GOOD status to TUR. Thus TUR is
not useful as path checker for LIO targets.

Fixes opensvc#54

Signed-off-by: Martin Wilck <[email protected]>
Reviewed-by: Benjamin Marzinski <[email protected]>
@lixiaokeng
Contributor Author

There are some problems. We also need to set ".detect_checker = DETECT_CHECKER_OFF" and ".product = "disk0"".
With DETECT_CHECKER_OFF, ALUA detection no longer switches the checker back to tur.
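The detect_checker interaction can be sketched as follows, based on the behavior documented in multipath.conf(5): when detect_checker is on and the device supports ALUA, the tur checker is used instead of the configured one. The enums and function below are hypothetical, and treating an unset value as "on" assumes the documented default of yes. LIO supports ALUA (the client log above shows the alua hardware handler), so without DETECT_CHECKER_OFF a directio setting in the hwtable would be overridden.

```c
#include <stdbool.h>

/* Hypothetical enums for illustration, not libmultipath's definitions. */
enum detect_checker { DETECT_CHECKER_UNDEF, DETECT_CHECKER_OFF, DETECT_CHECKER_ON };
enum checker_kind { CHECKER_TUR, CHECKER_DIRECTIO };

/* Picks the checker actually used for a path. An unset detect_checker is
 * treated as "on", assuming the documented default. */
static enum checker_kind select_checker(enum checker_kind configured,
                                        enum detect_checker detect,
                                        bool path_supports_alua)
{
    bool detect_on = (detect != DETECT_CHECKER_OFF);

    if (detect_on && path_supports_alua)
        return CHECKER_TUR;   /* ALUA detected: tur wins over the config */
    return configured;
}
```

For an ALUA-capable LIO path configured with directio, the result is tur unless detection is explicitly turned off, which is exactly the case reported here.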


@mwilck
Contributor

mwilck commented Mar 30, 2023

Right about .detect_checker (I keep making this mistake), but I don't understand why we'd need disk0. The regexp we're using should match any product.

@lixiaokeng
Contributor Author

It's OK. I tested this on 0.8.7, where the .product was "RBD". That has since been changed. I have no further questions.

@mwilck
Contributor

mwilck commented Mar 31, 2023

So, we need to add the .detect_checker line and we're good?

@lixiaokeng
Contributor Author

Yes

bmwiedemann pushed a commit to bmwiedemann/openSUSE that referenced this issue May 13, 2023
… SR 1086784

https://build.opensuse.org/request/show/1086784
by user mwilck + dimstar_suse
- Update to version 0.9.5+68+suse.d1b6a1c:
  Upstream bugfixes:
  * libmultipath: use directio checker for LIO targets
    (gh#opensvc/multipath-tools#54)
  * multipathd.service: remove "Also=multipathd.socket"
    (gh#opensvc/multipath-tools#65)
  * libmultipathd: Avoid parsing errors due to unsupported designators (forwarded request 1086780 from mwilck)
mwilck added a commit to openSUSE/multipath-tools that referenced this issue Feb 27, 2024
LIO always responds with GOOD status to TUR. Thus TUR is
not useful as path checker for LIO targets.

Fixes opensvc#54

mwilck: v2: fixed up with .detect_checker setting.

Reported-by: Li Xiaokeng <[email protected]>
Signed-off-by: Martin Wilck <[email protected]>
Reviewed-by: Benjamin Marzinski <[email protected]>
Tested-by: Li Xiaokeng <[email protected]>