
Failed to send SCSI registration to any LUN after mapping a large number of external LUNs #45

Closed
boposki opened this issue Oct 12, 2022 · 1 comment · Fixed by #64



boposki commented Oct 12, 2022

Procedure:

Step 1: Map 1024 LUNs (16 paths) from an external storage array to the host, then run rescan-scsi-bus.sh, which produces 1024 disks.
Step 2: Manually issue a registration command to one of the LUNs. It fails with a timeout error:
mpathpersist -o -I -S 0x000000003320095c /dev/dm-117
But when I sent the same registration command with sg_persist -o -I -S 0x000000003320095c /dev/dm-117, it succeeded.
According to the error log, mpathpersist times out when it sends multipathd the message to save the prkey. This happens when reservation_key is configured as follows:

defaults {
	path_checker            tur
	no_path_retry           18
	path_grouping_policy    group_by_prio
	prio                    const
	deferred_remove         yes
	uid_attribute           "ID_SERIAL"
	reassign_maps           no
	failback                immediate
	log_checker_err         once
	reservation_key         "file"  # this item
}

Root cause: The reply packet is not received within the fixed 4-second timeout, because multipathd spends more than 4 seconds executing PARSE, which collides with checkerloop on the vectors lock.

#define DEFAULT_REPLY_TIMEOUT	4000	/* milliseconds */

static int do_update_pr(char *alias, char *arg)
{
	/* ... */
	condlog(2, "%s: pr message=%s", alias, str);
	if (send_packet(fd, str) != 0) {
		condlog(2, "%s: message=%s send error=%d", alias, str, errno);
		mpath_disconnect(fd);
		return -1;
	}
	/* Waits at most 4 s for multipathd's reply, regardless of uxsock_timeout */
	ret = recv_packet(fd, &reply, DEFAULT_REPLY_TIMEOUT);
	if (ret < 0) {
		condlog(2, "%s: message=%s recv error=%d", alias, str, errno);
		ret = -1;
	}
	/* ... */
}

Suggested solution: Use the uxsock_timeout value as the client timeout instead of DEFAULT_REPLY_TIMEOUT. This is consistent with the server and makes more sense: to allow for transmission delay, the client's wait timeout should be longer than the server's execution timeout. With that change, uxsock_timeout in /etc/multipath.conf can be raised above its default, for example to 10 seconds (uxsock_timeout is given in milliseconds, so 10000).

boposki changed the title from "Failed 桶SCSI Registration for single LUN after" to "Failed to send SCSI Registration to anyone of LUNs after mapping a large number of extern LUNS." on Oct 12, 2022
mwilck (Contributor) commented Oct 26, 2022

I don't understand.

"multipathd spent more than 4 seconds to excute PARSE"

What does this mean? What do you mean by PARSE, and how is it possible that it took 4 seconds?
Can you fix this by simply increasing the timeout?

Btw, which multipath-tools version were you using?
