
CP-24917: limit simultaneous connections #26

Merged · merged 1 commit into xapi-project:master on Oct 27, 2017

Conversation

thomassa

No description provided.

@@ -21,3 +21,6 @@ let wait_for_xapi_timeout_seconds = 300.0

(** We sleep for this many seconds before the next login attempt. *)
let wait_for_xapi_retry_delay_seconds = 4.0

(** Allow no more than this many client connections. *)
let connection_limit = 16
Contributor

The counter logic looks good, but where does the limit 16 come from?

Contributor

Yep, whatever number we choose is going to be a bit arbitrary. @thomassa wanted to limit it to 127 originally. The important thing is that the maximum number of VBDs linked to dom0 is 256, and xapi also needs to attach VBDs to dom0 to perform some operations. So we have to leave enough VBDs for dom0 and xapi to do their work.

Contributor

But I do think that we could increase it to a larger number.

Collaborator

I'd rather be conservative and increase it in a second stage if it blocks partners. Let's ask some architects: @robhoes

Author
@thomassa Oct 26, 2017

Hard to know. And we could make it settable through a command-line option or a config file or something. My initial suggestion of 127 had very little thought behind it.

But what's the maximum number of connections that would make sense for a client to use? With 16 they will almost certainly be slowing each other down, and it would probably be faster to read the blocks one VDI at a time... or maybe multiple VDIs, but only one connection at a time using each PBD.
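The suggestion of making the limit settable could be sketched as follows. This is a minimal, hypothetical illustration in Python — the daemon itself is OCaml, and the flag name and parser here are not part of the project's actual CLI:

```python
import argparse

# Matches the constant chosen in this PR (Consts.connection_limit = 16).
DEFAULT_CONNECTION_LIMIT = 16

def parse_args(argv=None):
    # Hypothetical CLI sketch: it only illustrates the idea of making the
    # connection limit configurable while keeping a conservative default.
    parser = argparse.ArgumentParser(description="illustrative NBD server options")
    parser.add_argument(
        "--connection-limit",
        type=int,
        default=DEFAULT_CONNECTION_LIMIT,
        help="maximum number of simultaneous client connections",
    )
    return parser.parse_args(argv)
```

With such a flag, operators could raise the limit later (as discussed below in the thread) without a code change, while the default stays at the conservative value.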

Contributor
@gaborigloi Oct 26, 2017

But the host could have multiple NICs, and the VDIs may be in multiple different SRs, so that neither the network nor the storage would be a bottleneck?

Contributor

I'd say let's make it 32 or 50?

Contributor

Ok, let's limit it to 16, as we discussed.

@thomassa
Author

thomassa commented Oct 26, 2017

The Travis build failure was

docker run --env-file=env.list -v /home/travis/build/xapi-project/xapi-nbd:/repo local-build ci-opam
Fatal error: exception # opam-version    1.2.2 (58ef3b8213100953848d362f7120a30356d7f77d)
# os              linux
opam: "execvp" failed on ci-opam: No such file or directory

...which seems to be a problem with the environment rather than with the change in this pull request, but we should see about fixing it.

@mseri
Collaborator

mseri commented Oct 26, 2017

Why is travis broken only with the base repo?

Contributor
@gaborigloi left a comment

Looks good. This is a very important fix that we should have in the next release. I had one minor comment, but that is subjective, so it's not necessary to change the code. I reckon it would be best to test this live before merging?

let conn_m = Lwt_mutex.create () in
let inc_conn ?(i=1) () = Lwt_mutex.with_lock conn_m (fun () ->
conn_count := !conn_count + i;
if !conn_count > Consts.connection_limit && i > 0
Contributor

Personally I would find it clearer if we created a separate function for dec_conn, instead of reusing the same code. Actually, I think the best option would be to create a new wrapper, with_limited_connections; that way all the related code would be kept together. But this is subjective.
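The suggested with_limited_connections wrapper could be sketched like this. This is a hypothetical Python analogue of the PR's OCaml/Lwt counter logic (the class and method names, and the error text, are illustrative, not the project's API):

```python
import threading
from contextlib import contextmanager

class ConnectionLimiter:
    """Hypothetical analogue of the suggested with_limited_connections
    wrapper: the counter, the lock and the limit check live in one place."""

    def __init__(self, limit):
        self.limit = limit
        self.count = 0
        self.lock = threading.Lock()

    @contextmanager
    def with_limited_connection(self):
        # Increment under the lock, rejecting the connection if the
        # limit has already been reached (the inc_conn half).
        with self.lock:
            if self.count >= self.limit:
                raise RuntimeError(
                    "Server busy: already at maximum %d connections."
                    % self.limit)
            self.count += 1
        try:
            yield
        finally:
            # Always decrement on the way out (the dec_conn half), even
            # if the handler raised, so the slot is never leaked.
            with self.lock:
                self.count -= 1
```

Keeping the increment, the decrement and the limit check inside one wrapper makes it impossible to forget the decrement on an error path, which is the maintainability point being made above.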

@mseri
Collaborator

mseri commented Oct 26, 2017

The BASE_REPO error is due to a still-open issue in ocaml-docker; we can ignore it for the moment.

Contributor
@gaborigloi left a comment

Feel free to merge it once it's been tested.

@gaborigloi
Contributor

gaborigloi commented Oct 26, 2017

I have tested it by creating connections using xapi_nbd_client, and I can confirm that this works 😄.
However, for some reason the 17th client connection was not interrupted but just hung when I got the main: Caught exception while handling client: (Failure "Server busy: already at maximum 16 connections.") message on the server side. (This happened even when I tried to set a timeout with socket.settimeout.) But I guess there is nothing we can do about that here, because we do close the socket when we catch this exception:

Oct 26 20:13:06 localhost xapi-nbd[2772]: main: Caught exception while handling client: (Failure "Server busy: already at maximum 16 connections.")

So I'm going to merge it because this is an important fix, but it's not ideal that the client just hangs. I guess it would still hang if, instead of this method, we simply did not accept the incoming client connection?

@gaborigloi
Contributor

gaborigloi commented Oct 26, 2017

After this is merged, we still have to update xapi-nbd.spec (or pin xapi-nbd) to get this fix in.

@mseri
Collaborator

mseri commented Oct 27, 2017

Is there any way to send a message to close the connection to the client? I am not really convinced about the behaviour

@gaborigloi
Contributor

I'll check how the nbd-client behaves. Maybe something is wrong with my Python client.

@gaborigloi
Contributor

Hmm, interesting, when I use the nbd-client, everything works fine, and the connection fails instead of hanging when I exceed the limit:

# nbd-client -N '/982ac998-1570-475e-a648-3f83c9046f2e?session_id=OpaqueRef:91c751e3-57da-491f-b9d3-55b4508cc138' xrtuk-09-08.xenrt.citrite.net /dev/nbd0
Negotiation: Error: Read failed: End of file
Exiting.

🤔

@mseri
Collaborator

mseri commented Oct 27, 2017

Although the client failure is a bit unreadable, I find this behaviour much more acceptable!

@gaborigloi
Contributor

Should we merge this then?

@mseri mseri merged commit c2cf6d1 into xapi-project:master Oct 27, 2017
@gaborigloi
Contributor

So apparently this hanging is client-dependent for some reason 🤔

@gaborigloi
Contributor

The PR is fine, and it does cause the client to error out. The reason my Python NBD client hung is that I was using the recv call incorrectly 😳: I did not check for a zero data length, which indicates that the connection has been closed.
This has now been fixed:
XS-CBT-backup/xs-cbt-backup@2c839f5#diff-9012dbb4d6223130668b0fe0db3a228f
With the fixed version it works as expected: the 17th connection errors out :)
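The recv pitfall described above can be shown with a minimal Python helper (the function name is hypothetical, not part of the actual client): recv() returning b'' means the peer closed the connection, and a reader that ignores this can loop or hang forever.

```python
import socket

def read_exactly(sock, n):
    """Read exactly n bytes from sock, detecting a closed connection.

    recv() returning b'' signals an orderly shutdown by the peer;
    forgetting this check is what made the client hang instead of
    erroring out when the server dropped the connection.
    """
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:  # b'' => connection closed by the peer
            raise ConnectionError(
                "connection closed after %d of %d bytes" % (len(buf), n))
        buf += chunk
    return buf
```

With this check in place, a server that closes the socket at the connection limit produces an immediate error on the client side, matching the nbd-client behaviour seen above.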


4 participants