Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support disposal of Lwt_pool.t elements #483

Merged
merged 5 commits into from
Oct 19, 2017

Conversation

hcarty
Copy link
Contributor

@hcarty hcarty commented Oct 14, 2017

An initial attempt at solving #480. Also tried to catch occurrences of thread in the doc comments and replace them with promise.

EDIT by @aantron: resolves #480.

@hcarty hcarty force-pushed the lets_dispose_of_pool_elements branch from 3f3068e to 43e0f33 Compare October 14, 2017 05:58
@hcarty
Copy link
Contributor Author

hcarty commented Oct 14, 2017

This implementation of clear does not invalidate pool elements which are currently in use once they have been released. I'm not sure if that's ok. I'd like to have clear invalidate all pool elements. One way to do that would be to wrap each element in a record with a mutable disposed field. If the field is set appropriately the element would be considered invalid without any further checks.

I'll hopefully have a chance early next week to update this PR with that improved functionality unless someone disagrees with the approach.

Copy link
Collaborator

@bobbypriam bobbypriam left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @hcarty! I for one wishes this change to be merged. Left a note on the docs though 🙂

to the optional function [check] as [check element callback]. [check]
must call [callback] exactly once with [true] if [element] is still valid
and [false] otherwise. If [check] calls [callback false] then [dispose]
will be run on [element].
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a bit confused on when dispose is called. At a glance, it seems that both validate and check will call dispose. Will dispose be called twice? If not, how is the order of calling done?

Also this:

If a call to {!use} fails with a pool element

As my comment in the issue, I think it's unclear why a use would fail. Does "fail" equals "trying to use an invalid pool element" (due to validate)?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe "fail" means raising an exception or rejecting the 'b Lwt.t promise that use's callback returns. It should definitely be clarified in the comment, because I also am guessing about what it means.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, that actually makes sense. I think it's the whole validate and invalid elements on use paragraph just above it that confused me, I thought they are still talking about the same (or at least related) thing.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried to tweak the wording at bit - does this make the use of each argument any clearer? Suggestions are welcome - the wording has been tricky to get right.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new docs are pretty great!

@aantron
Copy link
Collaborator

aantron commented Oct 14, 2017

This implementation of clear does not invalidate pool elements which are currently in use

@bobbypriambodo Can you comment on this? What should clear do for pool elements that are currently in use?

@bobbypriam
Copy link
Collaborator

What should clear do for pool elements that are currently in use?

Intuitively for me clear should dispose all elements. However, I'm not sure what the best way is to achieve it.

The way node-pool achieve this is to first call drain() which will make sure that new acquire/use calls will be disallowed when the state is "draining" while letting remaining in-use elements to finish, and resolves when all elements are returned to the pool, at which we could call clear() safely. It also hits me that when we call clear, we might want to prevent new creations of elements, or else there might be some race conditions involved.

Or, another approach that I thought of is that to leave it as it is, and let the user bear the responsibility of making sure that the pool is definitely not used anymore before clearing. In case of a pool of DB connections that I have in mind, clear would be called when the app is shutting down, and it makes sense to first make sure that the app is not accessed before shutting it down (e.g. putting it out of load balancers, mark the instance unhealthy, etc.) to prevent end-users receive all kinds of errors.

@aantron
Copy link
Collaborator

aantron commented Oct 14, 2017

@bobbypriambodo thanks for your thoughts!

I guess that goes back to the question of whether we want a minimalistic Lwt_pool that is "agnostic" to different ways to address this, or whether there is a common behavior (which I don't know) for this aspect that users tend to expect, and we should implement that behavior.

@hcarty
Copy link
Contributor Author

hcarty commented Oct 14, 2017

I agree with @bobbypriambodo - I think it's reasonable to expect clear to invalidate and dispose of any existing pool elements once they are released.

I don't want to block usage of the pool on outstanding element use after a clear- when I get back to this I'll look for a clean way to handle that too.

@hcarty
Copy link
Contributor Author

hcarty commented Oct 14, 2017

And suggestions for approaches are of course welcome!

@bobbypriam
Copy link
Collaborator

@hcarty really love the new doc! It significantly clear things up. Now I understand the differences between check and validate.

Another minor note, there's an inconsistency, you defined [check element is_ok] but later If [check] calls [callback false]... If I understand correctly callback should be is_ok.

Note that pools are not for keeping Lwt threads. Lwt threads are very cheap
to create and are pure. It is neither desirable nor possible to reuse them.
If you want to have a pool of {e system} threads, consider module
Note that pools are not for keeping Lwt promises. Lwt promises are very
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this paragraph is a fortunate casualty of the change in terminology from threads to promises – as in, I doubt anyone would think of keeping a "promise pool" anyway :) I suggest deleting most of this paragraph, and leaving some modified version of the last sentence about using Lwt_preemptive for a system thread pool.

[Lwt_preemptive]. *)

type 'a t
(** Pools containing values of type ['a]. *)
(** Pools containing elements of type ['a]. *)

val create :
int ->
?check : ('a -> (bool -> unit) -> unit) ->
?validate : ('a -> bool Lwt.t) ->
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Optional, but maybe we should also change the order so that ?validate comes before ?check? On each call to use, ?validate is called first, then ?check if the promise returned by ?use is rejected.

to the optional function [check] as [check element callback]. [check]
must call [callback] exactly once with [true] if [element] is still valid
and [false] otherwise. If [check] calls [callback false] then [dispose]
will be run on [element].
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new docs are pretty great!

@hcarty
Copy link
Contributor Author

hcarty commented Oct 18, 2017

Thanks for the additional comments. I should get to this later this week, along with a potentially better approach to clear.

@hcarty
Copy link
Contributor Author

hcarty commented Oct 18, 2017

When check returns false the current approach is to drop the existing pool element (and with this PR, dispose it) and then create a new one to take its place. Would it be ok to change that functionality to not replace the element in the pool immediately after check returns false? If a new element is required to take its place it would still be created during a subsequent use. My reasoning behind this change is:

  1. Creating a new pool element may be costly. It seems better to save that cost until the element is required rather that paying it in the promise where it failed.
  2. The current remake-on-check-failure logic makes clear's semantics more difficult to understand as clear could be called on a pool but an outstanding use promise could end up creating a new pool element if it resolves to an exception.

@hcarty
Copy link
Contributor Author

hcarty commented Oct 18, 2017

I think the feedback on the doc comments have been addressed in the latest commit.

I've also updated clear so that any in-use pool elements will be disposed of once they are released. The logic there uses some ugly state tracking but hopefully the implementation is understandable. Please ask questions if it isn't.

Finally, I changed the logic around check so that a failed check does NOT create a new pool element. I think that this is better behavior in general - the only place we should create a new pool member is when we try to acquire one. This also makes the clear semantics and implementation much simpler.

@aantron
Copy link
Collaborator

aantron commented Oct 18, 2017

Would it be ok to change that functionality to not replace the element in the pool immediately after check returns false?

As maintainer, changing semantics makes me feel extremely paranoid, but I think this is ok. It may affect element creation timing for some users, but given Lwt is already a concurrency library, I would imagine nobody is able to count on the timing anyway.

in the promise where it failed

Sorry to nitpick language, but I have to do this as part of putting the thread terminology to rest. It's misleading to say that anything happens in a promise (compare "in a thread," which makes sense, but Lwt is no longer called a threading library exactly because that doesn't make sense for Lwt). The only things that can be said to be in a promise are either a list of callbacks (when pending) or a value or exception (when resolved). A more right thing to say here is something like "rather than paying it right after a promise returned by the callback of use is rejected," or anything else like that.

@@ -36,6 +36,8 @@ type 'a t = {
(* Validate old pool members. *)
dispose : 'a -> unit Lwt.t;
(* Dispose of a pool member. *)
mutable cleared : bool ref;
Copy link
Collaborator

@aantron aantron Oct 18, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The amount of indirection here is a bit sneaky, and the below syntax

p.cleared := true;
p.cleared <- ref false;

was mysterious to me when I first saw it. This

cleared = ref false;

also doesn't accurately reflect what's really going on. It's tempting to think that cleared involves only one level of indirection when seeing that.

I'd suggest making this an "immutable" field of type bool ref ref.

@aantron
Copy link
Collaborator

aantron commented Oct 18, 2017

The docs look pretty good. For me, I'll wait on reviewing the code in detail, as the build is currently failing. I think it's just the order of the labeled arguments that needs updating in the implementation.

@hcarty
Copy link
Contributor Author

hcarty commented Oct 18, 2017

A more right thing to say here is something like "rather than paying it right after a promise returned by the callback of use is rejected," or anything else like that.

I'm uncertain here - from a user's perspective it's use which pays the price of creating the new pool element. 'a Lwt.t may not be a thread but it still has a context it's responsible for, including its lifetime. I need to read your updated introduction again to make sure I have my terminology correct!

@hcarty
Copy link
Contributor Author

hcarty commented Oct 18, 2017

I'll wait on reviewing the code in detail, as the build is currently failing. I think it's just the order of the labeled arguments that needs updating in the implementation.

Oh no! Sorry about that, I pushed too soon. I'll get a fix in.

@aantron
Copy link
Collaborator

aantron commented Oct 18, 2017

from a user's perspective it's use which pays the price of creating the new pool element.

Quite true :)

'a Lwt.t may not be a thread but it still has a context it's responsible for, including its lifetime.

In a very loose sense there is a context, but it's a quite different kind of context than what is associated with a thread. That context is exactly a list of callbacks only (for a pending promise), and the promise is not "responsible" for it – it just has a list. I want to exactly avoid the notion that Lwt promises are similar to threads, including having contexts similar in any way one would ordinarily expect, so I'd rather avoid referring to both as "contexts" as well, as it's a loose use of terminology that could be confusing. The point is, promises are not "containers" for sequences of instructions, not units of execution, not something that runs, not something that can be interrupted, nothing "happens" in them. They are just smart references, that can hold lists of callbacks. Something else (a real thread, the main thread of the Lwt program) is responsible for running those callbacks at the right time, that main thread has a genuine thread context, etc. Promises are just inert data objects that are useful for synchronization and thinking about concurrency, and for getting a real thread to run things in a nice order.

@aantron
Copy link
Collaborator

aantron commented Oct 18, 2017

Or, put another way, promises are similar to streams or events in FRP, except they are one-shot. They are some kind of objects used for factoring your concurrent program, and you can register functions on them to create some kind of promise "network." Of course, I can't write that in the Lwt introduction :p

@hcarty
Copy link
Contributor Author

hcarty commented Oct 18, 2017

Working on that test failure now

@hcarty
Copy link
Contributor Author

hcarty commented Oct 18, 2017

Sorry for the noise from that failing test!

Latest changes:

  1. cleared is now a bool ref ref now as suggested by @aantron.
  2. Add an internal dispose helper function to help ensure count stays synced up
  3. Make the tests pass and add to the clear test to make sure even currently-in-use pool elements are disposed of once they are returned.
  4. Lift the disposal logic out of Lwt.catch in use to avoid potential double application of dispose if dispose fails.

@aantron aantron merged commit 9d7002d into ocsigen:master Oct 19, 2017
@aantron
Copy link
Collaborator

aantron commented Oct 19, 2017

Thanks @hcarty, @bobbypriambodo!

Me and @hcarty decided to merge this, skipping a super-thorough nitpicky review for now, but pushing that review to an overhaul (as recently done with lwt.ml/lwt.mli). It is to be done before 3.2.0. This is because much of the mental work is the same between reviewing this and overhauling Lwt_pool, we have to untangle some issues in the pre-existing code, and there is no point in doing this work twice, and forgetting the results after the first time. Since dispose/clear will not have been released yet, we can freely address any further issues in dispose/clear as part of the overhaul.

Any comments or reviews in this PR are still welcome, though!

@hcarty hcarty deleted the lets_dispose_of_pool_elements branch October 19, 2017 18:50
@aantron aantron mentioned this pull request Oct 20, 2017
@aantron
Copy link
Collaborator

aantron commented Oct 20, 2017

Finally, I changed the logic around check so that a failed check does NOT create a new pool element. I think that this is better behavior in general - the only place we should create a new pool member is when we try to acquire one. This also makes the clear semantics and implementation much simpler.

@hcarty, is it correct to say that this logic is still in the code after this PR? I do think I see it there..

@aantron
Copy link
Collaborator

aantron commented Oct 20, 2017

Never mind, sorry. This old comment

lwt/src/core/lwt_pool.ml

Lines 106 to 108 in 2b8e531

(* Verify a member is still valid before releasing it; try to replace
if it's invalid. *)
let check_elt p c =

is just 100% misleading. check_elt is called around validate (before use), not check (release after failed use).

@aantron
Copy link
Collaborator

aantron commented Oct 20, 2017

@hcarty, @bobbypriambodo what do you think about creating elements on an as-needed basis only, i.e. never re-creating inside either validate or check? We would create them only as a direct consequence of calling use.

I guess this would change the semantics even more, as the pool wouldn't try at all to maintain the number of elements it grew to. If there are five concurrent users, the pool will tend to stay around five elements in size. If the number of concurrent users decreases to two, and elements sometimes die in validate or check, the pool will tend to decrease to two elements automatically.

I don't know if there is some wisdom behind the current semantics. I was not able to find any documented anywhere. @bobbypriambodo, do you know what is done in popular pool implementations in other languages?

@aantron
Copy link
Collaborator

aantron commented Oct 20, 2017

This would also mean that if a pool element fails validate, we would start consuming pool elements in a loop until we find one that passes, or create a new one. The current behavior on failed validate is to leave the tail of the queue alone and create a new element right away. I'm not sure if one behavior is inherently more legitimate than the other.

@hcarty
Copy link
Contributor Author

hcarty commented Oct 20, 2017

@aantron My preference is to only create on use. Creating a new element can fail. From a pool user's point of view that failure would make the most sense when there is a need for a new element.

@aantron
Copy link
Collaborator

aantron commented Oct 20, 2017

Agreed. validate is still "on use," but I think only use itself should trigger creation.

As an example of weird behavior triggered by the current effort to maintain the element count, In the current code, if validate raises an exception, the pool checks for a waiting other promise, then tries to create an element, and fulfills/rejects that other promise depending on whether creation succeeded or also raised an exception. This was not documented, I wouldn't like to document it, and it suggests to me that even if the right thing to do is to create elements eagerly, it shouldn't be done/factored how it is now.

@aantron
Copy link
Collaborator

aantron commented Oct 20, 2017

In practice, the above means that if validate raises an exception, Lwt_pool only tries to create a new element if the pool is already still at the max limit (since that's why there is a waiting promise).

@bobbypriam
Copy link
Collaborator

do you know what is done in popular pool implementations in other languages?

Sorry but at the moment I have no idea, I was more of a user than implementer and most of the libs I used just worked so I haven't got the chance to dig into the impls. I can look it up though, and perhaps post it into the Lwt_pool overhaul issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Request: function to clear/destroy Lwt_pool
3 participants