isolator should invoke potentially blocking operations async from module API handlers #92
Comments
I looked at the Marathon code and I agree that this is a good idea and should be feasible. Thanks for the input.
To add to @jdef's description, this problem is pretty severe. If any operation in the dvdi module blocks, ALL subsequent container launch/update/destroy operations will be BLOCKED, irrespective of whether the container is using an external volume or not. Fixing that might involve serializing dvdcli operations, because when you use Subprocess, the order in which dvdcli operations are executed is non-deterministic. For instance, say you have a volume you want to umount first, and then a new container comes in requesting the same volume. You expect the volume to be mounted for the new container. However, due to the race, it's likely that the umount happens later than the mount.
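One way to picture the serialization mentioned above is a single worker thread that drains a FIFO queue, so an unmount submitted before a mount always runs first. This is only a sketch using the C++ standard library, not libprocess or the actual isolator code; `SerialExecutor` and `submit` are hypothetical names introduced for illustration.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <functional>
#include <future>
#include <mutex>
#include <queue>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical helper: runs submitted operations one at a time, in
// submission order, on a single background thread.
class SerialExecutor {
public:
  SerialExecutor() : worker_([this] { run(); }) {}

  ~SerialExecutor() {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      stopping_ = true;
    }
    cv_.notify_one();
    worker_.join();  // pending operations are drained before exit
  }

  // Enqueue an operation; the returned future becomes ready once it has run.
  std::future<void> submit(std::function<void()> op) {
    assert(op && "operation must be callable");
    std::packaged_task<void()> task(std::move(op));
    std::future<void> done = task.get_future();
    {
      std::lock_guard<std::mutex> lock(mutex_);
      queue_.push(std::move(task));
    }
    cv_.notify_one();
    return done;
  }

private:
  void run() {
    for (;;) {
      std::packaged_task<void()> task;
      {
        std::unique_lock<std::mutex> lock(mutex_);
        cv_.wait(lock, [this] { return stopping_ || !queue_.empty(); });
        if (queue_.empty()) return;  // stopping and nothing left to run
        task = std::move(queue_.front());
        queue_.pop();
      }
      task();  // executed outside the lock so submit() never waits on it
    }
  }

  std::mutex mutex_;
  std::condition_variable cv_;
  std::queue<std::packaged_task<void()>> queue_;
  bool stopping_ = false;
  std::thread worker_;  // declared last: starts after the other members exist
};
```

In a real module the queued operations would wrap the external dvdcli invocations, and one queue per volume (rather than one global queue) would probably be enough to avoid the unmount/mount race without blocking unrelated containers. That per-volume split is an assumption, not something the thread settles.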
@jdef, just for my understanding, what happens in the case of docker-type workloads/containers? The specific case I am thinking about: if we mount the volume async and come out of the staging state, and the application comes up without the volume data being available, the application might error out because the data is not there. Maybe I am misunderstanding how to use Subprocess and what its capabilities are.
@jieyu how did we handle this scenario with the docker volume isolator?
(Sent by email in reply to David vonThenen, Wed, May 18, 2016 at 11:21 AM.)
James DeFelice
related to #88, if calls to `os::shell` to execute `dvdcli` hang or block for significant amounts of time then the task launch pipeline breaks down and tasks become stuck in `STAGING`. part of the reason why this happens is because the isolator module invokes potentially blocking operations synchronously from within the mesos module API handlers.

a better approach would be to invoke such commands asynchronously. perhaps by using, for example, Subprocess. HDFS code in Mesos provides an example of this approach: https://github.com/apache/mesos/blob/4d2b1b793e07a9c90b984ca330a3d7bc9e1404cc/src/hdfs/hdfs.cpp#L53
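The shape of the asynchronous approach can be sketched with the C++ standard library alone: the handler starts the blocking command on another thread and hands back a future right away, so the caller is never stalled. This is a rough illustration under stated assumptions, not the Mesos or libprocess API; `runBlockingCommand` and `prepareAsync` are hypothetical names, and the "command" is simulated with a sleep rather than a real `dvdcli` invocation.

```cpp
#include <cassert>
#include <chrono>
#include <future>
#include <string>
#include <thread>
#include <utility>

// Hypothetical stand-in for a slow external command. A real module would
// spawn the process; here we only sleep for the given delay.
inline int runBlockingCommand(const std::string& volume,
                              std::chrono::milliseconds delay) {
  assert(delay.count() >= 0);
  std::this_thread::sleep_for(delay);
  return volume.empty() ? 1 : 0;  // non-zero mimics a failed exit status
}

// Hypothetical handler: returns immediately. The exit status arrives later
// through the future instead of blocking the calling thread.
inline std::future<int> prepareAsync(std::string volume,
                                     std::chrono::milliseconds delay) {
  return std::async(std::launch::async, runBlockingCommand,
                    std::move(volume), delay);
}
```

A caller would chain follow-up work on the future (or poll it) rather than wait inside the handler. In Mesos itself the equivalent role would be played by libprocess futures and Subprocess, as in the linked HDFS code.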