lock not propagated between instances #13
Comments
We have created an issue in Pivotal Tracker to manage this: https://www.pivotaltracker.com/story/show/143703131. The labels on this GitHub issue will be updated when the story is started.
Please see the Linux man page on flock:

Sorry. Quoting flock(2): "In Linux kernels up to 2.6.11, flock() does not lock files over NFS (i.e., the scope of locks was limited to the local system). Instead, one could use fcntl(2) byte-range locking, which does work over NFS."
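For reference, here is a minimal sketch of the fcntl(2) byte-range approach the man page points to, assuming the client and server both support NFS locking (NLM for NFSv3); the lockfile path is purely illustrative:

```c
/* Sketch only: take an exclusive byte-range lock with fcntl(2) instead of flock(2).
   The path is illustrative; adjust to your mount point. */
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    int fd = open("/var/vcap/data/some-share/lockfile", O_RDWR | O_CREAT, 0644);
    if (fd < 0) { perror("open"); return 1; }

    struct flock fl = {0};
    fl.l_type   = F_WRLCK;   /* exclusive (write) lock */
    fl.l_whence = SEEK_SET;
    fl.l_start  = 0;         /* from offset 0 ...      */
    fl.l_len    = 0;         /* ... to end of file     */

    /* F_SETLKW blocks until the lock is granted; with a kernel NFSv3 mount this
       request is forwarded to the server via the NLM side protocol. */
    if (fcntl(fd, F_SETLKW, &fl) == -1) { perror("fcntl"); return 1; }
    printf("lock acquired\n");

    sleep(100);              /* hold the lock, mirroring the flock test in this issue */

    fl.l_type = F_UNLCK;
    fcntl(fd, F_SETLK, &fl); /* release explicitly (also released on close/exit) */
    close(fd);
    return 0;
}
```

Note that whether this propagates depends on the mount: a kernel NFS mount forwards the lock to the server, whereas a FUSE filesystem that does not implement the lock callback falls back to client-local locking.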
Right you are. We did some testing here on our end comparing kernel mounts with the FUSE mount package we use in order to introduce uid mapping. Here's what we found:

So it seems that even for kernel mounts there is some discrepancy in the way that flock behaves, but nonetheless, our NFS client is not behaving as well as the kernel NFS libraries. We will investigate further.
OT: why FUSE? Why not just a mount? Even further, why not just an automount?
fuse-nfs allows us to do uid mapping. We could use the kernel mount, but then any buildpack application would connect to the NFS server as uid 2000 (the uid of the vcap user in the container). We wanted the ability for CF apps to be able to declare a uid and have that uid transmitted to the NFS server. This facilitates re-use of existing shares for app replatforming and gives us something sort of resembling security for shares used by multiple apps.
We opened a corresponding issue in fuse-nfs, as the root cause is in the fuse-nfs library, which does not implement the lock() or flock() methods of the FUSE interface.
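For illustration only (this is a sketch, not the fuse-nfs code): in libfuse's high-level API these are the two callbacks that would need to be implemented and forwarded to the NFS server for locks to propagate. The mynfs_* names are invented, and the .flock callback assumes libfuse 2.9 or later:

```c
/* Hypothetical sketch of the missing FUSE callbacks; not taken from fuse-nfs. */
#define FUSE_USE_VERSION 26
#include <fuse.h>
#include <errno.h>
#include <fcntl.h>

/* POSIX (fcntl) byte-range locks: cmd is F_GETLK, F_SETLK or F_SETLKW. */
static int mynfs_lock(const char *path, struct fuse_file_info *fi,
                      int cmd, struct flock *lock)
{
    (void)path; (void)fi; (void)cmd; (void)lock;
    /* A real implementation would translate this into a lock request against
       the NFS server. Leaving the callback unimplemented means the kernel
       keeps the lock local to this client. */
    return -ENOSYS;
}

/* BSD flock() locks: op is LOCK_SH, LOCK_EX or LOCK_UN, possibly OR'd with LOCK_NB. */
static int mynfs_flock(const char *path, struct fuse_file_info *fi, int op)
{
    (void)path; (void)fi; (void)op;
    return -ENOSYS;
}

static struct fuse_operations mynfs_ops = {
    .lock  = mynfs_lock,
    .flock = mynfs_flock,
    /* ... plus the usual getattr/open/read/write/etc. callbacks ... */
};
```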
By way of bigger picture, we are curious what sorts of apps/use cases you are considering. Is there a specific use case where a filesystem and filesystem locking is a better approach than some other alternative? This may help us with this feature and testing moving forward.
@paulcwarren In CF, apps can have more than one instance running, and all these instances will access the same NFS share, so there must be some mechanism to prevent race conditions.
Also, this issue is a defect, not an enhancement request. It will prevent our customers from adopting this component in their production environments.
@ChaosEternal -- I think we understand the high-level issue, but it would be helpful for us to have a bit more detail about your specific use case in order to prioritize this work against some of the other possible remedies we might consider.

For example, if NFSv4 were supported, would you consider using that over NFSv3? My impression is that locking is more cleanly integrated in a v4 context, as it is incorporated into the NFS protocol itself rather than being implemented in a parallel protocol. OTOH, some folks still have issues with NFSv4 reliability when networks are slow or flaky, which may be why it hasn't seen wider adoption.

As another example, do you require locking because your apps must be scaled up in order to work in a production environment, or is this a requirement because CF today cannot guarantee that it will not start more than one instance of an application? In other words, if CF were to support more "pessimistic" application scheduling, and single-attach semantics for file volumes, would that meet the requirement? Or do you require more than one instance for scalability in production?

These are fairly specific questions, but a general understanding of the use cases and environment you're working with would also be helpful for us.
Nearly a year and a half later, this issue is finally resolved. The fix is available in nfs-volume-release v1.5.0.
Thanks to @jandubois for pointing out that the comment above is a little inaccurate. The fix that makes locking work is actually in mapfs-release v1.1.0.
The NFS mount does not propagate file locks between instances.
Test:

On the same instance:
run cmd: (flock 9 && echo success1; sleep 100) 9> $NFS_MOUNT_POINT/lockfile &
then run: (flock 9 && echo success2; sleep 100) 9> $NFS_MOUNT_POINT/lockfile &
You will see success1 immediately, and only after 100 seconds (when the first subshell exits and releases the lock) do you see success2.
So the lock works on the same instance.

On different instances:
On instance 1:
run cmd: (flock 9 && echo success1; sleep 100) 9> $NFS_MOUNT_POINT/lockfile &
You will see success1 immediately.
Then on instance 2:
run cmd: (flock 9 && echo success2; sleep 100) 9> $NFS_MOUNT_POINT/lockfile &
You will still see success2 immediately.
That means the flock is not propagated to instance 2.
In CF, applications can have multiple instances, and they will all mount the same volume. Without working locks, these applications won't have a proper mechanism to avoid competing writes, which can cause data corruption.
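To make the competing-write concern concrete, here is a hedged sketch of the guard an app instance would want around a shared write, reusing the fcntl(2) byte-range lock from the comment earlier in the thread; the file path and record format are invented. On the current FUSE mount, this guard only excludes writers within the same instance, which is exactly the problem:

```c
/* Sketch: serialize appends to a shared file across writers using an fcntl(2)
   byte-range lock. Path and payload are illustrative. */
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

static int append_record(const char *path, const char *line)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_APPEND, 0644);
    if (fd < 0) return -1;

    struct flock fl = {0};
    fl.l_type   = F_WRLCK;  /* exclusive lock over the whole file */
    fl.l_whence = SEEK_SET;

    if (fcntl(fd, F_SETLKW, &fl) == -1) { close(fd); return -1; }  /* wait for the lock */

    ssize_t n = write(fd, line, strlen(line));                     /* guarded write */

    fl.l_type = F_UNLCK;
    fcntl(fd, F_SETLK, &fl);                                       /* release */
    close(fd);
    return n < 0 ? -1 : 0;
}

int main(void)
{
    /* Each app instance would call this; if locks propagate to the server,
       concurrent writers never interleave their records. */
    return append_record("/var/vcap/data/some-share/shared.log",
                         "instance wrote a record\n") ? 1 : 0;
}
```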