Observed a significant delay when we are creating a file on S3 mount point and then checking its existence #1187

Open
hemant-gairola opened this issue Dec 6, 2024 · 5 comments
Labels
bug Something isn't working

Comments

@hemant-gairola

Mountpoint for Amazon S3 version

mount-s3 1.8.0

AWS Region

us-west-2

Describe the running environment

Running EC2 on Amazon Linux, using IAM roles for access to an S3 bucket. We are using the S3 bucket as a mount point with the help of the CSI driver 1.8.0.
Kubernetes version (use kubectl version):

  • Client Version: version.Info{Major:"1", Minor:"26", GitVersion:"v1.26.15",
  • Server Version: version.Info{Major:"1", Minor:"26", GitVersion:"v1.26.15",
  • Driver version: 1.8.1

Mountpoint options

Mount Options:
accessModes:
  - ReadWriteMany # supported options: ReadWriteMany / ReadOnlyMany
mountOptions:
  - uid=65436
  - gid=50
  - allow-other
  - allow-delete

What happened?

This could be related to #1038
We are using AWS S3 storage through the CSI driver. I created a small script that checks whether a file exists and, if it does not, creates the file and writes data into it.
The same script works fine on a normal file system and on NFS storage, but with AWS S3 we observe a delay between a file being created and its existence becoming visible.
Please let me know whether this delay is expected, or whether we can set some parameters to mitigate the delay between creating a file and checking its existence.

The issue is very easy to reproduce.

import os
import time
import threading


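def test_file_creation_thread_ops(test_file, count):
    # Check-then-create: the existence check and the open are two separate steps
    if not os.path.exists(test_file):
        # The with block closes the file on exit, so no explicit close() is needed
        with open(test_file, 'wb') as f:
            f.write(data)
    else:
        print(f"File {test_file} already exist for {count}!")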

data = b'\x21'
# Path /etc/hyperscale/hs-staging-area is using AWS S3 storage
test_file="/etc/hyperscale/hs-staging-area/file.txt"
if os.path.exists(test_file):
    os.remove(test_file)
count=0
while count < 2:
    t1=threading.Thread(target=test_file_creation_thread_ops, args=(test_file,count))
    t1.start()
    count = count + 1

Relevant log output

Thanks
Thanks
Exception in thread Thread-2 (test_file_creation_thread_ops):
Traceback (most recent call last):
  File "/usr/lib/python3.12/threading.py", line 1073, in _bootstrap_inner
    self.run()
  File "/usr/lib/python3.12/threading.py", line 1010, in run
    self._target(*self._args, **self._kwargs)
  File "/app/create_file_4.py", line 9, in test_file_creation_thread_ops
    with open(test_file, 'wb') as f:
         ^^^^^^^^^^^^^^^^^^^^^
PermissionError: [Errno 1] Operation not permitted: '/etc/hyperscale/hs-staging-area/file.txt'
@hemant-gairola hemant-gairola added the bug Something isn't working label Dec 6, 2024
@hemant-gairola
Author

Can someone look into it?

@vladem
Contributor

vladem commented Dec 12, 2024

Hi, thanks for opening the issue. Let me confirm the core of the problem: are you expecting the second thread to detect the existence of the file at its start, with os.path.exists(test_file), whereas it is actually detected only later, upon opening the file with open(test_file, 'wb')?

The output that you provide for the script is expected. The existence check with os.path.exists in the first thread triggers a HeadObject request to S3. The time it takes to finish this request is measured in milliseconds and will likely be more than the time that it takes to spawn a thread, so the os.path.exists check in the second thread will trigger a new request to S3. Both requests will return 404 and the file will be considered non-existent.
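The timing argument above can be illustrated with a small simulation. This is only a sketch under stated assumptions: `SlowStore`, `head`, `put`, and `run_race` are hypothetical names invented for illustration, the store is an in-memory set rather than S3, and the 50 ms latency is an arbitrary placeholder. It models a lookup whose answer is decided when the request arrives but is only seen by the caller after a round trip.

```python
import threading
import time


class SlowStore:
    """Hypothetical in-memory stand-in for a bucket with slow metadata lookups."""

    def __init__(self, latency_s=0.05):
        self._keys = set()
        self._lock = threading.Lock()
        self._latency_s = latency_s

    def head(self, key):
        # The answer is fixed when the request arrives; the caller only
        # receives it after the simulated round trip has elapsed.
        with self._lock:
            found = key in self._keys
        time.sleep(self._latency_s)
        return found

    def put(self, key):
        with self._lock:
            self._keys.add(key)


def run_race(store, key, n_threads=2):
    """Start n_threads check-then-create workers; return what each check saw."""
    seen = [None] * n_threads

    def worker(i):
        seen[i] = store.head(key)
        if not seen[i]:
            store.put(key)

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return seen
```

As long as spawning the second thread takes less time than the simulated lookup, calling `run_race(SlowStore(), "file.txt")` should report that both checks saw the key as missing, which mirrors the two 404 responses described above.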

Is there a reason not to catch the PermissionError on open in your code and treat it as a signal of file existence? This behaviour is also described in TROUBLESHOOTING.md.
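A minimal sketch of that suggestion, assuming the behaviour shown in the log above (opening an already existing file for writing on the mount raises PermissionError). The function name `write_new_file` is hypothetical. Note that on a regular local file system the same open call would overwrite the file instead of raising, and that PermissionError can also indicate a genuine permissions problem, so this check is only a heuristic.

```python
def write_new_file(path, payload):
    """Try to write payload to path; return False if the open is refused.

    Assumption: on the S3 mount, a refused open (PermissionError) is treated
    as a sign that another writer already created the file.
    """
    try:
        with open(path, "wb") as f:
            f.write(payload)
    except PermissionError:
        # Treated here as "file already exists"; it may also mean a real
        # permissions problem, which this sketch does not distinguish.
        return False
    return True
```

A caller could then branch on the return value instead of calling os.path.exists first.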

@hemant-gairola
Author

@vladem Thanks for confirming it. Yes, we can catch the PermissionError. I was checking whether there is any config parameter that would make the behaviour consistent with NFS.

@dannycjones
Contributor

dannycjones commented Dec 16, 2024

There's no Mountpoint config parameter that removes this issue entirely when multiple writers try to open the same file. It is simply caused by the higher latency of metadata operations when talking to S3, compared to other file system offerings.

You could consider enabling metadata caching in Mountpoint, which could reduce latencies for metadata lookups. However, you should be aware of potential downsides, which depend on your use case. You can learn more about metadata caching here: https://github.com/awslabs/mountpoint-s3/blob/main/doc/CONFIGURATION.md#caching-configuration

@dannycjones
Contributor

dannycjones commented Dec 16, 2024

Further to Vlad's suggestion on how to avoid the issue: in Python you can use the 'x' open mode instead of 'w' to create the file exclusively, which raises an error if the file could not be created because it already exists. This would avoid the first round trip to S3 to check for file existence. https://docs.python.org/3/library/functions.html#open

For example, we might redefine the function as follows:

def test_file_creation_thread_ops(test_file, count):
    try:
        with open(test_file, 'xb') as f:
            f.write(data)
    except FileExistsError:
        print(f"File {test_file} already exist for {count}!")

We'd recommend this approach if you want to leave the current block whenever the file cannot be created.
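For completeness, here is a self-contained sketch of the whole reproduction rewritten around exclusive creation. The helper names `create_exclusively` and `race_to_create` are hypothetical, and the sketch has only been reasoned about against a local file system, where exclusive creation lets exactly one writer succeed. Whether a losing writer on a Mountpoint mount always sees FileExistsError, rather than the PermissionError from the log above, is an assumption not confirmed in this thread.

```python
import threading


def create_exclusively(path, payload):
    """Create path and write payload; return False if it already exists."""
    try:
        # 'x' fails instead of truncating when the file is already there,
        # so no separate existence check is needed beforehand.
        with open(path, "xb") as f:
            f.write(payload)
    except FileExistsError:
        return False
    return True


def race_to_create(path, payload, n_threads=2):
    """Let n_threads writers compete for path; return each writer's outcome."""
    outcomes = [None] * n_threads

    def worker(i):
        outcomes[i] = create_exclusively(path, payload)

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return outcomes
```

Calling `race_to_create(test_file, data)` with the path and payload from the original script returns one boolean per thread, so the caller can see which writer created the file.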
