Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Possible bug in locking mechanism #15

Open
mholt opened this issue Oct 29, 2020 · 8 comments
Open

Possible bug in locking mechanism #15

mholt opened this issue Oct 29, 2020 · 8 comments

Comments

@mholt
Copy link
Contributor

mholt commented Oct 29, 2020

Hello! We've received a few reports of certificates not renewing and they have similar symptoms:

  • DynamoDB is the storage module
  • Logs show that Caddy is trying to renew the certificate, but is waiting for a lock
  • The "LOCK-..." key for that certificate is present, but never deleted or considered stale
  • Even after deleting the LOCK- file, Caddy was still waiting for the lock (the Lock() method never returned)
  • Restarting Caddy after deleting the lock file manually resolved the issue

I am not familiar with DynamoDB but I felt it was important to raise the issue, as this is where my investigation has led me so far.

Are you able to reproduce this behavior at all?

/cc @liran-co and @JackEllis

@fillup
Copy link
Contributor

fillup commented Oct 29, 2020

Thanks for the report. I'll check into that tomorrow. At a minimum Dynamo should be deleting the lock keys as they expire, but perhaps either my example of how to configure the column for that is incorrect or the people reporting the problem left that part out. Anyway, I'll do some testing tomorrow and post an update when I have it.

@fillup
Copy link
Contributor

fillup commented Oct 30, 2020

@mholt I see that my dynamodb examples do not include information about using the TTL feature nor am I setting any kind of TTL value for lock records. I can add that as an optional feature, but I also added some tests to verify that it can create locks when an existing one already exists but is expired and that works fine. So I'm not sure how or why some folks are having issues with that.

The LOCK-... records have a value with the time they should be considered expired in RFC3339 format. Can we ask one of the people who is having the problem what the value of the record is? To make sure something crazy didn't happen with how far out the expiration is.

@JackEllis
Copy link

@fillup The only other idea I have is that TTL isn't guaranteed to prune. So if I set a TTL of say "in 5 mins", it could still be there for up to 24-48 hours (if i recall correctly).

I don't know Go so I can't fully follow the code but, when fetching a lock, do you also check the TTL value is not expired?

@mholt
Copy link
Contributor Author

mholt commented Oct 30, 2020

The default storage implementation (file system) uses "active locking" where a timestamp within the file is updated every couple of seconds, and so even if the lock file exists after about 5 seconds, it is safe to consider it stale and then clean it up after-the-fact. Is this something that would benefit DynamoDB too? (Keep in mind I have absolutely 0 experience with DDB).

@cmawhorter
Copy link

just added a pr that would hopefully fix this. i removed ttl entirely and switched to some ddb ways of doing things. i also added the active locking file storage uses.

unlike @Molt, i have a bunch of ddb experience but 0 experience with caddy and a little fuzzy on the inner workings of acme.

@JackEllis
Copy link

@cmawhorter Fascinating. Just want to say thank you for all the work you've done here. There are so many of us that are so grateful for this, Cory!

P.S. I'm in the same place. I'm comfortable with DDB but have no clue about the inner workings of Caddy. Luckily @mholt knows enough for all of us ;)

@mholt
Copy link
Contributor Author

mholt commented Nov 19, 2020

Fortunately, you don't need to know anything about Caddy or ACME to implement this properly; just adhere to the godoc description of the interface methods and as long as it works properly you're good to go.

@cmawhorter
Copy link

using the existing file storage in caddy as a reference was hugely helpful. most of the work was copy/paste from that.

yay, i was able to get things running locally against pebble (with remote ddb) and it seems to be working as expected. @fillup would probably know better though.

xcaddy build --with github.com/stampr/certmagic-storage-dynamodb/v2@stampr

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging a pull request may close this issue.

4 participants