Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

We've seen some issues in production with locks not expiring #6

Open
barmstrong opened this issue Jan 30, 2014 · 10 comments
Open

We've seen some issues in production with locks not expiring #6

barmstrong opened this issue Jan 30, 2014 · 10 comments

Comments

@barmstrong
Copy link

Trying this as an alternative: https://github.com/mlanett/redis-lock

I haven't spent the time to look at the cause of how it happened, but wanted to comment on it.

@PatrickTulskie
Copy link
Owner

I'm not really clear on what you mean by "not expiring." The only way for a lock to not expire is if Redis isn't handling the expiration of keys correctly (the gem uses setnx). Please keep me posted on what you find. This is the first I'm hearing of such a thing.

@barmstrong
Copy link
Author

Sorry, I know it was a bad bug report - impossible to reproduce and no details. I'm not sure what happened other than some locks seemed to never expire. It worked for quite a long time correctly, so I think it must have been some rare event which triggered it.

Is there any situation (machine failure, kill -9 redis etc) where an owner of a lock disappears and it is left hanging around forever? The other gem I linked suggests "This uses setnx, but not the setnx algorithm described in the redis cookbook, which is not robust. It correctly handles timeouts and vanishing lock owners (such as machine failures)"

@PatrickTulskie
Copy link
Owner

No worries... a bug is a bug and I'm open to helping where I can.

So, I thought about that, but even still, Redis should be expiring the key since I'm also using setnx. I did build this around the Redis locking algorithm they suggested but I had to do a few things to handle failures and whatnot.

https://github.com/PatrickTulskie/redis-lock/blob/master/lib/redis/lock.rb#L38

It's been a long time since I wrote this, but the only thing I can think of that would be happening is Redis is not expiring the key. If you wanted to test this theory, you could call setnx on your Redis server and see if they key goes away after the time you set. Do you have a cluster of some sort? Maybe the key expiration isn't replicating?

I had a quick look at the other gem and it doesn't appear to be using setnx at all, which I find weird. It almost looks like it's storing a lock and timestamp in two different keys and doing a multi.get to read them both at the same time. I guess that would work.

I'll leave this open for now to see if we get any more attention on the issue.

@mkdynamic
Copy link

@PatrickTulskie You said Redis handles timeouts since you're using setnx, but setnx doesn't set any timeout on the key (http://redis.io/commands/setnx). I'm probably being dumb, what am I missing?

@PatrickTulskie
Copy link
Owner

@mkdynamic @barmstrong I just got 2 seconds to look at this again. I'm sorry. I wrote this a few years ago, so it's not entirely fresh in my head so what I said earlier in the issue might not make a whole lot of sense. Also, I primarily used it under REE 1.8.7 and Rails 2.3. At the time, Redis did not support EXPIRE or SETEX.

Basically the way it works with setnx is it tries to acquire the lock unless there is already a standing lock. If it can't acquire the lock with setnx, it gets the lock and checks the timeout value. If the timeout value is past due, it does a getset to acquire the lock and then checks to make sure that the value it got back was what it set. If so, lock acquired... if not then the lock might be extended by a few seconds for the previous holder.

What I'm thinking is happening here is:

  1. With Ruby 2.1 and Rails 3+ there is a bunch of stuff affecting the Time class so maybe something is happening with time zones and it's skewing the Time.now value.
  2. If you have extremely short timeouts it's possible that in the getset portion, it's pushing the lock out by a few seconds but ultimately not acquiring the lock. This could be the case if there is a lot of contention for the lock as well.

It's probably worth revisiting this and bringing it up to spec so it just uses setnx and then expire but if I recall correctly there was some caveat with that involving the response that setnx gets on expiring keys.

I've used this gem in environments with quite a bit of contention for locks but generally I leave the lock timeout at 60 seconds. I'd really love to fix this but all I have are my theories at this point. If you guys could give me a bit more data to go on or something that would be awesome. Or if you want to mess around with what I described above that would be cool too. I'm by no means married to this approach :)

@mkdynamic
Copy link

@PatrickTulskie Thanks so much for the detailed reply. I actually haven't experienced any issues with locks not expiring I just found this thread and was curious of the Redis expiration you mentioned since I couldn't see anywhere in the code that was using setex/expire. Your explanation cleared that up.

@ivoanjo
Copy link

ivoanjo commented Oct 1, 2015

Just to add, I've been able to reproduce this with the following example:

require 'redis'
require 'redis-lock'

$lock_key = 'test_lock_key'

redis = Redis.new
redis.del("lock:#{$lock_key}")
redis.set('counter', 0)

threads = []

50.times do |i|
  threads << Thread.new do
    begin
      redis = Redis.new

      sleep 1

      10000.times do |j|
        redis.lock_for_update($lock_key) do
          redis.set('counter', redis.get('counter').to_i + 1)
        end
        puts "Inc #{i} #{j}"
      end

      puts "#{i} done"
    rescue => e
      puts "Error in thread #{i}: #{e.inspect}"
    end
  end
end

threads.each do |t| t.join end
puts "Final result is #{redis.get('counter')}"

After a few seconds it hangs (waiting for a lock that should always be released). A get in redis-cli shows the lock still set, but obviously none of the threads should be holding it.

@PatrickTulskie
Copy link
Owner

@ivoanjo Are you getting any errors in there that would prevent the lock from being released? lock_for_update has an ensure block that releases the lock no matter what. That being said, if the thread dies then there's no way for it to ever release the lock.

There is support for locking/unlocking if the block technique doesn't work for you. It may make sense in your use case to unlock the key in your rescue block.

@ivoanjo
Copy link

ivoanjo commented Oct 2, 2015

I don't see any errors getting printed when the threads first get stuck because of a lock that wasn't unlocked. After a while with no thread being able to acquire the lock, I do see the RedisLockException, but those are expected.

I can also reproduce this without using the block-version:

        redis.lock($lock_key)
        begin
          redis.set('counter', redis.get('counter').to_i + 1)
        ensure
          redis.unlock($lock_key)
        end

And I still get the same behavior. Nevertheless, at least for our usecase @Talkdesk I saw these issues as I was preparing a migration to a gem that implements redlock, so I posted this more out of information-sharing than as a help request :)

@PatrickTulskie
Copy link
Owner

@ivoanjo Thanks for the heads up. That's actually really helpful.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants