
Lost mutex for key ... #9

Closed
kirstyannepollock opened this issue Feb 18, 2020 · 8 comments

Comments

kirstyannepollock commented Feb 18, 2020

I am getting 'Lost mutex for key xxx' when using 2 separate Mutexes on 2 separate services as part of my moleculer-based app. Just wondering what sort of thing I should be looking for that might cause this? I can't think right now how to construct a minimal example that replicates this behaviour, sorry. Just a few clues would be cool, especially since the errors come up as UnhandledPromiseRejectionWarning directly from RedisMutex._refresh with no stack trace.

kirstyannepollock (Author) commented

Aha: while debugging in VS Code, any breakpoint being hit, or any pausing of the Node service code, causes it.

swarthy (Owner) commented Feb 19, 2020

Hi! "Lost mutex/semaphore ..." is thrown in service1 when service2 "steals" the lock (from service1's point of view). This can happen when service1 fails to refresh the lock. Normally the refresh runs every refreshInterval ms, but if service1 is under huge CPU load, paused by a breakpoint in debug mode, or blocked by anything else that stalls the Node.js event loop, then after lockTimeout ms Redis deletes the key and service2 can legally acquire the lock. After that, when service1 "unfreezes" and tries to refresh the lock, it gets this error, because the lock no longer belongs to service1. So I suggest you find out what part of your service1 blocks the event loop.
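
A minimal sketch of that failure mode, assuming Redis is running locally and ioredis is used as the client; the key name 'example' and the timings are illustrative, not from this thread:

const Redis = require('ioredis')
const { Mutex } = require('redis-semaphore')

const redisClient = new Redis()

async function demo () {
  const mutex = new Mutex(redisClient, 'example', {
    lockTimeout: 1000,   // Redis deletes the key 1s after the last refresh
    refreshInterval: 300 // normally the lock is refreshed well before that
  })
  await mutex.acquire()

  // Simulate a blocked event loop (heavy CPU load, a debugger pause, ...):
  // the refresh timer cannot fire, so Redis expires the key after lockTimeout
  // and another process is free to acquire the lock.
  const start = Date.now()
  while (Date.now() - start < 2000) { /* busy wait */ }

  // Once the event loop unblocks, the pending refresh finds the key gone (or
  // held by someone else) and rejects with "Lost mutex for key example",
  // surfacing as an UnhandledPromiseRejectionWarning.
}

demo()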

kirstyannepollock (Author) commented Feb 19, 2020

Thanks. That explains it, and I have put in a mitigation. I am writing some test code now, and I am trying to work out how to set it up so that it catches the 'Lost mutex for key xxx' error, but it seems to always fall outside any error handling. The test I have so far:

it('acquire twice same node base lib expiry test', async () => {
  try {
    // Assumed setup (not shown in this snippet): redisClient is an ioredis
    // instance and Mutex comes from redis-semaphore.
    const options = {
      lockTimeout: 1000,
      acquireTimeout: 1000,
      retryInterval: 10,
      refreshInterval: 1100 // longer than lockTimeout, so the lock actually expires and never gets refreshed
    }
    const mutexObject = {
      mutexes: {
        channelSync: new Mutex(redisClient, 'channelSync', options)
      }
    }

    const sleep = (ms) => new Promise(resolve => setTimeout(resolve, ms))

    await mutexObject.mutexes.channelSync.acquire()
    await sleep(2000) // sleep past lockTimeout so the lock is lost
  } catch (err) {
    // ***** we never get here *****
    expect(err).toBeTruthy()
    expect(err.message).toBe('Lost mutex for key channelSync')

    // acquire = await mutexObject.mutexes.channelSync.acquire()
  }
})

How do I need to set it up so that I can detect and handle this error?

swarthy (Owner) commented Feb 20, 2020

Unfortunately, you can't catch this error that way, because it is designed to be fatal (it is thrown from a setTimeout callback). When the key is lost, it usually indicates an error in configuration (like your example with refreshInterval > lockTimeout) or a problem in code architecture (the event loop blocked by an infinite while (true) { ... }, for example). In both cases you want to fix the configuration/code, not handle the error. In a distributed system it is better to get a fatal error here in service1 (set up process.on('unhandledRejection', cb) or run Node.js with --unhandled-rejections=strict) and a notification from your monitoring system, than to handle the unexpected behaviour, because otherwise both service1 and service2 continue to work with the critical resource simultaneously.
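
A sketch of that fail-fast setup; the logging and the exit code are illustrative choices, not library behaviour:

process.on('unhandledRejection', (err) => {
  // Crash deliberately so the orchestrator (e.g. Kubernetes) restarts the
  // service and the monitoring system raises an alert.
  console.error('Fatal: unhandled promise rejection', err)
  process.exit(1)
})

Equivalently, starting the process with node --unhandled-rejections=strict app.js (Node.js 12+) makes any unhandled rejection terminate the process without a custom handler.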

kirstyannepollock (Author) commented

In my Node.js service code I have set up the process.on('unhandledRejection', ...) handler and written mitigations in the services, so all is fine there (I think). I can/should probably improve that to raise an event over the service bus to inform the other services. Our services really should NOT crash our backend in the use case I have, and I think I have that covered.

But I do want to write Jest tests to prove that if (e.g.) one k8s node crashes, the mutex is released and the relevant service on the other nodes can then get the mutex. I suppose I can put a process.on('unhandledRejection', ...) handler in the Jest code.

Do I understand you/the code right: with the correct parameters, the refresh means the lock should be held by the service for as long as the service keeps running, or until it is released (or somehow pre-empted, as can happen when I pause the code while debugging)?

swarthy (Owner) commented Feb 20, 2020

Yep, exactly! The idea (borrowed from a book about Redis) is to try to acquire the lock (or throw an error after acquireTimeout if the lock is still held by another service) and then refresh it every refreshInterval ms until .release() is called. If service1 dies, Redis deletes the key (lockTimeout = the key's TTL in Redis) and service2 can acquire the mutex.
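
A sketch of that lifecycle under a sane configuration (refreshInterval well below lockTimeout); the helper name withChannelSync and the specific numbers are illustrative, not from this thread:

const Redis = require('ioredis')
const { Mutex } = require('redis-semaphore')

async function withChannelSync (redisClient, work) {
  const mutex = new Mutex(redisClient, 'channelSync', {
    lockTimeout: 10000,    // Redis TTL: the key expires 10s after the last refresh
    acquireTimeout: 10000, // acquire() throws if the lock is still held after 10s
    retryInterval: 10,     // poll every 10ms while waiting to acquire
    refreshInterval: 3000  // extend the TTL every 3s while the lock is held
  })
  await mutex.acquire()
  try {
    await work() // the lock keeps being refreshed in the background
  } finally {
    await mutex.release() // stop refreshing and delete the key in Redis
  }
}

// usage: await withChannelSync(new Redis(), async () => { /* critical section */ })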

kirstyannepollock (Author) commented Feb 20, 2020

Yeah, great book; I only know about it because it was linked from your readme, thanks :-). Sadly, it seems I will not be able to write a Jest test due to a "bug" in Jest where process is not the real one: jestjs/jest#5146 (comment). Thanks for your help!

swarthy (Owner) commented Feb 20, 2020

You're welcome!
