Inability of worker to lock/claim task --- infinite loop #48

ekaram · 2016-03-02T18:48:00Z

We've been using firebase-queue for a while now. We saw some odd behavior in production last night and haven't been able to reproduce locally in development.

Our queue workers appeared to be in an infinite loop, processing the same queue task over and over again. I watched on the Firebase dashboard as the same task turned yellow (claimed) and then green again (as if recreated from scratch), repeatedly.

I was able to resolve the issue by clearing the tasks from the queue and doing multiple server restarts of the code running node with the queue workers.

I watched the problem occur on two separate queues. The worker code for those queues is different and separate, and has been stable.

If you have any ideas for what to look into or how to reproduce, please let me know.

peranderson · 2016-03-25T18:39:41Z

I just started using Firebase queue yesterday with the latest code. The posted example, which uses a setTimeout and then calls resolve(), does not work. If you call resolve() outside of the timeout, it removes the task from the queue. If you call resolve() inside of the timeout or any other async callback, it does not remove the task from the queue as it should, and your worker will pick up that task again over and over, essentially looping over the same task. This issue needs some attention or a workaround ASAP.

peranderson · 2016-03-25T18:44:27Z

This guy is running into the same issue. Nobody has responded to his question. I've searched all over for a solution. If he moves the call to resolve outside of the setTimeout the example will work. http://stackoverflow.com/questions/35750115/firebase-queue-triggering-several-times-i-dont-know-why

cbraynor · 2016-03-25T20:16:41Z

Do you have a reproducible case? If so, that would help debugging immensely - I wasn't able to reproduce the error with the linked SO example on node 4.3.1

correasebastian · 2016-04-11T15:24:45Z

Same problem here. I tried changing my Node.js version and nothing works. Any async operation inside the queue's main function does not resolve, not even your example:

`var Queue = require('firebase-queue'),
Firebase = require('firebase');

var ref = new Firebase('https://testingqueue.firebaseio.com/queue');
var queue = new Queue(ref, function(data, progress, resolve, reject) {
  // Read and process task data
  console.log(data);

  // Do some work
  progress(50);

  // Finishing the task asynchronously is not working
  setTimeout(function() {
    resolve('ok');
  }, 5000);

  // Calling resolve() synchronously works fine, but not asynchronously
  // resolve();

})`

This problem has been around for a while and we haven't received any answer from Firebase. I'm quite disappointed, because Firebase is such an amazing tool.

My environment:

Windows Server 2012 R2
Node.js versions: 4.2.1 and 5.10.1

package.json

{
  "name": "testingqueue",
  "version": "1.0.0",
  "description": "",
  "main": "index.js",
  "scripts": {
    "start": "nodemon index.js",
    "test": "echo \"Error: no test specified\" && exit 1"
  },
  "keywords": [],
  "author": "",
  "license": "ISC",
  "dependencies": {
    "firebase": "2.4.2",
    "firebase-queue": "1.3.0"
  }
}

tsemerad · 2016-04-27T20:52:44Z

@correasebastian Do you have ".indexOn": "_state" specified in your security rules? If so, try removing it. This sounds like a sort of regression of #43. My queues were behaving erratically when the state was indexed, and removing the index fixed it. Granted, my queues weren't repeatedly processing the same task, but were just hung up. Even though #43 was apparently fixed, I've left the indexes off my queues still for now.

CookieCookson · 2016-05-12T07:39:43Z

I'm having the same issue here, I am sometimes getting the same queue task processed multiple times over and then on occasions it finishes and passes along to the next queue.

ekaram · 2016-06-27T16:19:32Z

Unfortunately, I do not have timeouts set in my code, so that issue is not the root cause of what I have seen. I have only seen this issue occur one additional time in production, but when it does it has extremely severe consequences for us.

I do index on _state, and have had no hanging issues, so I have not had a reason to remove that.

jclalala · 2016-07-11T14:23:16Z

The issue happens to me as well. I have two machines running the same node version (v4.4.7) where one machine does NOT have this problem while the other can reproduce with ease.

The problem is that a single worker (I've set up my queue to run with just 1 worker) repeatedly triggers the same single task, even though the job does nothing but the following:

var jobInstance = 1;
...
... function (data, progress, resolve, reject) {
  console.log("started;" + jobInstance);
  resolve();
  console.log("ended;" + jobInstance);
  jobInstance++;
};

Output for one single job:
started;1
ended;1
started;2
ended;2

I can provide full server / code access to anyone who's interested to tackle this.

jclalala · 2016-07-12T02:12:21Z

Ok checked firebase-queue src. So the problem appears to be Firebase (I'm on v2.4.2)... For reference, I'm using firebase-queue v1.3.1.

The problem happens in function QueueWorker.prototype._tryToProcess():398, where the function attempts to open a Firebase transaction on the task. In the updateFunction, the code tries to update its _state to inProgress. The onComplete callback of the transaction is then invoked (line 463), BUT at this point the snapshot is NOT updated.

The Firebase documentation (https://www.firebase.com/docs/web/api/firebase/transaction.html) indicates that when the onComplete callback reports committed, its snapshot parameter reflects what was updated in the updateFunction. In this case, however, committed is occasionally true yet the snapshot is NOT updated.
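To make the condition described above easier to spot, the value handed to onComplete can be checked explicitly rather than trusted whenever committed is true. This is only a minimal sketch: `claimSucceeded` is a hypothetical helper, not part of firebase-queue, and the `_state` and `_owner` field names are taken from the task fields mentioned in this thread.

```javascript
// Hypothetical helper (not part of firebase-queue): decide whether the value
// reported by a transaction's onComplete callback really reflects our claim.
function claimSucceeded(committed, taskValue, expected) {
  // A claim needs a committed transaction and an object-valued task.
  if (!committed || taskValue === null || typeof taskValue !== 'object') {
    return false;
  }
  // Both the state and the owner must match what the updateFunction wrote.
  return taskValue._state === expected.state &&
         taskValue._owner === expected.owner;
}

// Example values only: a committed transaction whose snapshot lacks the owner.
var stale = { _state: 'in_progress' };
console.log(claimSucceeded(true, stale, { state: 'in_progress', owner: 'worker:1' }));
```

Inside a real onComplete(error, committed, snapshot) callback this would be called with `snapshot.val()`, and a `false` result could be logged to confirm whether the stale-snapshot case is what triggers the loop.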

I'll check to see if Firebase v3 with firebase-queue 1.4.1 will have this problem solved.

CookieCookson · 2016-07-12T07:54:57Z

@jclalala Thanks for taking the time to look into this issue, it's been bothering me for ages! I have migrated to Firebase v3 and seem to be still having the problem, can't remember if I upgraded firebase queue from 1.3.1 to 1.4.1 though.

gvkhna · 2016-07-12T21:41:28Z

+1

jclalala · 2016-07-13T15:55:35Z

I forked the project (off firebase-queue v1.4.1 and Firebase v3.0.1) and implemented a small workaround. You may want to try it out here:

https://github.com/jclalala/firebase-queue

The workaround is based on something I observed: only _state_changed seems to update (and not _owner, etc.). So when that condition happens, we delete the item from the queue anyway whenever the processor's resolve() is called.

When the above happens, my forked mod will still remove the task item but if you enabled winston logs you will see an extra 'reset' debug log.

It seems that the code base relies heavily on cross-"transaction" dependencies. Meaning, if transaction A and transaction B happen in chronological order, transaction B should reflect the updates from transaction A... This assumption is unreliable in my tests. I suggest the authors review this in more depth. For now the workaround works for me. I'm not sure whether there will be other side effects, but I'll keep this thread posted if I see any.

gvkhna · 2016-07-14T00:24:41Z

Firebase v3.2.0, released just yesterday, seems to have fixed the issue. It hasn't occurred since, but more testing is probably needed to confirm.

jclalala · 2016-07-14T01:31:30Z

Problem still happens to me on latest version of firebase (3.2.0).

I just tried on the following versions:
"firebase": "^3.2.0",
"firebase-queue": "^1.4.0",

I'll still revert to my workaround :(

maxtechera · 2016-11-09T18:11:58Z

I'm having the same issue with versions:

It's working fine if I run it locally, but on the server the worker keeps running over and over.

Has anyone had any luck with a fix?

--Edit

I checked firebase console and this is happening:

donbarthel · 2016-11-11T07:45:29Z

+1 This started happening to me today on my server. If I run the code (with node, firebase, firebase-queue) off my laptop instead, pointing to the same database, I'm not able to reproduce the problem.

I'm using:
"firebase": "^3.0.1",
"firebase-queue": "^1.4.0",

donbarthel · 2016-11-11T08:07:09Z

More on this: jclalala fixed the issue (reported above) for himself by changing this line in queue_worker.js from:

var expires = Math.max(0, startTime - now + self.taskTimeout);

to:

var expires = Math.max(10000, startTime - now + self.taskTimeout);

I have previously reported an issue (#45) which I solved by changing that same line to:

var expires = self.taskTimeout;

My change was implemented in a prior project but not this one and until today issue #45 didn't appear in this project. I just now reimplemented my change into my new project and, lo, this issue on my server has now disappeared! Not sure if mine (and jclalala's) changes actually fix the issue or if they just shuffle the code around sufficiently to avoid the issue.

Hope this info helps!

Tyris · 2016-11-18T07:49:29Z

Same issue as @crazymunky and @donbarthel.

Works fine on my dev machine (mac) but breaks on my Windows Server 2012 machine. That 10000 workaround seems to solve it.

We're using this for our mailer... so when it breaks we suddenly start spamming emails which isn't great.

Edit: It would appear the issues I was having were caused by our server time being about 2 minutes ahead of actual time (e.g. Firebase server time). If the server time doesn't match Firebase's (the closer the better), then this can reasonably happen.
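The clock-offset explanation can be checked against the three `expires` expressions quoted earlier in the thread. The sketch below assumes that `startTime` comes from a server-written timestamp on the task and `now` from the worker's local clock; the 60-second timeout and the 2-minute offset are example numbers, and the function names are made up for illustration.

```javascript
// The three expressions from the thread, as pure functions of the same inputs.
// All values are in milliseconds.
function expiresOriginal(startTime, now, taskTimeout) {
  return Math.max(0, startTime - now + taskTimeout);
}

function expiresWithFloor(startTime, now, taskTimeout) {
  return Math.max(10000, startTime - now + taskTimeout);
}

function expiresFixed(startTime, now, taskTimeout) {
  return taskTimeout;
}

// Assumed scenario: 60 s task timeout, local clock 2 minutes ahead of the
// clock that stamped the task.
var taskTimeout = 60000;
var startTime = 1000000;
var now = startTime + 120000;

console.log(expiresOriginal(startTime, now, taskTimeout));  // 0
console.log(expiresWithFloor(startTime, now, taskTimeout)); // 10000
console.log(expiresFixed(startTime, now, taskTimeout));     // 60000
```

Under these assumptions the original expression yields 0, so a task would be treated as timed out the moment it is claimed, which would match the claim-and-reset loop described in this issue. The two modified expressions only mask the offset; keeping the server clock synchronized addresses the cause.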

cbraynor · 2016-12-21T23:36:49Z

Release 1.6.1 contains a bugfix that could be related to this issue. Can you let me know if you're still having issues after updating?

calebhailey · 2017-01-09T23:53:33Z

Not sure if this will help anyone else here, but I was seeing this exact behavior and determined the cause of the issue to be that my test code was taking longer (via a 10-second setTimeout()) than the timeout I defined for my spec, so the task was just falling into a retry loop.
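A mismatch like this can be caught with a small check before a worker goes live. This is a minimal sketch, assuming the task spec carries a `timeout` value in milliseconds; `exceedsSpecTimeout` is a hypothetical helper and not part of firebase-queue.

```javascript
// Hypothetical pre-flight check: will a job of the expected duration outlive
// the timeout defined on its spec? Specs without a numeric timeout never match.
function exceedsSpecTimeout(spec, expectedDurationMs) {
  return typeof spec.timeout === 'number' &&
         expectedDurationMs >= spec.timeout;
}

// Example values only: a 10-second job against a 5-second spec timeout.
var spec = { timeout: 5000 };
if (exceedsSpecTimeout(spec, 10000)) {
  console.log('Job may be reset and retried before it can call resolve()');
}
```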

peepo3663 · 2017-01-27T06:40:30Z

I still have the issue: tasks are not resolved and remain in my tasks list.
"firebase": "^3.6.3"
"firebase-queue": "^1.6.1"

snowadamc · 2017-04-17T22:13:42Z

Hey guys, I came across this conversation as I've recently been having this issue. I have been working with firebase queue for a while and was working on some big changes to our backend queue on a test database. When trying to roll this out for our production database today, I saw behavior much like this. After some experimenting, I found that it had to do with my database rules.

In one of my queue functions, I was attempting to read a portion of the database whose permissions I had forgotten to change to allow my backend worker access. Instead of giving an error (or counting against the number of retries until the queue just moved on), it kept resetting the owner, as others seem to have identified. Though I was eventually able to fix it, it seems like a database read failure shouldn't reset the queue, but should cause the task to fail and move on.
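Until the library handles this case itself, a failed read can be routed to the task's reject callback by hand. This is a minimal sketch: `readOrReject` is a hypothetical helper, and it relies on the failure callback that `once()` accepts as its third argument. The stub `ref` exists only so the example runs without a database.

```javascript
// Hypothetical helper: read once from `ref` and pass a failed read
// (for example, permission denied) to the task's reject callback.
function readOrReject(ref, reject, onValue) {
  ref.once('value', function (snapshot) {
    onValue(snapshot.val());
  }, function (error) {
    reject(error);
  });
}

// Stub standing in for a database reference whose rules deny the read.
var deniedRef = {
  once: function (eventType, onSuccess, onFailure) {
    onFailure(new Error('permission_denied'));
  }
};

readOrReject(deniedRef, function (error) {
  console.log('task rejected: ' + error.message);
}, function (value) {
  console.log('read ok');
});
```

Inside a processing function, `reject` would be the fourth argument that firebase-queue passes in, so the task would go to its error state rather than being claimed again.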

WikipediaBrown · 2017-04-17T22:40:50Z

Are you using Firebase-queue in an AppEngine instance or a managed VM/container situation? -Perris


ksinghal mentioned this issue May 26, 2016

Tasks Not Disappearing #67

Open

donbarthel mentioned this issue Nov 11, 2016

Task in queue expires immediately #45

Open

cbraynor mentioned this issue Dec 20, 2016

Fixing a bug where all workers would reset a task on timeout #96

Merged
