Inability of worker to lock/claim task --- infinite loop #48

Open
ekaram opened this issue Mar 2, 2016 · 23 comments

Comments

@ekaram

ekaram commented Mar 2, 2016

We've been using firebase-queue for a while now. We saw some odd behavior in production last night and haven't been able to reproduce it locally in development.

Our queue workers appeared to be stuck in an infinite loop, processing the same queue task over and over. I watched on the Firebase dashboard as the same task turned yellow (claimed) and then green again (as if recreated from scratch), repeatedly.

I was able to resolve the issue by clearing the tasks from the queue and restarting the Node process that runs the queue workers several times.

I saw the problem occur on two separate queues. The worker code for those queues is different and separate, and has been stable.

If you have any ideas for what to look into or how to reproduce this, please let me know.

@peranderson

I just started using firebase-queue yesterday with the latest code. The posted example, which calls resolve() inside a setTimeout, does not work. If you call resolve() outside of the timeout, it removes the task from the queue. If you call resolve() inside the timeout or any other async callback, it does not remove the task from the queue as it should, and your worker picks up that task again and again, essentially looping over the same task. This issue needs some attention or a workaround ASAP.

@peranderson

This guy is running into the same issue. Nobody has responded to his question. I've searched all over for a solution. If he moves the call to resolve outside of the setTimeout the example will work. http://stackoverflow.com/questions/35750115/firebase-queue-triggering-several-times-i-dont-know-why

@cbraynor
Contributor

Do you have a reproducible case? If so, that would help debugging immensely. I wasn't able to reproduce the error with the linked SO example on Node 4.3.1.

@correasebastian

Same problem here. I tried changing my Node.js version and nothing works: any async operation inside the queue's main function fails to resolve, not even your example.

```js
var Queue = require('firebase-queue'),
    Firebase = require('firebase');

var ref = new Firebase('https://testingqueue.firebaseio.com/queue');
var queue = new Queue(ref, function(data, progress, resolve, reject) {
  // Read and process task data
  console.log(data);

  // Do some work
  progress(50);

  // Finish the task asynchronously is not working
  setTimeout(function() {
    resolve('ok');
  }, 5000);

  // using sync operations work fine, but no async
  // resolve();
});
```

This problem has been around for a while and we haven't received any answer from Firebase. I'm quite disappointed, because Firebase is such an amazing tool.

My environment:

Windows Server 2012 R2
Node.js versions: 4.2.1 and 5.10.1

package.json

{
  "name": "testingqueue",
  "version": "1.0.0",
  "description": "",
  "main": "index.js",
  "scripts": {
    "start": "nodemon index.js",
    "test": "echo \"Error: no test specified\" && exit 1"
  },
  "keywords": [],
  "author": "",
  "license": "ISC",
  "dependencies": {
    "firebase": "2.4.2",
    "firebase-queue": "1.3.0"
  }
}

@tsemerad

@correasebastian Do you have ".indexOn": "_state" specified in your security rules? If so, try removing it. This sounds like a sort of regression of #43. My queues were behaving erratically when the state was indexed, and removing the index fixed it. Granted, my queues weren't repeatedly processing the same task, but were just hung up. Even though #43 was apparently fixed, I've left the indexes off my queues still for now.

@CookieCookson

I'm having the same issue here. The same queue task is sometimes processed multiple times over, and then on occasion it finishes and passes along to the next queue.

@ekaram
Author

ekaram commented Jun 27, 2016

Unfortunately, I do not have timeouts set in my code, so that issue is not the root cause of what I have seen. I have only seen this issue occur one additional time in production, but when it does it has extremely severe consequences for us.

I do index on _state, and have had no hanging issues, so I have not had a reason to remove that.

@jclalala

The issue happens to me as well. I have two machines running the same node version (v4.4.7) where one machine does NOT have this problem while the other can reproduce with ease.

The problem is that a single worker (I've set up my queue to run with just 1 worker) repeatedly triggers the same single task, while the job does nothing but the following:

var jobInstance = 1;
...
... function (data, progress, resolve, reject) {
    console.log("started;" + jobInstance);
    resolve();
    console.log("ended;" + jobInstance);
    jobInstance++;
};

Output for one single job:
started;1
ended;1
started;2
ended;2

I can provide full server / code access to anyone who's interested to tackle this.

@jclalala

OK, I checked the firebase-queue source. The problem appears to be in Firebase (I'm on v2.4.2). For reference, I'm using firebase-queue v1.3.1.

The problem happens in function QueueWorker.prototype._tryToProcess():398, where the function attempts to open a Firebase transaction on the task. In the updateFunction, the code tries to update the task's _state to inProgress. The transaction's onComplete callback is then invoked (line 463), BUT at this point the snapshot is NOT updated.

The Firebase documentation (https://www.firebase.com/docs/web/api/firebase/transaction.html) indicates that when the onComplete callback runs with committed set to true, its snapshot parameter reflects what was updated in the updateFunction. In this case, however, committed is occasionally true yet the snapshot is NOT updated.
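A minimal sketch of the kind of defensive check this observation suggests: rather than trusting `committed` alone, compare the snapshot value against what the update function was supposed to write. The helper name `claimTookEffect` is hypothetical and this is not code from firebase-queue; the `_state` and `_owner` field names are taken from this thread, and the `'in_progress'` state value and worker id are assumptions chosen for illustration.

```js
// Hypothetical helper, not part of firebase-queue or the Firebase SDK.
// It is meant to be called from a transaction's onComplete(error, committed, snapshot)
// callback, with snapshotValue = snapshot.val().
function claimTookEffect(committed, snapshotValue, expectedState, expectedOwner) {
  if (!committed || snapshotValue === null || typeof snapshotValue !== 'object') {
    return false;
  }
  // Only treat the task as claimed if the snapshot shows both fields as written.
  return snapshotValue._state === expectedState &&
         snapshotValue._owner === expectedOwner;
}

// The case described above: committed is true, but the snapshot lacks the new fields.
var staleSnapshot = { _state_changed: 1456900000000 };
console.log(claimTookEffect(true, staleSnapshot, 'in_progress', 'worker-1')); // false

// A snapshot that reflects the claim.
var freshSnapshot = { _state: 'in_progress', _owner: 'worker-1' };
console.log(claimTookEffect(true, freshSnapshot, 'in_progress', 'worker-1')); // true
```

What a worker should do when the check fails (retry, skip, or log) is a separate design decision that this sketch leaves open.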

I'll check to see if Firebase v3 with firebase-queue 1.4.1 will have this problem solved.

@CookieCookson

@jclalala Thanks for taking the time to look into this issue, it's been bothering me for ages! I have migrated to Firebase v3 and still seem to have the problem, though I can't remember whether I upgraded firebase-queue from 1.3.1 to 1.4.1.

@gvkhna

gvkhna commented Jul 12, 2016

+1

@jclalala

I forked the project (off firebase-queue v1.4.1 and Firebase v3.0.1) and implemented a small workaround. You may want to try it out here:

https://github.com/jclalala/firebase-queue

The workaround is based on something I observed: only _state_changed seems to update (and not _owner, etc.). So when that condition occurs, we delete the item from the queue anyway whenever the processor's resolve() is called.

When the above happens, my forked mod will still remove the task item, but if you have enabled winston logs you will see an extra 'reset' debug log.

It seems that the code base relies heavily on cross-"transaction" dependencies. Meaning, if transaction A and transaction B happen chronologically, transaction B should reflect the updates made in transaction A. This assumption is unreliable in my tests. I suggest the authors review this in more depth. For now the workaround works for me; I'm not sure whether there are other side effects yet, but I'll keep this thread posted if I see any.

@gvkhna

gvkhna commented Jul 14, 2016

Firebase v3.2.0, released just yesterday, seems to have fixed the issue. It hasn't occurred since, but more testing is probably needed to confirm.

@jclalala

Problem still happens to me on latest version of firebase (3.2.0).

I just tried on the following versions:
"firebase": "^3.2.0",
"firebase-queue": "^1.4.0",

I'll still revert to my workaround :(

@maxtechera

maxtechera commented Nov 9, 2016

I'm having the same issue with versions:

It's working fine if I run it locally, but on the server the worker keeps running over and over.

Has anyone had any luck with a fix?

--Edit

I checked firebase console and this is happening:

image

@donbarthel

+1 This started happening to me today on my server. If I run the code (with node, firebase, firebase-queue) off my laptop instead, pointing to the same database, I'm not able to reproduce the problem.

I'm using:
"firebase": "^3.0.1",
"firebase-queue": "^1.4.0",

@donbarthel

donbarthel commented Nov 11, 2016

More on this: jclalala fixed the issue (reported above) for himself by changing this line in queue_worker.js from:

var expires = Math.max(0, startTime - now + self.taskTimeout);

to:

var expires = Math.max(10000, startTime - now + self.taskTimeout);

I have previously reported an issue (#45) which I solved by changing that same line to:

var expires = self.taskTimeout;

My change was implemented in a prior project but not this one, and until today issue #45 hadn't appeared in this project. I have just reimplemented my change in my new project and, lo, this issue on my server has now disappeared! I'm not sure whether my (and jclalala's) changes actually fix the issue or just shuffle the code around sufficiently to avoid it.

Hope this info helps!

@Tyris

Tyris commented Nov 18, 2016

Same issue as @crazymunky and @donbarthel.

Works fine on my dev machine (mac) but breaks on my Windows Server 2012 machine. That 10000 workaround seems to solve it.

We're using this for our mailer... so when it breaks we suddenly start spamming emails which isn't great.
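One way to limit the damage when a task is processed more than once (duplicate emails, for example) is to make the side effect idempotent. The sketch below is a generic in-memory guard and not part of firebase-queue; `createOnceGuard`, `runOnce`, `sendEmail`, and the `messageId` field are hypothetical names, and it assumes each task carries some unique identifier that can serve as the key.

```js
// Hypothetical helper: runs an action at most once per key within this process.
function createOnceGuard() {
  var seen = new Set();
  return function runOnce(key, action) {
    if (seen.has(key)) {
      return false; // already handled, skip the side effect
    }
    seen.add(key);
    action();
    return true;
  };
}

// Stand-in for a real mailer call; it only records what would be sent.
var sent = [];
function sendEmail(task) {
  sent.push(task.messageId);
}

var runOnce = createOnceGuard();
var task = { messageId: 'msg-1', to: 'user@example.com' };

// Simulate the worker picking up the same task twice.
runOnce(task.messageId, function () { sendEmail(task); });
runOnce(task.messageId, function () { sendEmail(task); });

console.log(sent.length); // 1
```

The guard lives in memory, so it is lost on restart and does not coordinate between separate worker processes; a persistent record of handled keys would be needed for that.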

Edit: It would appear the issues I was having were caused by our server time being about 2 minutes ahead of actual time (e.g. Firebase server time). If the server time doesn't match Firebase's (the closer the better), then this can reasonably happen.
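To make the clock-skew explanation concrete, here is a small sketch of how the three variants of the `expires` line quoted earlier in this thread behave when the worker's clock runs ahead. It assumes that `startTime` comes from a server-side timestamp and `now` from the worker's local clock, which is a reading of this thread rather than something confirmed from the firebase-queue source; the function names and all numbers are made up for illustration.

```js
// Illustrative only: the three `expires` variants quoted in this thread.
// Assumption: startTime is a server-side timestamp, now is the worker's local clock.
function expiresOriginal(startTime, now, taskTimeout) {
  return Math.max(0, startTime - now + taskTimeout);
}

function expiresWithFloor(startTime, now, taskTimeout) {
  return Math.max(10000, startTime - now + taskTimeout);
}

function expiresFixed(startTime, now, taskTimeout) {
  return taskTimeout;
}

var taskTimeout = 60000;            // hypothetical 1-minute task timeout (ms)
var startTime = 1000000;            // hypothetical server timestamp of the claim (ms)
var skewedNow = startTime + 120000; // local clock about 2 minutes ahead

console.log(expiresOriginal(startTime, skewedNow, taskTimeout));  // 0
console.log(expiresWithFloor(startTime, skewedNow, taskTimeout)); // 10000
console.log(expiresFixed(startTime, skewedNow, taskTimeout));     // 60000
```

Under those assumptions, an expiry of 0 would let the timeout fire immediately, which would be consistent with the claim-and-reset loop reported here. With no skew (`now` equal to `startTime`), all three variants return the full 60000.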

@cbraynor
Contributor

Release 1.6.1 contains a bugfix that could be related to this issue. Can you let me know if you're still having issues after updating?

@calebhailey

Not sure if this will help anyone else here, but I was seeing this exact behavior and determined the cause of the issue to be that my test code was taking longer (via a 10-second setTimeout()) than the timeout I defined for my spec, so the task was just falling into a retry loop.
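A rough sketch of why work that outlasts the spec timeout can look like an endless loop. It is a simplified model, not firebase-queue's actual scheduling: it assumes a timed-out task is reset and immediately claimed again, and `simulateClaims` and its parameters are hypothetical names.

```js
// Simplified model: count how often a task is claimed within an observation window.
// Assumption: a task that exceeds the timeout is reset and re-claimed right away.
function simulateClaims(workMs, timeoutMs, windowMs) {
  var claims = 0;
  var elapsed = 0;
  while (elapsed < windowMs) {
    claims += 1;
    if (workMs <= timeoutMs) {
      return claims; // the work finishes before the timeout, so no retry
    }
    elapsed += timeoutMs; // timeout fires first; the task goes back to the start state
  }
  return claims;
}

// 10-second work with a 5-second timeout, observed for 30 seconds.
console.log(simulateClaims(10000, 5000, 30000)); // 6

// The same work with a 15-second timeout is claimed once.
console.log(simulateClaims(10000, 15000, 30000)); // 1
```

In this model the fix is simply a timeout comfortably longer than the slowest expected run of the processing function.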

@peepo3663

I still have the issue that tasks are not resolved and remain in my tasks.
"firebase": "^3.6.3"
"firebase-queue": "^1.6.1"

@snowadamc

Hey guys, I came across this conversation as I've recently been having this issue. I have been working with firebase-queue for a while and was working on some big changes to our backend queue on a test database. When trying to roll this out for our production database today, I saw behavior much like this. After some experimenting, I found that it had to do with my database rules.

In one of my queue functions, I was attempting to read a portion of the database for which I had forgotten to change the permissions to allow my backend worker access. Instead of raising an error (or counting against the number of retries until the queue just moved on), it kept resetting the owner, as others seem to have identified. Although I was eventually able to fix it, it seems that a failed database read shouldn't reset the queue, but should cause the task to fail and move on.
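A minimal sketch of surfacing a failed read explicitly from inside a processing function, so that a permission error reaches `reject` instead of going unnoticed. `readThenFinish` is a hypothetical helper; it relies on the failure callback that the Firebase SDK's `once()` accepts as its third argument, and whether an explicit `reject` would have avoided the reset behavior described here is not confirmed anywhere in this thread.

```js
// Hypothetical helper for use inside a queue processing function.
// `ref` is expected to behave like a Firebase database reference.
function readThenFinish(ref, resolve, reject) {
  ref.once('value', function (snapshot) {
    // The read succeeded: finish the task with the value that was read.
    resolve(snapshot.val());
  }, function (error) {
    // The read failed (for example, permission denied): fail the task explicitly.
    reject(error);
  });
}

// Stand-in reference that always fails, used only to demonstrate the error path.
var deniedRef = {
  once: function (eventType, onSuccess, onFailure) {
    onFailure(new Error('permission_denied'));
  }
};

readThenFinish(deniedRef, function (value) {
  console.log('resolved with', value);
}, function (error) {
  console.log('rejected with', error.message); // rejected with permission_denied
});
```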

@WikipediaBrown

WikipediaBrown commented Apr 17, 2017 via email
