Rare one off jobs & dynamic scheduled jobs #17

ErisDS · 2020-07-29T17:26:38Z

I was really excited to see this library pop up - it's awesome to see something using native worker threads and not requiring redis/mongo/some other store. But I was then a bit confused by the configuration method when it came to trying it out.

I have use cases for different types of jobs:

1. recurring jobs that operate like a cron
2. one off, super rare, but long-running tasks that might never be run in the lifetime of the process
3. jobs that are generated dynamically and need to be run at a specific time

1 seems to be the main use case for Bree, but in my case it's the smallest one.
2 is my main use case - I can sort of see how I might manage it by not calling bree.start() and only calling bree.run('task') if the long-running task is needed, but that feels like I'm not using the tool properly.
3 is a nice-to-have - I already have code doing this, but unless I'm missing something there's no way to achieve it with Bree, except approximating it with a cron job that runs every minute to check.
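As a rough illustration, the three job types might map onto Bree job definitions like this. It assumes Bree's `cron` and `date` job options and a per-job `path`; the `buildJobs` helper, the job names, and the file paths are hypothetical.

```javascript
const path = require('path');

// Hypothetical helper: one job definition per job type listed above.
function buildJobs(jobsDir, runAt) {
  return [
    // 1. recurring, cron-like job (here: every day at 03:00)
    { name: 'nightly-cleanup', path: path.join(jobsDir, 'nightly-cleanup.js'), cron: '0 3 * * *' },
    // 2. rare one-off job: no schedule; meant to be triggered explicitly,
    //    e.g. with bree.run('importer')
    { name: 'importer', path: path.join(jobsDir, 'importer.js') },
    // 3. dynamically generated job that should run once at an exact time
    {
      name: `reminder-${runAt.getTime()}`,
      path: path.join(jobsDir, 'reminder.js'),
      date: runAt
    }
  ];
}

module.exports = { buildJobs };
```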

Given the rareness of the one-off jobs, it's a shame to have to declare them upfront, rather than being able to add them if and when they show up - otherwise there's overhead for no good reason.

With type 3 specifically, I find it odd that Bree has support for setting an exact date when a job should run, but that can only be set on instantiation?

I guess I'm looking for bree.add({jobConfig}) or bree.run({jobConfig}) - or am I massively missing something?


niftylettuce · 2020-08-01T07:36:04Z

Hi @ErisDS and thanks for writing in! 👋 🎉

I really want to see these cases handled by Bree too (I have them myself on some projects), and I'm still debating the best approach for this. One approach would be to add IPC or pub/sub, or to have a Bree job that queries for these one-off jobs on a particular interval if real-time is not required. In turn, you could basically have Bree inside of Bree (though if you don't want the overhead of Bree itself, which isn't much, you may prefer to use some of the modules Bree uses internally).

One of the main reasons I wrote Bree was the bad practices that other libraries led developers towards. I think the best approach is abstracting jobs as needed into a barebones approach where you are in control of everything, and each job is completely independent and as lightweight as possible.

For (2) and (3), you could achieve these with pub/sub inside a long running Bree task or some other approach that isn't real time (like you said, a query every so often, e.g. one minute). You could then spawn Bree inside of this to fire off these jobs as needed.

I am totally open to your thoughts on what an API for this might look like in Bree. If you do find an approach that works, feel free to share it and we can add it to the README (or an abstraction of it, e.g. if it is confidential). Very open to contributors, PRs, and help from the community!

niftylettuce · 2020-08-01T07:40:12Z

One other thing I didn't mention is that Bree exposes all of its config, workers, etc. For example, someone added a new job to a Bree config after it had already started by adding to bree.config.jobs, where bree is the instance from const bree = new Bree(...) (e.g. #10), and someone else listened for a worker and then communicated with it over Comlink once it had started (e.g. https://github.com/uvcat/uvcat/blob/fb4139b40ceced5c1ac4219588d78ddee2f3fa2b/packages/%40uvcat/plugin-worker/index.js#L29-L31).

ErisDS · 2020-08-04T14:35:53Z

As per #19, using the config object would skip validation etc., and feels very much like a hack 😬

Where I am at is assessing various libraries looking for the right fit. I'm super excited by what this library promises and wanted to share my use cases. I'd love to say I've got time to commit to helping move the library along as I realise it's brand new, but realistically I can't.

There is also the lack of a threads polyfill for Node.js < 12, which makes this a no-go for us until April next year, but perhaps in that time the library will take off, mature and become the perfect fit 😄

In the meantime, let me just share that the env I'm working in is one of multiple services (inside a single app), each of which would manage its own jobs. So in this architecture having a single location for job files doesn't fit (I realise that is configurable), and similarly, I want each service to be able to "register" its jobs as and when it sees fit.
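One way to let each service "register" its own jobs, sketched with a hypothetical `createJobRegistry` helper: each service contributes job definitions from its own directory, names are prefixed with the service name so they cannot collide, and the resulting objects are what you would hand to `bree.add` (the method added later in this thread). The service names, naming scheme, and paths are made up for illustration.

```javascript
const path = require('path');

// Hypothetical registry that collects job definitions from independent services.
function createJobRegistry() {
  const jobs = new Map();

  return {
    // Register a service's jobs; each job file lives in that service's own directory.
    register(serviceName, jobsDir, definitions) {
      return definitions.map((def) => {
        const name = `${serviceName}:${def.name}`; // namespace to avoid clashes
        if (jobs.has(name)) throw new Error(`Job already registered: ${name}`);
        const job = { ...def, name, path: path.join(jobsDir, `${def.name}.js`) };
        jobs.set(name, job);
        return job; // e.g. pass the returned jobs to bree.add(...)
      });
    },
    // Every job registered so far, across all services.
    list() {
      return [...jobs.values()];
    }
  };
}

module.exports = { createJobRegistry };
```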

Maybe that's some useful food for thought? Maybe it's just more evidence that this library isn't for me and that's totally OK too.

Happy to discuss further, but also happy to close this for now & not clog up your nice tidy repo!

niftylettuce · 2020-08-04T14:42:03Z

Would these three changes suffice for you @ErisDS to be able to use this?

1. Make a new addJob method that re-uses the exact same validation logic for adding a job as when one initializes a new Bree instance. You would use this instead of the Passing data #19 hack.
2. A polyfill for threads could simply use child_process's spawn method to spawn a child (albeit you would lose worker data communication). We could document how to do graceful reloads and listen for SIGINT etc. (as we do with @ladjs/graceful).
3. An example in the README for using sockets or Redis pub/sub to communicate with Bree to add new jobs (?)
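A rough sketch of the `child_process` fallback idea from the list above: prefer `worker_threads` when it can be loaded, and otherwise spawn a separate Node process for the same job file. `runJobFile` is a hypothetical helper, not Bree's own implementation, and as noted above the spawned process cannot receive `workerData`.

```javascript
const { spawn } = require('child_process');

// Try to load worker_threads; on Node versions where it cannot be loaded,
// return null so callers can fall back to a separate process.
function loadWorkerThreads() {
  try {
    return require('worker_threads');
  } catch (err) {
    return null;
  }
}

// Hypothetical helper: run a job file and resolve with its exit code.
// The child-process path loses workerData and other worker-only features.
function runJobFile(file, { forceChildProcess = false } = {}) {
  const wt = forceChildProcess ? null : loadWorkerThreads();
  return new Promise((resolve, reject) => {
    const runner = wt
      ? new wt.Worker(file)
      : spawn(process.execPath, [file], { stdio: 'inherit' });
    runner.once('error', reject);
    runner.once('exit', (code) => resolve(code));
  });
}

module.exports = { runJobFile };
```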

niftylettuce · 2020-08-04T14:46:09Z

Actually, regarding (2) in my previous message, perhaps we could use https://github.com/chjj/bthreads as a polyfill, unless you knew of a better one?

shadowgate15 · 2020-08-04T16:12:20Z

@niftylettuce I wonder whether adding addJob functionality would be a good feature. Thoughts?

niftylettuce · 2020-08-04T16:41:44Z

@shadowgate15 I think we should 100% add it, since it is a common use case and the hack is not the best approach. There could also be a .remove method (we should probably call it .add to keep the API simple and similar to run).

niftylettuce · 2020-08-05T03:22:57Z

v1.1.24 released and now has support for add and remove methods (examples have been added to the README too), thanks @shadowgate15 for all your hard work here.

Next I think we just need to polyfill worker threads, and add some IPC/pubsub/socket examples.

dilizarov · 2020-08-05T04:20:50Z

@niftylettuce thanks for the speedy updates. My question:

Let's say I need to dynamically run a job whenever I get a webhook from a 3rd party service with specific data.

Is it fine to call bree.add(jobInWebhookWithCustomDataInWorkerData) every time the webhook happens?

This job would only run once. Or is it possible to rerun a job with new worker data? My worry with bree.add is that if the webhook gets triggered 10 times, I'll have created 10 Bree jobs that each only run once. Would I have to clean up each of these jobs myself?

My idea would be to have a singleton Bree instance that houses all of my jobs and even runs my dynamic one-off jobs.

Scheduled, recurring jobs without custom parameters per run are wonderfully documented in this library.

If we had an example of the best way to run dynamic one-off jobs, that'd be fantastic. Right now, I imagine the answer is to run the job via bree.add, but then I need to make sure that the job runs ASAP and gets removed once complete.

niftylettuce · 2020-08-05T04:44:35Z

If you just give it a unique name, it should be fine; you will also need to call bree.run() with that unique name. Make sure to pass a path each time (this can be static/the same; it's just the name that needs to be unique, and it could be an ID or whatever you choose).
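A minimal sketch of a per-event job definition following that advice: a shared `path`, a unique `name`, and the per-event data passed through Bree's job-level `worker` option (which Bree hands to the `Worker` constructor, so `workerData` reaches the job). `makeOneOffJob` is hypothetical, and it assumes Node.js 14.17+ for `crypto.randomUUID` (the thread itself uses the `uuid` package).

```javascript
const { randomUUID } = require('crypto');

// Hypothetical helper: build a one-off job definition for a single event.
// The path can stay the same for every event; only the name must be unique.
function makeOneOffJob(baseName, jobPath, data) {
  return {
    name: `${baseName}-${randomUUID()}`,
    path: jobPath,
    // Passed through to new Worker(...), so the job can read it as workerData.
    worker: { workerData: data }
  };
}

module.exports = { makeOneOffJob };
```

Per event you would then call `bree.add(job)` and `bree.run(job.name)`, and `bree.remove(job.name)` once it has finished so one-off definitions do not pile up in the instance. Bree's README documents a `worker deleted` event, which is one place to hook that cleanup in.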

niftylettuce · 2020-08-05T04:47:37Z

Either myself or @shadowgate15 will add a section to README for Dynamic jobs.

dilizarov · 2020-08-05T04:49:28Z

Ah, so for every dynamic event, I'll want it unique?

Something like the following in my webhook:

const uniqueJobName = `jobName-${uuid.v4()}`
bree.add(jobConfigWithUniqueJobName)
bree.run(uniqueJobName)

And I shouldn't run into any problems with having a ton of dynamic jobs spun up and in the Bree instance without them being cleared after they are run?

I look forward to the section in the README!

Super excited for this library :)

niftylettuce · 2020-08-05T18:04:22Z

v1.1.25 is now released with support for adding single jobs (instead of having to add an Array when you call bree.add).

https://github.com/breejs/bree/releases/tag/v1.1.25

niftylettuce · 2020-08-05T18:07:01Z

Also I just wanted to share that it is generally against best practice to use IPC/websockets/pubsub to queue dynamic jobs in real time. Your process could be interrupted at any time, even for reasons out of your control, and the data would be lost. I would recommend keeping dynamic jobs stored in a queue backed by a persistent database, and then having a job that runs every so often to flush the queue (with limited concurrency). We will still find time to document these examples and answer your questions soon, along with adding a worker polyfill for you @ErisDS.
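The "flush the queue with limited concurrency" step could look roughly like this inside the periodic job. It is a generic sketch (libraries such as p-map implement the same idea); `flushWithConcurrency` is hypothetical and `items` stands in for rows loaded from your persistent queue.

```javascript
// Process `items` with at most `limit` handlers running at once.
// Results keep the input order; failures are collected instead of thrown,
// so one bad row does not abort the whole flush.
async function flushWithConcurrency(items, limit, handler) {
  const results = new Array(items.length);
  let next = 0;

  // Each lane keeps pulling the next unprocessed index until none are left.
  async function lane() {
    while (next < items.length) {
      const i = next++;
      try {
        results[i] = { ok: true, value: await handler(items[i]) };
      } catch (error) {
        results[i] = { ok: false, error };
      }
    }
  }

  const laneCount = Math.max(1, Math.min(limit, items.length));
  await Promise.all(Array.from({ length: laneCount }, () => lane()));
  return results;
}

module.exports = { flushWithConcurrency };
```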

dilizarov · 2020-08-14T05:56:55Z

@niftylettuce Are you suggesting that for dynamic jobs we want to run immediately, we should place them in a persistent DB and simply have Bree poll for them like... every few milliseconds or something? I understand using a persistent DB for dynamic jobs that can wait until later, but for sending notifications to users, for instance, I want them notified ASAP, and I still want to use jobs to get it done.

niftylettuce · 2020-08-14T05:58:33Z

@dilizarov Yes I am suggesting that. You could just have an interval that polls and locks from the queue every second.
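A once-a-second poll could be guarded so that a slow round never overlaps the next one. `startPoller` is a hypothetical helper; in practice `tick` would run the "select unlocked rows and lock them" query.

```javascript
// Hypothetical poller: calls `tick` every `ms` milliseconds, but skips a
// round if the previous tick is still running, so slow database queries
// never pile up on top of each other.
function startPoller(tick, ms) {
  let running = false;
  const timer = setInterval(async () => {
    if (running) return; // previous round still in progress
    running = true;
    try {
      await tick();
    } catch (err) {
      console.error('poll failed', err);
    } finally {
      running = false;
    }
  }, ms);
  // Return a function that stops future rounds.
  return () => clearInterval(timer);
}

module.exports = { startPoller };
```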

dilizarov · 2020-08-14T06:05:22Z

@niftylettuce Got it. I'm assuming another good thing about using a persistent DB is it helps architecturally as far as building a history of notifications goes (for UX purposes), but also I don't have to pass any data to the worker for Bree as ideally the data I persist into the DB should suffice.

I will say though... the idea of polling a Postgres DB every second gets me a little antsy, BUT I'm also kinda new to this stuff so I guess maybe that's the norm and DBs were made for these reasons anyways 🤷‍♂️

niftylettuce · 2020-08-14T06:09:16Z

I mean you could poll every 2 seconds, people really won't know the difference, and chances are there are going to be other delays that are out of your control. Just keep it simple, don't stress yourself out, and do things that don't scale.

dilizarov · 2020-08-14T06:24:21Z

@niftylettuce Just for anyone else who ends up on this thread, one could also leverage hooks in their ORM. For instance, with Sequelize ... when you create your model, you can leverage the afterCreate hook to kick off a job that'll process that newly persisted record. Obviously, if things go to shit, I imagine you can get retries going with that job.

niftylettuce · 2020-08-18T08:35:58Z

v3.0.0 of Bree is released with support for Node v10+ and browsers.

See the updated README at https://github.com/breejs/bree#readme.

https://github.com/breejs/bree/releases/tag/v3.0.0

niftylettuce · 2020-08-18T08:36:19Z

Also @ErisDS note that we have a bree.add and bree.remove method. The bree.add method accepts a string or object as you requested.

ghost · 2020-08-31T06:21:42Z

Is there an example for the dynamic jobs discussed here? I want a polling job that checks the DB for some data and, if that data meets a condition, triggers another job that may be long-running to process it. The polling job would need to know whether the long-running job is still processing the data; once that job exits, the polling job would continue to poll the DB for new data and trigger the processing job again if the condition is met.

niftylettuce · 2020-08-31T08:25:58Z

Just set values in your database when the persistent job needs to queue another job, and then have another job look for those values. I think you're overcomplicating it, though. 99% of jobs I've seen don't require such complexity, unless you're building rockets.

https://github.com/breejs/express-example

ErisDS · 2020-08-31T08:32:58Z

TBH I have the same questions & I'm definitely not building rockets.

I'm still struggling to understand how to use Bree to fit my use case. It feels like I have to make my use cases fit Bree. More examples would be fantastic.

Here's a hopefully clear, not-rockets use case: a user uploads a huge file to an API, and it needs background processing.

niftylettuce · 2020-08-31T08:35:38Z

@ErisDS does the file need to be processed in the background after the upload completes? Is it written to disk somewhere? Stored in an S3 bucket? A tmp dir? Can you just write its location to a "BackgroundQueue" database table and then remove it once processing is complete? Write a job that polls this table once a minute, or every second, depending on how often files are uploaded and how fast they need to be processed, and lock the specific files. If it takes longer than X seconds, or if it fails, you can implement your own retry logic. Do you need to update the front-end too once these background processes are finished? Socket.io? Simple client-side XHR polling against an API endpoint to check whether the job is finished?

niftylettuce · 2020-08-31T08:38:30Z

To be clear: for dynamic jobs, all you really need (at least to me, from what you've all shared) is to create a persistent database table with Mongo or SQL (your choice), store some info about what is dynamic about the job, and then have a job polling this table and locking these jobs (only querying for jobs that are not yet locked, of course). You could also implement logic to not fetch a job if there's already a count > 0 of locked jobs. You get full fine-grained control with this approach. Jobs run faster this way, with less overhead than some of the broken queue mechanisms you'd find in Bull or Agenda, and with way less complexity.
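The claim-and-lock step described above, sketched against an in-memory array standing in for the database table. The row shape, `claimNext`, `complete`, and the stale-lock and retry limits are assumptions for illustration; with a real database the find-and-lock step should be one atomic update so two pollers cannot claim the same row.

```javascript
// In-memory stand-in for a "BackgroundQueue" table. Each row looks like
// { id, payload, lockedAt: null | number, attempts: number }.
function claimNext(queue, {
  now = Date.now(),
  staleAfterMs = 10 * 60 * 1000, // locks older than this count as abandoned
  maxLocked = 1,                 // how many rows may be in progress at once
  maxAttempts = 3                // give up on a row after this many tries
} = {}) {
  // Release stale locks (e.g. the job crashed) so the row can be retried.
  for (const row of queue) {
    if (row.lockedAt !== null && now - row.lockedAt > staleAfterMs) row.lockedAt = null;
  }

  // Don't fetch anything new while enough rows are already locked.
  const lockedCount = queue.filter((row) => row.lockedAt !== null).length;
  if (lockedCount >= maxLocked) return null;

  const row = queue.find((r) => r.lockedAt === null && r.attempts < maxAttempts);
  if (!row) return null;
  row.lockedAt = now;
  row.attempts += 1;
  return row;
}

// Delete a finished row so it is never picked up again.
function complete(queue, id) {
  const i = queue.findIndex((row) => row.id === id);
  if (i !== -1) queue.splice(i, 1);
}

module.exports = { claimNext, complete };
```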

niftylettuce · 2020-08-31T08:41:37Z

If either of you gives me a very specific example, or provides more detail (e.g. you're writing to tmp dir and then you need to do XYZ with the job, e.g. compress the SVG or whatever) let me know. I'd be happy to help write your job for you so you have a clearer understanding. I'd also need to know if you're using SQL, Mongoose, Postgres, whatever - e.g. Bookshelf, Knex, etc.

niftylettuce · 2020-08-31T08:46:25Z

Do the job examples here help at all? https://github.com/forwardemail/forwardemail.net/tree/master/jobs -- specifically this one shows how to do concurrency @ https://github.com/forwardemail/forwardemail.net/blob/master/jobs/check-domains.js

ErisDS · 2020-08-31T09:02:35Z

My use case is a data importer. A file is uploaded, stored to disk and processed later to import it into the DB (the job is already written; calling it is the issue). The importer will be used 0-10 times at the very start of the application's lifespan and then likely never again. So it doesn't make any sense to me to have code that polls for the rest of the application's life (which is hopefully years, or until the end of the internet) for something that will almost certainly never happen.

I'm looking for true one-off jobs.
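A sketch of a "true one-off" trigger built on the add/run/remove methods released in v1.1.24 earlier in this thread: the job is only registered when an upload arrives, and removed again once its worker exits, so nothing polls in between. `runImportNow`, `removeWhenFinished`, the job naming, and the `workerData` shape are hypothetical, and the cleanup assumes Bree's `worker deleted` event is emitted with the job name.

```javascript
// Hypothetical helper: remove a one-off job once its worker has exited.
function removeWhenFinished(bree, name) {
  const onDeleted = (deletedName) => {
    if (deletedName !== name) return;
    bree.off('worker deleted', onDeleted);
    // remove() may return a promise depending on the Bree version.
    Promise.resolve(bree.remove(name)).catch((err) => console.error(err));
  };
  bree.on('worker deleted', onDeleted);
}

// Hypothetical helper: register and start an import job only when an upload
// actually arrives, so nothing is scheduled or polled the rest of the time.
async function runImportNow(bree, { uploadId, filePath, jobPath }) {
  const name = `import-${uploadId}`;
  // Awaiting works whether add/run return promises or plain values.
  await bree.add({ name, path: jobPath, worker: { workerData: { filePath } } });
  removeWhenFinished(bree, name); // listen before starting to avoid a race
  await bree.run(name);
  return name;
}

module.exports = { runImportNow };
```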

ErisDS · 2020-08-31T09:11:49Z

A second use case is having job 1 that handles sending huge bulk emails in batches. And then job 2 that polls for resulting delivery events and processes those. There's no point running job 1 unless the application is configured to send emails, which it may not be, and there's no point running job 2 until an email has been sent.

All of this comes from being a decentralised app with 100s of 1000s of installs, not a single centralised application.

I'm also interested in strategies for handling batch jobs, where there may be 100 jobs and then something extra has to happen when they all complete. In some job libraries they have specific handling for this e.g. sidekiq where the last child triggers an event/callback, and in other systems there's a parent job that monitors the batch jobs.
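A minimal sketch of the "last child triggers the callback" pattern, with a hypothetical `createBatch` helper. It keeps state in memory and assumes a non-empty batch; if the children run in separate workers or must survive restarts, the pending set would need to live in the same database table used for the queue.

```javascript
// Hypothetical batch tracker: the last child to finish triggers onComplete.
function createBatch(ids, onComplete) {
  const pending = new Set(ids);
  const failed = [];
  let done = false;

  // Report one child as finished, optionally with an error.
  function finish(id, error) {
    if (done || !pending.delete(id)) return; // ignore unknown or duplicate reports
    if (error) failed.push({ id, error });
    if (pending.size === 0) {
      done = true;
      onComplete({ failed });
    }
  }

  return { finish, remaining: () => pending.size };
}

module.exports = { createBatch };
```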

I'll check out those examples in more detail shortly.

NicolasGorga · 2022-02-08T00:08:07Z

Hello, I've just started using Bree and integrated it with a room booking request app for events that I'm developing for my university final project. I have two use cases: one I had no trouble solving with Bree, but I can't see how to solve the other with it.

Use Case 1: Send push notifications to users of my app every day at a fixed hour. I could solve it with no problem with Bree: when I initialize Bree I create a job that is configured to run at a specific hour of the day, based on a variable in my .env.

Use Case 2: Users make room booking requests for events they are going to have. These requests can be accepted or declined by a prioritization algorithm that runs 5 minutes before the start of the event, and a push notification is sent to the device with the result. My idea was to add a job when the user creates the booking request and configure it to run in the future, 5 minutes before the start of the event, to execute the algorithm and inform the user.

Example: if the event a user creates a booking request for starts in 1 hour, I would configure the job to run 55 minutes from now. The problem is that the docs say that after using bree.add({name: myJob, date: eventStart - 5 minutes}) I have to use bree.run(), but I don't want to run it when the user makes the request; as stated above, I want it to run exactly at the moment set in the date property.

Is there any workaround? The project is due in a few days and this is the main problem I'm struggling with now. Please help, I'd appreciate it!
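A sketch of building that job definition: the `date` is the event start minus five minutes, the booking ID travels as `workerData`, and a run time already in the past is rejected up front, since (as noted later in the thread) such a job will not run. `bookingResolutionJob` and the job path are hypothetical; the result would be passed to `bree.add` and, as suggested later in the thread, started with `bree.start(job.name)` rather than `bree.run`.

```javascript
const FIVE_MINUTES_MS = 5 * 60 * 1000;

// Hypothetical helper: a one-off job that resolves a booking request
// five minutes before the event starts.
function bookingResolutionJob(bookingId, eventStart, now = new Date()) {
  const runAt = new Date(eventStart.getTime() - FIVE_MINUTES_MS);
  if (runAt.getTime() <= now.getTime()) {
    // A date in the past will never fire, so handle this case explicitly.
    throw new Error(`Booking ${bookingId} starts too soon to schedule`);
  }
  return {
    name: `resolve-booking-${bookingId}`,
    path: '/srv/app/jobs/resolve-booking.js', // hypothetical job file
    date: runAt,
    worker: { workerData: { bookingId } }
  };
}

module.exports = { bookingResolutionJob };
```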

shadowgate15 · 2022-02-08T00:12:16Z

'bree.run()' only starts the job. Putting 'date' in the config will set the job to run at that date.

NicolasGorga · 2022-02-08T00:32:26Z

Hey @shadowgate15 thanks for your quick response!

When adding the job I'm setting the 'date', but it doesn't execute. I've just tested it by running this code when a specific endpoint is hit, for testing purposes:

bree.add({
  name: 'testAdd',
  // months are zero-based in the Date constructor: 1 = February, 2 = March
  date: new Date(2022, 1, 7, 21, 30, 0)
});

I hit the endpoint at 21:28 and waited two minutes (as I'm writing this) but nothing happened. Am I doing something wrong, or is this not supported? It would be a shame if it weren't :(

shadowgate15 · 2022-02-08T00:39:16Z

date is interpreted as local time by default, so if your local time and the server's time are not the same, that could be why it didn't fire. Maybe check that?

shadowgate15 · 2022-02-08T00:45:04Z

Also, you should use bree.start(JOB_NAME) instead of bree.run; start is what actually schedules it to run.

Also if the date is in the past it will not run.
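Two `Date` details worth checking when using the `date` option: the `Date` constructor counts months from zero, so `new Date(2022, 2, 7, 21, 30, 0)` is 7 March rather than 7 February, and numeric parts are interpreted in the process's local time zone. An ISO string with an explicit offset pins down a single instant regardless of the server's zone; the `-03:00` offset below is only an example, and `isInFuture` is a hypothetical helper for the "past dates do not run" rule mentioned above.

```javascript
// The Date constructor counts months from 0, so month index 2 is March.
const fromParts = new Date(2022, 2, 7, 21, 30, 0);

// Numeric parts are interpreted in the *local* time zone of the process.
// An ISO string with an explicit offset means the same instant everywhere
// (UTC-03:00 here is just an example offset).
const explicit = new Date('2022-02-07T21:30:00-03:00');

// A job whose date is already in the past will not run, so check first.
function isInFuture(date, now = new Date()) {
  return date.getTime() > now.getTime();
}

module.exports = { fromParts, explicit, isInFuture };
```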

NicolasGorga · 2022-02-08T01:07:39Z

@shadowgate15 you were right about the time; I already fixed that and logged it to the console to confirm it is now right!

The date is a couple of minutes in the future, but it still isn't executing for some reason, neither when I use add followed by start nor when I skip add and pass the job directly when initializing Bree. Have you had problems with the date property?

shadowgate15 · 2022-02-08T02:47:34Z

Try running it with DEBUG=* and see what the debug log says.

NicolasGorga · 2022-02-08T03:03:33Z

Hey @shadowgate15 I ended up changing it to cron and it works!

I'm now having a different issue, where the worker seems to exit right after a console.log. I'll keep running some tests, and tomorrow I'll hit you up if I can't solve it myself.

I really appreciate your help, since my work is due next Monday!

NicolasGorga · 2022-02-08T23:21:05Z

Hey @shadowgate15 how are you?

Sorry to bother you, but I couldn't resolve the second issue I was having, where the worker didn't seem to wait for the result of an async operation I await.

In the worker I call a service I created that calls a DAO, which returns a list of push tokens. These operations are asynchronous, so I await them in my worker. This works perfectly when I don't use a worker to execute it, but in the worker it enters the function, then skips the code after it and exits.

To simplify things for the question, I changed the worker to look like this:

(async () => {
  console.log("worker started");

  const wait = (ms: number) => new Promise((res) => setTimeout(res, ms));

  console.log('before async task');
  await wait(5000);
  console.log('5 seconds after');

  process.exit(0);
})();

What isn't executing is the last console.log, which mirrors what happens in my real worker, where all the code after the await keyword gets skipped. I've searched lots of StackOverflow questions and blogs, but found nothing that helps me understand this situation.

Could you please give me some clarity on what I'm doing wrong?

shadowgate15 · 2022-02-09T00:06:17Z

That should be an async function

NicolasGorga · 2022-02-09T00:42:46Z

@shadowgate15 I tried this in the dummy function I sent you and it worked, although I don't understand the difference from what I wrote, since I'm using the await keyword for the async operation and all the other operations are synchronous.

Applying this approach to my real job doesn't work, though; it fails just as before. My real job:

(async () => {
  console.log("In worker");

  const pushTokens = await userService.getPushTokens();

  // nothing executes after the previous await
  console.log("push tokens", pushTokens);

  if (pushTokens.length > 0) {
    console.log("need to send push");

    const messages: ExpoPushMessage[] = pushTokens.map((pt) => ({
      to: pt,
      title: "December Rooms",
      body: "Vas a ir a la oficina hoy?", // "Are you going to the office today?"
      data: {
        screenName: "ConfirmAssist",
      },
    }));

    console.log("sending push", messages);

    // await the send so the worker does not exit before it completes
    await sendNotifications(messages);

    console.log("all push sent");
  } else {
    console.log("no push to send");
  }

  process.exit(0);
})();

As you suggested with the dummy, I tried wrapping everything apart from the process.exit(0) in an async function, awaiting it and then calling process.exit(), but it didn't work. This is confusing.
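A sketch of the same job restructured along the lines of the worker pattern in Bree's README: every async call is awaited, including the notification send, and completion is signalled with `parentPort.postMessage('done')` instead of an immediate `process.exit(0)`. The `./push-deps` module, the injected function names, and the message `template` are hypothetical, and this is not a confirmed diagnosis of the problem above. It also helps to remember that a worker thread loads its own copies of your modules, so a database connection opened in the main thread is not automatically available inside the job.

```javascript
const { parentPort, isMainThread } = require('worker_threads');

// Job logic with its async dependencies injected, so it can also be tested
// outside a worker. In the real job these would be userService.getPushTokens
// and sendNotifications.
async function sendDailyPush({ getPushTokens, sendNotifications, template, log = console.log }) {
  const pushTokens = await getPushTokens();
  log('push tokens', pushTokens.length);

  if (pushTokens.length === 0) {
    log('no push to send');
    return 0;
  }

  const messages = pushTokens.map((to) => ({ to, ...template }));

  // Await the send; otherwise the job can report completion before it finishes.
  await sendNotifications(messages);
  log('all push sent');
  return messages.length;
}

// Bootstrap only when this file is loaded as a worker thread (e.g. by Bree).
if (!isMainThread) {
  // Hypothetical module: DB connections and clients must be set up in the worker.
  const deps = require('./push-deps');

  sendDailyPush(deps)
    // Tell Bree the job is done instead of calling process.exit(0) right away.
    .then(() => parentPort.postMessage('done'))
    // Rethrow outside the promise chain so the worker emits an 'error' event.
    .catch((err) => setImmediate(() => { throw err; }));
}

module.exports = { sendDailyPush };
```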

ErisDS changed the title ~~Rare / one off jobs~~ Rare one off jobs & dynamic scheduled jobs Jul 29, 2020

shadowgate15 mentioned this issue Aug 5, 2020: Added add() and remove() #21 (Merged)

niftylettuce closed this as completed Aug 18, 2020

naz mentioned this issue Nov 18, 2020: [docs] Example integration with external persistance layer #48 (Open)

ScreamZ mentioned this issue Feb 19, 2021: Infinite job #92 (Closed)