Rare one off jobs & dynamic scheduled jobs #17
Comments
Hi @ErisDS and thanks for writing in! 👋 🎉 I really want to see these cases handled by Bree too (I have them myself on some projects), and I'm still debating the best approach. One approach would be to add IPC or pub/sub, or, if real-time is not required, to have a Bree job that queries for these one-off jobs on a particular interval; in turn you could basically have Bree inside of Bree (though you may just want to use the modules that Bree uses internally if you don't want the overhead of Bree itself, which isn't much). One of the main reasons I wrote Bree was the bad practices that other libraries led developers towards. I think abstracting jobs as needed into a barebones approach, where you are in control of everything and each job is completely independent and as lightweight as possible, is the best approach. For (2) and (3), you could achieve these with pub/sub inside a long-running Bree task or some other approach that isn't real-time (like you said, a query every so often, e.g. every minute). You could then spawn Bree inside of this to fire off these jobs as needed. I am totally open to your thoughts on what an API for this might look like with Bree, or, if you do find an approach that works and you'd like to share it, we can add it to the README (or an abstraction of it, e.g. if it is confidential). Very open to contributors, PRs, and help from the community! |
One other thing I didn't mention is that Bree exposes all of its config, workers, etc. For example, someone added a new job to a Bree config after it already started by adding to |
As per #19, using the config object would skip validation etc., and feels very much like a hack 😬 Where I am at is assessing various libraries, looking for the right fit. I'm super excited by what this library promises and wanted to share my use cases. I'd love to say I've got time to commit to helping move the library along, as I realise it's brand new, but realistically I can't. There is also the lack of a polyfill for threads on Node.js < 12, which makes this a no-go for us until April next year, but perhaps in that time the library will take off, mature and become the perfect fit 😄 In the meantime, let me just share that the env I'm working in is one of multiple services (inside a single app), each of which would manage its own jobs. So in this architecture having a single location for job files doesn't fit (I realise that is configurable) and, similarly, I want each service to be able to "register" its jobs as and when it sees fit. Maybe that's some useful food for thought? Maybe it's just more evidence that this library isn't for me, and that's totally OK too. Happy to discuss further, but also happy to close this for now & not clog up your nice tidy repo! |
Would these three changes suffice for you @ErisDS to be able to use this?
|
Actually, regarding (2) in my previous message, perhaps we could use https://github.com/chjj/bthreads as a polyfill, unless you knew of a better one? |
@niftylettuce I do wonder if adding an addJob functionality would be a good feature. thoughts? |
@shadowgate15 I think we should 100% add it, since it is a common use case and the hack is not best approach. There could also be a |
v1.1.24 released and now has support for Next I think we just need to polyfill worker threads, and add some IPC/pubsub/socket examples. |
@niftylettuce thanks for the speedy updates. My question: Let's say I need to dynamically run a job whenever I get a webhook from a 3rd party service with specific data. Is it fine to call This job would only run once. Or is it possible to rerun a job with new worker data? I fear My idea would be to have a singleton The scheduled recurring jobs without custom parameters per recurrence are wonderfully documented in this library. If we had an example of the best way to run dynamic one-off jobs, that'd be fantastic. Right now, I imagine the answer is to run the job via |
If you just give it a unique name, it should be fine, and also you will need to add |
Either myself or @shadowgate15 will add a section to README for Dynamic jobs. |
Ah, so for every dynamic event, I'll want it unique? Something like the following in my webhook:
And I shouldn't run into any problems with having a ton of dynamic jobs spun up and in the Bree instance without them being cleared after they are run? I look forward to the section in the README! Super excited for this library :) |
v1.1.25 is now released with support for adding single jobs (instead of having to add an Array when you call |
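For anyone following along, here is a minimal sketch of the "give it a unique name" idea for webhook-triggered one-off jobs. It is not taken from the Bree README: the helper `makeOneOffJob`, the job file name, and the `jobsDir` argument are hypothetical, and the `bree.add` / `bree.start` calls in the trailing comment assume an already-created Bree instance.

```javascript
// Hypothetical helper: builds a uniquely named one-off job config.
let sequence = 0;

function makeOneOffJob(baseName, workerData, jobsDir) {
  sequence += 1;
  return {
    // Unique per event, so repeated webhooks never collide on the job name.
    name: `${baseName}-${Date.now()}-${sequence}`,
    // Assumed location of the job file; a dynamic name cannot be used to
    // look up the file, so the path is given explicitly.
    path: `${jobsDir}/${baseName}.js`,
    // Data handed to the worker thread for this single run.
    worker: { workerData },
  };
}

// Inside a webhook handler this might be used roughly as follows
// (assumption: `bree` is an existing Bree instance):
//   const job = makeOneOffJob('process-webhook', { eventId: payload.id }, '/app/jobs');
//   await bree.add(job);
//   await bree.start(job.name);
```

The counter is only there so that two events arriving in the same millisecond still get different names. Whether finished jobs should be removed from the instance afterwards is a separate question raised above and is not addressed by this sketch.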
Also I just wanted to share, it is generally against best practice to use IPC/websockets/pubsub to queue dynamic jobs in real-time. Your process could be interrupted at any time, even for reasons out of your control, and the data would be lost. I would recommend keeping dynamic jobs actually stored in a queue, with a persistent database, and then having a job that runs every so often to flush the queue (with limited concurrency). We will still find time to document these examples and also answer your questions soon, along with adding a polyfill for workers for you @ErisDS. |
@niftylettuce Are you suggesting that for dynamic jobs we want to run immediately, we should place them in a persistent DB and simply have Bree poll for the job like... every few milliseconds or something? I understand using the persistent DB for dynamic jobs that are allowed to be sent down the line, but for sending notifications to users, for instance, I want them notified ASAP, but I want to leverage jobs to get it done. |
@dilizarov Yes I am suggesting that. You could just have an interval that polls and locks from the queue every second. |
@niftylettuce Got it. I'm assuming another good thing about using a persistent DB is it helps architecturally as far as building a history of notifications goes (for UX purposes), but also I don't have to pass any data to the worker for Bree as ideally the data I persist into the DB should suffice. I will say though... the idea of polling a Postgres DB every second gets me a little antsy, BUT I'm also kinda new to this stuff so I guess maybe that's the norm and DBs were made for these reasons anyways 🤷♂️ |
I mean you could poll every 2 seconds, people really won't know the difference, and chances are there are going to be other delays that are out of your control. Just keep it simple, don't stress yourself out, and do things that don't scale. |
@niftylettuce Just for anyone else who ends up on this thread, one could also leverage hooks in their ORM. For instance, with Sequelize ... when you create your model, you can leverage the afterCreate hook to kick off a job that'll process that newly persisted record. Obviously, if things go to shit, I imagine you can get retries going with that job. |
v3.0.0 of Bree is released with support for Node v10+ and browsers. See the updated README at https://github.com/breejs/bree#readme. |
Also @ErisDS note that we have a |
Is there an example for the discussion on dynamic jobs? I want to have a polling job that checks the db for some data; if that data meets a condition, then I want to trigger another job that can be long-running to process that data. The polling job would need to know that the long-running job is still processing the data. Once that job exits, the polling job continues to poll the db for new data and again triggers that processing job if the data meets that condition. |
Just set values in your database when the persistent job needs to queue another job, and then have another job that looks for those values. I think you're overcomplicating it though. 99% of jobs I've seen don't require such complexity, unless you're building rockets. |
TBH I have the same questions & I'm definitely not building rockets. I'm still struggling to understand how to use Bree to fit my use case. It feels like I have to make my use cases fit Bree. More examples would be fantastic. Here's a hopefully clear, not-rockets use case: A user uploads a huge file to an API, and it needs background processing. |
@ErisDS the file needs background processed after the upload completes? Is it written to disk somewhere? Stored on S3 bucket? Tmp dir? Can you just write its location to a database "BackgroundQueue" and then once it's complete, remove it? Write a job that polls this db once a minute, or every second, depending on how frequent these are uploaded and how fast they need processed, and lock the specific files. If it takes X seconds long, or if it fails, you can implement your own retry logic. Did you need to update the front-end too once these background processes are finished? Socket.io? A simple XHR polling client-side to check against an API endpoint once the job is finished? |
To be clear: For dynamic jobs, all you really need (at least to me, from what you've all shared) is to create a persistent database table with Mongo or SQL (your choice), store some info about what is dynamic about the job, and then have a job polling against this database and locking these jobs (only query for jobs that are not yet locked, of course). You could also implement logic to not fetch a job if there's a count > 0 of a job already locked. You can have full fine-grained control with this approach. Jobs run faster this way, with less overhead than some broken queue mechanisms like you'd find in Bull or Agenda, and way less complexity. |
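The poll-and-lock idea described above can be sketched roughly as follows. This is an in-memory stand-in only: the row shape (`lockedAt`, `doneAt`, `attempts`) and the function names are hypothetical, and with a real Mongo or SQL table the claim step would need to be a single atomic update rather than a loop.

```javascript
// Claims up to `limit` rows that are neither locked nor finished,
// marking each one as locked so no other poller picks it up.
function claimJobs(rows, limit, now = new Date()) {
  const claimed = [];
  for (const row of rows) {
    if (claimed.length >= limit) break;
    if (row.lockedAt === null && row.doneAt === null) {
      row.lockedAt = now;
      claimed.push(row);
    }
  }
  return claimed;
}

// One polling pass: claim, process, then either mark done or record a
// failed attempt so that retry logic can decide what to do next.
function pollOnce(rows, handler, limit = 1) {
  const claimed = claimJobs(rows, limit);
  for (const row of claimed) {
    try {
      handler(row.payload);
      row.doneAt = new Date();
    } catch (err) {
      row.attempts += 1; // left unfinished, eligible for a later pass
    } finally {
      row.lockedAt = null; // always release the lock
    }
  }
  return claimed.length;
}

// A scheduled job (for example one that runs every second) would call
// pollOnce(rowsFromDatabase, processRow, concurrencyLimit) on each run.
```

The `limit` argument plays the role of the limited concurrency mentioned earlier in the thread; how many attempts to allow before giving up is left to the caller.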
If either of you gives me a very specific example, or provides more detail (e.g. you're writing to tmp dir and then you need to do XYZ with the job, e.g. compress the SVG or whatever) let me know. I'd be happy to help write your job for you so you have a clearer understanding. I'd also need to know if you're using SQL, Mongoose, Postgres, whatever - e.g. Bookshelf, Knex, etc. |
Do the job examples here help at all? https://github.com/forwardemail/forwardemail.net/tree/master/jobs -- specifically this one shows how to do concurrency @ https://github.com/forwardemail/forwardemail.net/blob/master/jobs/check-domains.js |
My use case is a data importer. A file is uploaded and stored to disk & processed later to import into the DB (but the job is written, it's calling it that's the issue). The importer will be used 0-10 times at the very start of the applications lifespan and then likely never again. So it doesn't make any sense to me to have code that polls for the rest of the application's life - which is hopefully years / til the end of the internet - for something that will almost certainly never happen. I'm looking for true one-off jobs. |
A second use case is having job 1 that handles sending huge bulk emails in batches, and then job 2 that polls for the resulting delivery events and processes those. There's no point running job 1 unless the application is configured to send emails, which it may not be, and there's no point running job 2 until an email has been sent. All of this comes from being a decentralised app with 100s of 1000s of installs - not a single centralised application. I'm also interested in strategies for handling batch jobs, where there may be 100 jobs and then something extra has to happen when they all complete. Some job libraries have specific handling for this, e.g. sidekiq, where the last child triggers an event/callback, and in other systems there's a parent job that monitors the batch jobs. I'll check out those examples in more detail shortly. |
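To make the "last child triggers a callback" batch strategy concrete, here is a small sketch. It is not a Bree feature and not how sidekiq implements it; `createBatch` is a hypothetical helper, and how each child reports completion (a worker message, a database row, and so on) is deliberately left open.

```javascript
// Returns a function that each child job calls once when it finishes.
// When the final child reports in, `onComplete` receives all results.
function createBatch(total, onComplete) {
  if (!Number.isInteger(total) || total < 1) {
    throw new RangeError('total must be a positive integer');
  }
  let remaining = total;
  const results = [];
  return function childDone(result) {
    if (remaining === 0) throw new Error('batch already complete');
    results.push(result);
    remaining -= 1;
    if (remaining === 0) onComplete(results);
  };
}

// Usage sketch: create the tracker before starting the 100 jobs, then
// call the returned function from wherever each job's completion is observed.
//   const childDone = createBatch(100, (results) => sendSummary(results));
```

In a multi-process setup the counter would have to live somewhere shared, such as the same persistent table discussed earlier, rather than in memory.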
Hello, I've just started using Bree and integrated it with a room booking request app for events that I'm developing for my university final project. I have two use cases: one that I had no trouble solving with Bree, and another that I cannot see how to solve with it. Use Case 1: Send push notifications to users of my app every day, at a fixed hour of the day. I could solve it with no problem with Bree: when I initialize Bree I create a job that is configured to run at a specific hour of the day, based on a variable in my .env. Use Case 2: Users make room booking requests for events they are going to have. These requests can be accepted or declined after a prioritization algorithm is run 5 minutes before the start of the event, and a push notification is sent to the device informing the user of the resolution. My idea was to add a job when the user creates the booking request and configure it to run in the future, 5 minutes before the start of the event, so that it would execute the algorithm and inform the user. Example: if the event that a user creates a booking request for is starting in 1 hour, then I would configure the job to run 55 minutes from the current time. The problem is that, after reading the docs, it is stated that after using bree.add({name: myJob, date: eventStart - 5 minutes}), I have to use bree.run(), but I don't want to run it when the user makes the request, as stated previously, but exactly at the moment I configured in the date property. Is there any workaround? The project is due in a few days and this is the main problem I'm struggling with now. Please help, I'd appreciate it! |
'bree.run()' only starts the job. Putting 'date' in the config will set the job to run at that date. |
Hey @shadowgate15 thanks for your quick response! When adding the job, I'm setting the 'date', but the thing is it doesn't execute it. I've just tested it by executing this code when a specific endpoint is hit, for testing purposes: bree.add({ I hit the endpoint at 21:28 and waited two minutes as I'm writing this, but nothing happened. Am I doing something wrong, or is this not supported? It would be a shame if it weren't :( |
|
Also you should use Also if the date is in the past it will not run. |
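As an illustration of the date handling being discussed, the sketch below builds a job config whose `date` is five minutes before an event and refuses to build one when that moment has already passed. The helper `bookingJob`, the job file name, and `jobsDir` are hypothetical; the thread does not establish why the poster's date-based job did not fire, so this only shows the date arithmetic and the past-date check.

```javascript
const FIVE_MINUTES_MS = 5 * 60 * 1000;

// Builds a job config scheduled for 5 minutes before `eventStart`.
// Returns null when that moment is not in the future, since a job
// with a date in the past will not run.
function bookingJob(bookingId, eventStart, jobsDir, now = new Date()) {
  const runAt = new Date(eventStart.getTime() - FIVE_MINUTES_MS);
  if (runAt.getTime() <= now.getTime()) return null;
  return {
    name: `resolve-booking-${bookingId}`,
    path: `${jobsDir}/resolve-booking.js`, // assumed job file location
    date: runAt, // a real Date object, not a number or string
    worker: { workerData: { bookingId } },
  };
}

// Usage sketch (assumption: `bree` is an existing Bree instance):
//   const job = bookingJob(booking.id, booking.eventStart, '/app/jobs');
//   if (job) {
//     await bree.add(job);
//     await bree.start(job.name); // schedules it for `date`; run() would start it immediately
//   }
```

Passing `now` as a parameter keeps the function easy to test and makes the time zone question explicit: both dates are compared as absolute timestamps.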
@shadowgate15 you were right about the time, I already fixed that and logged it to the console to confirm it is now right! The date is in the future, by a couple of minutes, but it still isn't executing for some reason, neither when I do it with add and then start nor when I don't use add and pass it directly on initialization of Bree. Have you had problems with the date property? |
Try running it with NODE_ENV=* and see what the debug log says. |
Hey @shadowgate15 I ended up changing it to cron and it works! I'm now having a different issue, where the worker seems to be exiting right away after doing a console log. I'll keep making some tests, and tomorrow I'll hit you up if I can't solve it by myself. I really appreciate your help, since my work is due next Monday! |
Hey @shadowgate15 how are you? Sorry to bother you, but I couldn't resolve the second issue I was having, where I told you that the worker didn't seem to be waiting for the resolution of an async operation I await. In the worker I make a call to a Service I created that calls a DAO which returns a list of push tokens. These operations are asynchronous, so I await them in my worker. This works perfectly when I don't use the worker to execute it, but with the worker it enters the function but skips the code after it and exits. To simplify things for the question I changed the worker to look like this: (async () => {
)() What isn't executing is the last console log, reflecting what happens in my real worker, where all the code after the await keyword gets skipped. I've searched lots of things on stackoverflow and blogs, but found nothing that lets me understand this situation. Please could you give me some clarity as to what i'm doing wrong? |
That should be an |
@shadowgate15 I tried this in the dummy function I sent you and it worked, although I don't understand the difference from what I wrote, since I'm using the await keyword for the async operation and all the other operations are synchronous. Now applying this approach to my real job doesn't work; it fails just as before. My real job: (async () => {
//nothing executes after the previous await
process.exit(0); As you said with the dummy, I tried wrapping everything apart from the process.exit(0) in an async function, awaiting it and then calling process.exit(), but it didn't work. This is confusing. |
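For readers who land here with a similar symptom, the general pattern under discussion (finish all awaited work first, and only then signal completion) might look like the sketch below. The thread does not resolve why the poster's real job still failed, so this is not presented as the fix. `runJob` is a hypothetical helper; the `parentPort.postMessage('done')` convention is, as far as I know, the one Bree's README describes for telling the parent that a worker has finished.

```javascript
const { parentPort } = require('node:worker_threads');

// Default completion signal: notify the parent thread when running as a
// worker, otherwise end the process.
function signalDone() {
  if (parentPort) parentPort.postMessage('done');
  else process.exit(0);
}

// Awaits the whole unit of work and signals completion only afterwards.
// If `work` rejects, the rejection propagates and `done` is never called.
async function runJob(work, done = signalDone) {
  await work();
  done();
}

// Usage sketch inside a job file (`sendNotifications` is a placeholder):
//   runJob(async () => {
//     const tokens = await fetchPushTokens();
//     await sendNotifications(tokens);
//   }).catch((err) => {
//     console.error(err);
//     process.exit(1);
//   });
```

The point of the structure is that nothing which ends the worker sits outside the awaited function; a bare `process.exit(0)` placed after an un-awaited async call runs before that call has finished.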
I was really excited to see this library pop up - it's awesome to see something using native worker threads and not requiring redis/mongo/some other store. But was then a bit confused by the configuration method when it came to trying it out.
I have use cases for different types of jobs:
Given the rareness of the one-off jobs, it's a shame to have to declare them upfront, rather than being able to add them if and when they show up - otherwise there's overhead for no good reason.
With type 3 specifically, I find it odd that Bree has support for setting an exact date when a job should run, but that can only be set on instantiation?
I guess I'm looking for
bree.add({jobConfig})
or bree.run({jobConfig})
- or am I massively missing something?