Agent respawn upon death #454

Open
MikeHawkesCapventis opened this issue Dec 12, 2024 · 3 comments
Labels
proposal Enhancement idea or proposal

Comments

@MikeHawkesCapventis

Proposed change

If the agent dies for any reason, there's no way to respawn it - the nex node dies with it. The only recovery is to restart nex and then redeploy (even though the scripts / binaries are already in the NEXCLIFILES object store bucket). If the agent had some way of recovering from a critical failure, it would make nex a lot more resilient.

Use case

Most things would work nicely. However, I could write a script that causes the runtime engine to crash (perhaps by consuming too much memory, or through a network timeout when connecting remotely). If nex could trigger an agent restart and redeploy the things that were on the failing node, it would help.

Contribution

No response

MikeHawkesCapventis added the proposal label Dec 12, 2024
@jordan-rash
Contributor

In the new version of Nex, agents are treated the same as services. They have a restart count associated with them, I believe 3 by default. If your agent fails more than 3 times, then there is probably something that needs to be addressed in the agent code...

@MikeHawkesCapventis
Author

Whoohoo! It's the one thing that's caught me out a few times - I added Goja and Anko to test them out. While I wrapped the calls with a recover, I couldn't stop the agent from dying if I did something stupid. This sounds like it's fixed - so should I close this?
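
For context, a `recover` wrapper in Go typically looks like the sketch below. `safeRun` is a hypothetical helper and the closure stands in for a call into an embedded interpreter; no Goja or Anko API is used here. One possible reason such a wrapper does not save the agent: `recover` only catches a panic raised on the same goroutine as the deferred function, and it cannot catch fatal runtime errors such as running out of memory.

```go
package main

import "fmt"

// safeRun is a hypothetical helper that runs a script callback and
// converts a panic on this goroutine into an ordinary error.
func safeRun(run func()) (err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("script panicked: %v", r)
		}
	}()
	run()
	return nil
}

func main() {
	// A panic raised on the calling goroutine is caught by safeRun.
	err := safeRun(func() { panic("bad script") })
	fmt.Println("recovered:", err)

	// Not caught by safeRun:
	//   - a panic in a goroutine started inside run (it takes down the process)
	//   - fatal runtime errors such as out of memory
	//   - os.Exit or an external kill signal
	// Those cases need a supervisor outside the process to restart it.
}
```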

@jordan-rash
Contributor

Let's leave it open until you can play with it and let me know what you think. Admittedly, the V3 will probably merge today, but the agent sdk might be another week or so, so you might not be able to verify that its behavior is what you are looking for for a few more weeks. I am keen to hear your feedback, so let's leave this open.

2 participants