[Ingest Manager] Define Elastic Agent structure on disk for elastic agent support upgrade and rollbacks. #20048
Comments
Pinging @elastic/ingest-management (Team:Ingest Management) |
@michalpristas I think we should still use the full version if possible in the folder name, v7.9.0-hash instead of v7. Is there a strong argument to only have v7? The only reason I can think of is the Windows limit on path length. Where are we locating the Filebeat registry in the data folder? |
There is also the
I also agree with @ph that we should use the full version number (v7.0.0). I question if we should really have the |
Good point @blakerouse, I think the prev is only there to keep state? Maybe we should keep that information in an existing file like fleet.yml or another one. We could keep a bit more information, e.g. when was that release installed? |
Updated the issue description with what we talked about with Blake over Zoom. |
Looks good. 👍 |
LGTM |
@michalpristas Can you discuss our plan with @ferullo? They also have persisted information and their own "installation" structure. |
asked Daniel about their internal state and they got it covered using |
Hmm, at the moment we always call: @ferullo Would it be bad for Agent to always call Agent does not know specifically whether this really is an upgrade or not. Consider the case of an Agent just being removed (not un-enrolled, so Endpoint will still be running): when a newly enrolled Agent on the same host runs the installer again, it will not know that Endpoint was previously running. |
Yes, always passing We went back and forth on whether to (1) have a different command |
@ferullo I assume we can also use the same command to rollback to a previous version? |
Endpoint will roll back automatically if it is unable to upgrade for some reason. If Agent needs to roll back and wants to downgrade Endpoint, that should work. However, I can't promise it will, because it's impossible to guarantee that a previously released Endpoint works with future Endpoints, especially across major version updates. Is Agent going to upgrade itself first, then Beats and Endpoint? If so, this seems like a non-issue. If not, to downgrade Endpoint I recommend using |
@ferullo Agent will upgrade itself first, then perform upgrades on the Beats and Endpoint. Below are a couple of failure cases that we consider on upgrade:
So the flow for Endpoint breaks down in the worst case to (v1 and v2 just symbolize version jumps):
The same applies if someone forces a downgrade (also possible):
|
Thanks for the details. I had not realized Agent would downgrade itself if Beats or Endpoint didn't work after the upgrade. That's slick. I think that flow works well. We have automated tests to make sure an Endpoint can be upgraded; we'll add one to make sure it can be downgraded. I still think that if Beats or Endpoint fail to downgrade, then uninstall and re-install is the best course of action to make sure Agent/Beats/Endpoint all stay in sync on their version number. Though perhaps that would be handled by the normal |
I was thinking about snapshots over the weekend and this won't work for them. I realised that snapshot versions do not differentiate between snapshots (always so the version hash won't work for them. So I was thinking that we need something which is known at build time to create a package dir and differentiate between versions. What I was thinking about is a latest
Package creation
During package creation we would prepare a structure which is ready for unpack e.g so when this gets unpacked it is already differentiated.
Same SNAPSHOT problem
A problem might arise when we receive an action to update from one snapshot to another which is the same. Usually we should not upgrade from a version to the same version, but for snapshots we have to. So for SNAPSHOT I was thinking about special handling (not for normal versions) where we unpack to
@ph @blakerouse do you see some gotchas there? Does it sound OK? |
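The "same version" rule for snapshots described above could be sketched roughly as follows. This is only an illustration: the function name needsUpgrade and the "-SNAPSHOT" suffix check are assumptions for the example, not code from the Agent.

```go
package main

import (
	"fmt"
	"strings"
)

// needsUpgrade is a hypothetical helper. It reports whether an upgrade
// action should be carried out for the given version strings.
// Assumption: snapshot builds are recognisable by a "-SNAPSHOT" suffix.
func needsUpgrade(current, target string) bool {
	if current != target {
		// Different versions: always a real upgrade (or downgrade).
		return true
	}
	// Same version string: normally a no-op, but two snapshot builds can
	// share a version string and still differ, so snapshots are let through.
	return strings.HasSuffix(target, "-SNAPSHOT")
}

func main() {
	fmt.Println(needsUpgrade("7.9.0", "7.9.1"))                   // true
	fmt.Println(needsUpgrade("7.9.1", "7.9.1"))                   // false
	fmt.Println(needsUpgrade("8.0.0-SNAPSHOT", "8.0.0-SNAPSHOT")) // true
}
```

A check like this only decides whether to proceed; telling two identical snapshot version strings apart on disk still needs something extra, such as the build-time identifier discussed in this comment.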
@michalpristas Good catch on the issue with snapshots. I think it would be good to not use the whole commit hash, as that might cause issues on Windows due to file path length. Maybe just the first 8 or so? I was thinking of something similar, without even adjusting the packaged bits: always extract the new upgrade agent into |
first few characters should work as well. |
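A minimal sketch of building a versioned directory name from the full version plus a shortened commit hash, as suggested in the comments above. The helper name versionedDirName and the constant hashLen are hypothetical, the "v<version>-<hash>" layout follows the v7.9.1-ab123d example from the issue description, and the length of 8 comes from the "first 8 or so" suggestion rather than from a final decision.

```go
package main

import "fmt"

// hashLen is an assumption taken from the "first 8 or so" suggestion.
const hashLen = 8

// versionedDirName is a hypothetical helper that builds a directory name
// such as "v7.9.1-ab123d45" from a version and a commit hash. The hash is
// truncated to keep paths short on Windows.
func versionedDirName(version, commit string) string {
	if len(commit) > hashLen {
		commit = commit[:hashLen]
	}
	return fmt.Sprintf("v%s-%s", version, commit)
}

func main() {
	fmt.Println(versionedDirName("7.9.1", "ab123d4567890")) // v7.9.1-ab123d45
}
```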
Good catch on the snapshots. the |
moved logs under data |
@ruflin Could you take a look? |
I like the above proposal, especially the part that we can also upgrade between snapshot builds. The part I didn't get is why each version doesn't have its own log directory. I would expect the log collection pattern to be something like Does each log event that we currently ship contain the exact version of the Beat + commit hash if it is a snapshot? |
I am not against having a log per version.
|
Decision: We have logs per version, we can provide tooling or symlink to help the local debug experience. |
Closing this, PR #20400 was merged. |
With upgrades in mind, we need to align the structure of where things go on disk so that upgrade and rollback scenarios are smooth and possible.
Proposed structure
v7.9.1-ab123d is a v7.9.1 semver version where the rest of the string is a hash of a version which can contain suffixes like SNAPSHOT, BC...
Each version contains its own binary and dependent binaries in download/install directories.
In this example /v7.9.2-dba324e is an older version which contains not only the binary but also a snapshot of config files and the action store.
logs were moved from version level to root level of the structure. This is so monitoring won't drop any events which might be unprocessed or generated in between upgrade steps.
run is used during runtime to store sockets etc.
elastic-agent at root level is a symlink to the currently active version; any service manager should point to this file as the executable. It gets updated on update/rollback.
elastic-agent.yml, fleet.yml and action_store.yml are config/state files which are used by the active version of an agent. During the upgrade process these are copied to the version folder, overwriting any previously generated config files if any.
On start, the agent copies these files from its versioned directory if it contains any, and removes them there to avoid a future overwrite.
Older versioned folders are removed after a grace period without Beats in FAILED state, together with the prev symlink (if used). After this point rollback won't be possible.
cc @ph @blakerouse