-
Notifications
You must be signed in to change notification settings - Fork 4.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Ingest Manager] New agent structure (symlinks) #20400
[Ingest Manager] New agent structure (symlinks) #20400
Conversation
💔 Tests FailedExpand to view the summary
Build stats
Test stats 🧪
Test errorsExpand to view the tests failures
Steps errorsExpand to view the steps failures
Log outputExpand to view the last 100 lines of log output
|
Pinging @elastic/ingest-management (Team:Ingest Management) |
@ruflin @blakerouse lets make sure we review this quickly. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have tested this extensively and I think this works really good.
The only weird thing that I hit, is that running ./elastic-agent run
on Windows the first time will result in the re-exec which will cause PowerShell to return and elastic-agent
will keep running. This is really a side-effect of Windows and how re-exec works (which is what I wrote and un-related to this branch). I would like to see if we can improve on that in a follow-up branch.
Otherwise this is what we need to get further along with self-upgrade!
@blakerouse i was thinking that the idea about |
@michalpristas I agree, I think we can solve that issue with |
@ruflin can i get your eyes on it before mergin? |
@michalpristas Unfortunately didn't get to it yet, will have a look tomorrow but please don't block on me to get this in. |
@ruflin tomorrow is ok for me, fi you wont have time i will proceed |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tested this on OS X with the .tar.gz and it works.
- We should make sure, we follow up with some tests that just execute things on different platforms (I think we already do)
- Then we should add tests that upgrade the binary between two commits to make sure upgrade works as expected.
What we do here is quite a bit of magic and code differs between platforms. To make this maintainable long term that others can also touch the code I think this needs quite a bit of additional documentation. Part of it in the code but also part of it outside to talk through how the upgrade exactly works and why. This can happen as a follow up.
@@ -476,6 +477,10 @@ func copyInstallScript(spec PackageSpec, script string, local *string) error { | |||
*local = strings.TrimSuffix(*local, ".tmpl") | |||
} | |||
|
|||
if strings.HasSuffix(*local, "."+spec.Name) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would be nice to have some docs in the code around the fancy logic that copyInstallScript does as function comment or in line. So if someone touches this in the future he knows why all the special cases are here.
{{- if .linux_capabilities }} | ||
setcap {{ .linux_capabilities }} {{ $beatBinary }} && \ | ||
{{- end }} | ||
{{- range $i, $modulesd := .ModulesDirs }} | ||
chmod 0770 {{ $beatHome}}/{{ $modulesd }} && \ | ||
{{- end }} | ||
chmod 0770 {{ $beatHome }}/data {{ $beatHome }}/logs | ||
chmod 0770 {{ $beatHome }}/data {{ $beatHome }}/data/elastic-agent-{{ commit_short }}/logs |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If it is an officially released version like 7.9.1
, will it still contain the commit hash?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes this is to support snapshots, we can change function to include version or do some conditional formats
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For someone only running production builds, it will probably be unexpected to see here a hash instead of a specific version. Would be nice if we have a nice switch between the two. But then we have the problem how to separate 2 release candidates 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would prefer we have a single way, having special case would just increase complexity IMHO.
Also, this is a good thing we could allow people to use preview build?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder how debugging is going to work. You upgrade from 7.9.1 to 7.10.0 and things go wrong. You now need to figure out which hash 7.9.1. and 7.10.0 was know which directories to look into.
Perhaps the solution here is more symlinks with the exact version number. If two 7.9.1 exists, it only links to the most recent one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ruflin The scenario you are describing will only be confusing if you upgrade to nightly snapshots. In the normal production environment you should only have something like this:
7.9.0-abcd
7.9.1-abcd2
7.9.2-bcd
7.10.0-huhuh
Now, concerning the rollback in debugging while using multiples snapshots, I believe you are right it might be confusing.
@michalpristas How do you plan to solve the following issue, note this might be an edge case.
- I run 7.10-{hash1} (snapshot generated on monday)
- I try to upgrade to 7.10-{hash2} It fails.
- Rollback to 7.10-{hash1}
We could probable solve the situation with symlinks or traces on disk.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ph I believe there is only ever 1 previous version in the data directory during upgrade. Once upgrade is successful, the new Agent will delete the other version directory.
This makes it easy to know which is the previous version, its just not the current version.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, forgot about that. so its a non issue :)
func preRunCheck(flags *globalFlags) func(cmd *cobra.Command, args []string) error { | ||
return func(cmd *cobra.Command, args []string) error { | ||
if sn := paths.ServiceName(); sn != "" { | ||
if !filepath.IsAbs(os.Args[0]) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably worth a code comment.
const ( | ||
defaultConfig = "elastic-agent.yml" | ||
hashLen = 6 | ||
commitFile = ".elastic-agent.active.commit" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how is this different from the build hash? Will it change during upgrade?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the change is in location, build hash file we have is in different location than our configuration files and differs based on platform/type of package.
i introduced this hidden file so it sits right next to configuration and if needed can change format
Co-authored-by: Nicolas Ruflin <[email protected]>
…ats into agent-new-structure-sym
[Ingest Manager] New agent structure (symlinks) (elastic#20400)
@EricDavisX and @rahulgupta-qasource We have changed the structure of the Elastic Agent, it so it will a good idea to tests packages again with the new snapshots. This will also be backported to 7.10 branch. @michalpristas Can you sync with @dedemorton This could possibly have impact on the documentation, we should also explain the structure to users. |
…ne-2.0 * upstream/master: [Metricbeat][test] Disable ec2 flaky test (elastic#20959) Check if tracer is active before starting a transaction (elastic#20852) [Elastic Agent] Add support for variable replacement from providers (elastic#20839) Only request wildcard expansion for hidden indices if supported (elastic#20938) [Ingest Manager] New agent structure (symlinks) (elastic#20400) [Ingest Manager] Print a message confirming shutdown (elastic#20948) Skip flaky test on unix input (elastic#20942) [Ingest Manager] Align introspect-inspect naming in code (elastic#20952) [Filebeat][zeek] Map new x509 fields for ssl module (elastic#20927) [CI] fix regression with variable name (elastic#20930) [Autodiscover] Handle input-not-finished errors in config reload (elastic#20915) [Ingest Manager] Remove Success from fleet contract (elastic#20449)
…-faster * upstream/master: [Metricbeat][test] Disable ec2 flaky test (elastic#20959) Check if tracer is active before starting a transaction (elastic#20852) [Elastic Agent] Add support for variable replacement from providers (elastic#20839) Only request wildcard expansion for hidden indices if supported (elastic#20938) [Ingest Manager] New agent structure (symlinks) (elastic#20400) [Ingest Manager] Print a message confirming shutdown (elastic#20948) Skip flaky test on unix input (elastic#20942) [Ingest Manager] Align introspect-inspect naming in code (elastic#20952) [Filebeat][zeek] Map new x509 fields for ssl module (elastic#20927)
Hi @ph /@EricDavisX We have validated this ticket on latest Kibana 7.10.0-SNAPSHOT cloud environment. Enrolled agent with Windows(x86_64.zip), Linux(.deb, .rpm and .tar.gz) and macOS(.tar.gz) 7.10.0-SNAPSHOT packages on corresponding Hosts. Observations:
Queries:
Please let us know if we are missing something to validate this ticket or need to add any other validation scenario. |
I recommend we follow up with test work / questions in the new issue above, |
[Ingest Manager] New agent structure (symlinks) (elastic#20400)
What does this PR do?
Different approach to #20307
Working with symlinks turned out to be a bit tricky due to how OSes handles Working Directory and executable names.
Windows on top of that requires that service name which is used to be registered is in Abs form and it needs to match the one used to reguiter the service (hence the magic with os.Args[0] replacements in the code,
os.Args[0]
is used in lib as a service name)Due to the approach of determining WD when running a binary using a symlink
paths.yml
is either on the symlink level (windows
) or on executable level (darwin
andlinux
).These will get regenerated during future Upgrade/Rollback.
Why is it important?
For future upgrade work
Checklist
CHANGELOG.next.asciidoc
orCHANGELOG-developer.next.asciidoc
.Same as with previous PR this is tested on linux/darwin and windws (service and direct run)