internal/providers/aws: support gzip user data #1100
Conversation
Support gzip-encoded AWS EC2 instance user data. Enables large self-contained Ignition configs to be versioned without external dependencies. User data is limited to 16384 bytes, prior to being stored as base64. Signed-off-by: Paul Fisher <[email protected]>
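A minimal sketch (not the actual patch) of what gzip autodetection in the AWS provider's fetch path could look like, assuming only Go's standard library; the package and function names here are made up:

```go
package aws

import (
	"bytes"
	"compress/gzip"
	"io"
)

// gzipMagic is the two-byte magic number that starts every gzip stream.
var gzipMagic = []byte{0x1f, 0x8b}

// decodeUserdata returns user data unchanged unless it starts with the
// gzip magic number, in which case it is transparently decompressed.
// Plain configs pass through, keeping the change backward compatible.
func decodeUserdata(data []byte) ([]byte, error) {
	if !bytes.HasPrefix(data, gzipMagic) {
		return data, nil
	}
	reader, err := gzip.NewReader(bytes.NewReader(data))
	if err != nil {
		return nil, err
	}
	defer reader.Close()
	return io.ReadAll(reader)
}
```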
The decompression & detection looks fine to me, but I wonder if we want this to be limited to AWS only. Even if other platforms don't have the same size limitation, I think there's value in having consistency.
Hi Paul, thanks so much for the patch! This indeed seems like quite an obvious thing we should have done from the start - obvious enough that I wonder if there's really a reason we didn't do it. A notable argument in favor here is that cloud-init does this too.
Agree with Luca that this is probably something we should do across the board, for consistency.
On some platforms we already do this in a limited form, e.g. on VMware one can specify a "gzip+base64" encoding. I'm not sure if this is due to an explicit-vs-autodetection choice, or just a historical artifact.
We're in the process of moving to Fedora CoreOS for our EC2 fleet. The ability to version Ignition configs with our EC2 instances as part of updating Launch Templates with Terraform is straightforward and means that we don't have to build out a framework to manage Ignition configs in S3. We're using https://github.com/poseidon/terraform-provider-ct to manage FCC snippets, and Terraform natively supports gzip, so this is an obvious change to get past the 16K limit (which we've hit) and provides substantial runway (JSON compresses well). We're currently using this change in production.

The patch is meant to be a minimally invasive change that's compatible with prior Ignition deployments, unblocks our deployment (AWS user data size limits are particularly restrictive), and doesn't try to introduce an explicit framework. Moving to an explicit framework (for example, adding MIME Content-Types) is a more invasive change; I assume it would take substantial effort to land on the correct design, especially if it's meant to work across all providers, and it introduces complexities on the user data generation side as well.

If you'd like me to refactor this PR to be lower in the stack so that gzip configs are autodetected for all providers, let me know. If more debate or discussion is needed, my preference would be to land AWS support first; follow-on PRs could then unify the platform providers into a single compression implementation.
That's useful to know - our CI here isn't covering this, unfortunately. (Somewhat tangential aside: did you do a custom FCOS build using e.g. https://github.com/coreos/coreos-assembler for that? Or did you just manually hack an updated Ignition into the initramfs? If you have any "experience report" around that, it'd be great to post it somewhere - what stumbling blocks you hit with the tooling, etc.)
If we want to support compression, I agree we should do it everywhere. However, I asked @crawford if there was a reason Ignition doesn't already do so, and he said:
So yes, the reason we already support gzip on VMware is that there's a second guestinfo property explicitly declaring the encoding. I'll also note that compression support solves a somewhat limited problem. gzip compression only gains you a factor of 3 or so (less for SSH keys and precompressed resources), so arbitrarily large configs on AWS will still need to be stored externally and passed by reference. Given that we'd be replacing one small limit with a slightly larger and less predictable one, it's not obvious to me that this is worthwhile.
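Concretely, the VMware path carries the config and its declared encoding in two separate guestinfo properties; a sketch of that pairing (the data value is a placeholder):

```
guestinfo.ignition.config.data          = H4sIA...        (base64 of the gzipped config)
guestinfo.ignition.config.data.encoding = gzip+base64
```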
Custom FCOS build with "cosa build, cp in ignition binaries, cosa buildinitramfs-fast, flatten qcow2 layers, cosa buildextend-aws, convert vmdk to raw image and dd to EBS volume" that's then consumed by an existing amazon-ebssurrogate packer pipeline to snapshot the volume and generate an AMI. Planning to use the same pipeline to build ARM64 images.
For a real-world use case, our current Ignition config is 16K (just over the threshold); gzipped, it's 7K. The stack stands up a Kubernetes backplane and only needs EC2 and ECR (where we have multi-region failover) to be operational. I don't anticipate that our Ignition configs after compression will exceed 16K, so this functionality is extremely worthwhile in our environment.

EC2 user data is guaranteed to exist. If it's a hard requirement that we depend on external services to provide our final Ignition config, multi-region failover of S3 within Ignition is likely the next best option. S3 was down for multiple hours in us-east-1 in 2017; in our AWS environment at the time, we were able to continue operating in us-east-1 even with S3 hard down. We try to aggressively minimize the number of external dependencies in our provisioning stack.

EC2 instance user data is an opaque blob -- if an encoding is declared, that's because there's an agreement between the writer and reader of user data. cloud-init supports direct gzip along with an explicit declaration of content types via MIME. If gzip support matching cloud-init's functionality is not an option, would you be open to user data starting with a MIME header to signal compression?
You can
I wouldn't recommend designing your infrastructure around the assumption that you'll never need more than 9 KiB (compressed) of additional customization. If you want to avoid S3, you could run an additional EC2 instance that serves your Ignition configs.
I agree that we can define the encoding however we want. We've generally tried to design Ignition to be explicit rather than implicit, and I don't think a MIME multipart envelope actually helps in that regard, since we'd still have to autodetect the existence of the MIME header and then deal with a variety of edge cases inside the MIME envelope. Personally, though, detection of compression magic numbers doesn't bother me that much. (coreos-installer does exactly this for install images.) Ignition 0.x used to autodetect cloud-configs and user scripts, so there's even some precedent.

My main concern with compression support is that it's an attractive nuisance: it further encourages storing production configs directly in AWS userdata, where the 16 KiB limit can still be reached at an inconvenient moment.
Right, worth linking here to https://github.com/openshift/machine-config-operator/, which includes an "Ignition server" (machine-config-server) as part of general kubernetes-native OS management. Hmm, that said, we should probably also have nicer support for multiple URLs in included Ignition, so that one could upload configs to multiple S3 regions and have fallback; something like the hypothetical sketch below:
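(A hand-written illustration of that shape; a multi-URL sources list is not part of any released Ignition spec, so the field name and layout below are invented, and the hash value is elided:)

```json
{
  "ignition": {
    "version": "3.1.0",
    "config": {
      "merge": [
        {
          "sources": [
            "https://bucket-us-east-1.s3.amazonaws.com/worker.ign",
            "https://bucket-us-west-2.s3.amazonaws.com/worker.ign"
          ],
          "verification": { "hash": "sha512-..." }
        }
      ]
    }
  }
}
```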
(Multiple URLs should probably require verification.)
Agreed on these points; however, compressed user data is still our current preferred path as part of rolling out CoreOS. The production Lyft stack aggressively autoscales in AWS based on CPU load, and any additional dependency (e.g. EC2 instances serving Ignition configs) becomes a tier-0/critical service that, if it breaks or has issues, prevents the Lyft stack from being able to scale up. We've done a lot of work over the years to minimize boot times and decrease the likelihood of cloud-init failures. Regarding @cgwalters' point:
If Ignition configs on AWS become large enough that they are no longer manageable in user data, I think this is a good solution. Multi-region failover of Ignition configs served from S3 seems ideal, and matches our current failure handling of ECR. The Ignition implementation will of course need good timeouts/failure handling when trying additional URLs.
I think that's worth discussing. Filed as #1104.
I see this point, but IMO a notable argument in favor of compression support is that it reduces the "punishment factor" for redundancy in Ignition configs. If one ends up generating, e.g., a whole lot of systemd units, it can start to add up fast, and even a compression ratio of 3 is a huge benefit as far as avoiding the limit goes. (Offhand, I bet zstd with a pre-trained dictionary on some sample Ignition configs could do notably better.)
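(For reference, dictionary training is a stock feature of the zstd CLI; a sketch with made-up file names:)

```
$ zstd --train samples/*.ign -o ignition.dict
$ zstd -D ignition.dict --stdout config.ign | wc --bytes
```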
Sure, but I'm not sure that's actually solving a problem. When userdata is limited, compression can be counterproductive in the way I mentioned, and when userdata is unlimited, configs usually aren't large enough in absolute terms for the savings to be substantial.
I think a much more interesting direction would be to add a compression field as a sibling to source and verification. Combining that with data URLs would allow an outer config to contain an inner, compressed config, which would provide some amount of space savings and yet still remain explicit and usable in other use cases. To double-check that compression plus base64 encoding might actually yield a smaller config (vs. the base64 encoding completely cancelling out the effects of compression), I tried it with a generated OpenShift bootstrap config:

```
$ wc --bytes < bootstrap.ign
313370
$ gzip --stdout < bootstrap.ign | base64 --wrap=0 | wc --bytes
180456
```

I haven't seen anything in here that has swayed me from my original standpoint. @paulnivin thank you for the pull request, but this particular change doesn't align with our long-term trajectory.
We already have such a compression field.
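For concreteness, a hand-assembled outer config using that combination, assuming the spec-3.1.0 config-merge schema (the base64 payload is a truncated placeholder):

```json
{
  "ignition": {
    "version": "3.1.0",
    "config": {
      "merge": [
        {
          "source": "data:;base64,H4sIAAAAAAAA...",
          "compression": "gzip"
        }
      ]
    }
  }
}
```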
Is #1104 a reasonable path forward for Lyft's reliability requirements that aligns with your long-term trajectory?
This works fine; however, the outer uncompressed config is verbose and the inner compressed data URLs are base64. fcct will also automatically compress larger snippets but not smaller ones. I haven't looked into the logic there, but that doesn't seem particularly useful given the size of the outer config and base64 blocks.
It picks the smallest of 1) compressed, with the compression field set, and 2) uncompressed.
Ha! I could have sworn that was the case, but I only looked at https://coreos.github.io/ignition/configuration-v3_0/ and didn't dig any deeper.
Support gzip-encoded AWS EC2 instance user data. Enables large
self-contained Ignition configs to be versioned without external
dependencies. User data is limited to 16384 bytes, prior to being
stored as base64.
Fixes #1096
Signed-off-by: Paul Fisher [email protected]