
internal/providers/aws: support gzip user data #1100

Closed

Conversation

@paulnivin (Author)

Support gzip-encoded AWS EC2 instance user data. Enables large
self-contained Ignition configs to be versioned without external
dependencies. User data is limited to 16384 bytes, prior to being
stored as base64.

Fixes #1096

Signed-off-by: Paul Fisher [email protected]
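For illustration, a minimal sketch of how a config could be prepared for this path (file names are placeholders; the 16384-byte limit applies to the raw gzip bytes, before base64 encoding):

# Compress the rendered Ignition config; the gzipped payload (not its
# base64 form) must stay under the 16384-byte EC2 user-data limit.
gzip --best < config.ign > user-data.gz
wc --bytes < user-data.gz

# Launch Templates and most tooling then carry the payload base64-encoded.
base64 --wrap=0 < user-data.gz > user-data.b64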

@arithx (Contributor) left a comment

The decompression & detection look fine to me, but I wonder whether we want this to be limited to AWS only. Even if other platforms don't have the same size limitation, I think there's value in having consistency.

cc @bgilbert @jlebon @cgwalters @dustymabe

@cgwalters (Member) left a comment

Hi Paul, thanks so much for the patch! This indeed seems like quite an obvious thing we should have done from the start - obvious enough I wonder if there's really a reason we didn't do it. A notable argument in favor here is that cloud-init does this too.

Agree with Luca this is probably something we should do across the board too for consistency.

@lucab (Contributor) commented Sep 24, 2020

On some platforms we already do this in a limited form, e.g. on VMware one can specify a "gzip+base64" encoding. I'm not sure if this is due to an explicit-vs-autodetection choice, or just a historical artifact.
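For reference, a sketch of that explicit VMware form, assuming the usual guestinfo.ignition.config.data keys and a VM reachable via govc (the VM name and file name are placeholders):

# Encode the config as gzip+base64 and declare the encoding explicitly
# instead of relying on autodetection.
data="$(gzip --stdout < config.ign | base64 --wrap=0)"
govc vm.change -vm example-vm \
  -e "guestinfo.ignition.config.data=${data}" \
  -e "guestinfo.ignition.config.data.encoding=gzip+base64"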

@paulnivin (Author)

This indeed seems like quite an obvious thing we should have done from the start - obvious enough I wonder if there's really a reason we didn't do it.

We're in the process of moving to Fedora CoreOS for our EC2 fleet. The ability to version Ignition configs with our EC2 instances as part of updating Launch Templates with Terraform is straightforward and means that we don't have to build out a framework to manage Ignition configs in S3. We're using https://github.com/poseidon/terraform-provider-ct to manage FCC snippets, and Terraform natively supports gzip, so this is an obvious change to get past the 16K limit (which we've hit) and it provides substantial runway (JSON compresses well). We're currently using this change in production.

The patch is meant to be a minimally invasive change that's compatible with prior Ignition deployments, unblocks our deployment (AWS user data size limits are particularly restrictive), and doesn't try to introduce an explicit framework. Moving to an explicit framework (for example, adding MIME content types) is a more invasive change that I assume would take substantial effort to land on the correct design, especially if it's meant to work across all providers, and it introduces complexities on the user data generation side as well.

If you'd like for me to refactor this PR to be lower in the stack to autodetect gzip configs, let me know. If there's more debate or discussion that's needed, my preference would be that we land AWS support, and then follow on PRs could unify the platform providers into a single compression implementation.

@cgwalters (Member)

We're currently using this change in production.

That's useful to know; our CI here isn't covering this, unfortunately.

(Somewhat tangential aside: So did you do a custom FCOS build using e.g. https://github.com/coreos/coreos-assembler/ for that? Or did you just manually hack an updated Ignition into the initramfs? If you have any "experience report" around that it'd be great to post somewhere - what stumbling blocks you hit with the tooling etc.)

@bgilbert (Contributor)

If we want to support compression, I agree we should do it everywhere. However, I asked @crawford if there was a reason Ignition doesn't already do so, and he said:

Yes! This was a major point of discussion early on. I am very much against compression unless the metadata host can properly identify it as being compressed. I don't like blindly trying to reinterpret data until we get a match. We have support for append/replace/merge, so use that instead.

So yes, the reason we already support gzip on VMware is that there's a second guestinfo property explicitly declaring the encoding.

I'll also note that compression support solves a somewhat limited problem. gzip compression only gains you a factor of 3 or so (less for SSH keys and precompressed resources) so arbitrarily large configs on AWS will still need to be stored externally and passed by reference. Given that we'd be replacing one small limit with a slightly larger and less predictable one, it's not obvious to me that this is worthwhile.

@paulnivin (Author)

(Somewhat tangential aside: So did you do a custom FCOS build using e.g. https://github.com/coreos/coreos-assembler/ for that? Or did you just manually hack an updated Ignition into the initramfs? If you have any "experience report" around that it'd be great to post somewhere - what stumbling blocks you hit with the tooling etc.)

Custom FCOS build with "cosa build, cp in ignition binaries, cosa buildinitramfs-fast, flatten qcow2 layers, cosa buildextend-aws, convert vmdk to raw image and dd to EBS volume" that's then consumed by an existing amazon-ebssurrogate packer pipeline to snapshot the volume and generate an AMI. Planning to use the same pipeline to build ARM64 images.

@paulnivin (Author)

I'll also note that compression support solves a somewhat limited problem. gzip compression only gains you a factor of 3 or so (less for SSH keys and precompressed resources) so arbitrarily large configs on AWS will still need to be stored externally and passed by reference. Given that we'd be replacing one small limit with a slightly larger and less predictable one, it's not obvious to me that this is worthwhile.

For a real world use case, our current Ignition config is 16K (just over the threshold) and gzipped it's 7K. The stack stands up a Kubernetes backplane and only needs EC2 and ECR (where we have multi-region fail over) to be operational. I don't anticipate that our Ignition configs after compression will exceed 16K, so this functionality is extremely worthwhile in our environment. EC2 user data is guaranteed to exist.

If it's a hard requirement that we depend on external services to provide our final Ignition config, multi-region failover of S3 within Ignition is likely the next best option. S3 was down for multiple hours in 2017 in us-east-1. In our AWS environment at the time, we were able to continue operating in us-east-1 even with S3 hard down. We try to aggressively minimize the number of external dependencies in our provisioning stack.

EC2 instance user data is an opaque blob -- if an encoding is declared that's because there's an agreement between the writer and reader of user data. Cloud-init supports direct gzip along with an explicit declaration of content types via MIME. If gzip support matching cloud-init's functionality is not an option, would you be open to user data starting with a MIME header to signal compression?

@bgilbert (Contributor)

Custom FCOS build with "cosa build, cp in ignition binaries, cosa buildinitramfs-fast, flatten qcow2 layers, cosa buildextend-aws

You can make install DESTDIR=/path/to/cosa-workdir/overrides/rootfs from the Ignition source tree, and then skip everything except cosa build && cosa buildextend-aws.
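Roughly, assuming an Ignition checkout alongside a cosa working directory (paths are placeholders):

# Build Ignition locally and install the binaries into the cosa overrides
# directory, then rebuild the image and the AWS artifact.
cd ignition
make
make install DESTDIR=/path/to/cosa-workdir/overrides/rootfs
cd /path/to/cosa-workdir
cosa build && cosa buildextend-aws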

For a real world use case, our current Ignition config is 16K (just over the threshold) and gzipped it's 7K. The stack stands up a Kubernetes backplane and only needs EC2 and ECR (where we have multi-region fail over) to be operational. I don't anticipate that our Ignition configs after compression will exceed 16K, so this functionality is extremely worthwhile in our environment. EC2 user data is guaranteed to exist.

I wouldn't recommend designing your infrastructure around the assumption that you'll never need more than 9 KiB of additional compressed customization. If you want to avoid S3, you could run an additional EC2 instance that serves your Ignition configs.

EC2 instance user data is an opaque blob -- if an encoding is declared that's because there's an agreement between the writer and reader of user data. Cloud-init supports direct gzip along with an explicit declaration of content types via MIME. If gzip support matching cloud-init's functionality is not an option, would you be open to user data starting with a MIME header to signal compression?

I agree that we can define the encoding however we want. We've generally tried to design Ignition to be explicit rather than implicit, and I don't think a MIME multipart envelope actually helps in that regard, since we'd still have to autodetect the existence of the MIME header and then deal with a variety of edge cases inside the MIME envelope.

Personally, though, detection of compression magic numbers doesn't bother me that much. (coreos-installer does exactly this for install images.) Ignition 0.x used to autodetect cloud-configs and user scripts, so there's even some precedent. My main concern with compression support is that it's an attractive nuisance: it further encourages storing production configs directly in AWS userdata, where the 16 KiB limit can still be reached at an inconvenient moment.
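(For context, magic-number detection here just means checking that the payload's first two bytes are the gzip signature 0x1f 0x8b; the file name below is a placeholder:)

# gzip streams always begin with the bytes 1f 8b.
head --bytes=2 user-data.gz | od -An -tx1
#  1f 8b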

@cgwalters (Member)

If you want to avoid S3, you could run an additional EC2 instance that serves your Ignition configs.

Right, worth linking here to https://github.com/openshift/machine-config-operator/ which includes an "Ignition server" (machine-config-server) as part of general kubernetes-native OS management.

Hmm that said we should probably also have nicer support for multiple URLs in included Ignition, so that one could upload configs to multiple S3 regions and have fallback; something like:

'{"ignition":{"config":{"replace":{"source0":"https://s3-us-east-1/mybucket/myconfig.ign", "source1": "https://s3-us-west-1/mybucket/myconfig.ign", "verification": "sha512here"}},"version":"3.1.0"}}'

(Multiple URLs should probably require verification to ensure they are the same and avoid nondeterminism.)

@paulnivin (Author)

Personally, though, detection of compression magic numbers doesn't bother me that much. (coreos-installer does exactly this for install images.) Ignition 0.x used to autodetect cloud-configs and user scripts, so there's even some precedent. My main concern with compression support is that it's an attractive nuisance: it further encourages storing production configs directly in AWS userdata, where the 16 KiB limit can still be reached at an inconvenient moment.

Agreed on these points; however, compressed user data is still our current preferred path as part of rolling out CoreOS. The production Lyft stack aggressively autoscales in AWS based on CPU load, and any additional dependency (e.g. an EC2 instance serving Ignition configs) becomes a tier-0/critical service that, if it breaks or has issues, prevents the Lyft stack from being able to scale up. We've done a lot of work over the years to minimize boot times and decrease the likelihood of cloud-init failures.

Regarding @cgwalters point:

Hmm that said we should probably also have nicer support for multiple URLs in included Ignition, so that one could upload configs to multiple S3 regions and have fallback

If Ignition configs on AWS become large enough that they are no longer manageable in user data, I think this is a good solution. Multi-region failover of Ignition configs served from S3 seems ideal, and matches our current failure handling of ECR. The Ignition implementation of course will need to have good timeouts/failure handling for trying additional URLs.

@bgilbert (Contributor)

Multi-region failover of Ignition configs served from S3 seems ideal, and matches our current failure handling of ECR. The Ignition implementation of course will need to have good timeouts/failure handling for trying additional URLs.

I think that's worth discussing. Filed as #1104.

@cgwalters (Member)

My main concern with compression support is that it's an attractive nuisance: it further encourages storing production configs directly in AWS userdata, where the 16 KiB limit can still be reached at an inconvenient moment.

I see this point, but IMO a notable argument in favor of compression support is that it reduces the "punishment factor" for redundancy in Ignition configs. If, for example, one ends up generating a whole lot of systemd units, it can start to add up fast, and even a compression ratio of 3 is a huge benefit as far as avoiding the limit goes.

(Offhand, I bet zstd with a dictionary pre-trained on some sample Ignition configs could do notably better.)
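If anyone wants to measure that, a rough sketch with the zstd CLI (sample paths are placeholders):

# Train a dictionary on a corpus of representative Ignition configs, then
# compare dictionary-assisted zstd against plain gzip for a single config.
zstd --train samples/*.ign -o ignition.dict
zstd -19 -D ignition.dict --stdout config.ign | wc --bytes
gzip --best --stdout < config.ign | wc --bytes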

@bgilbert (Contributor) commented Oct 1, 2020

IMO a notable argument in favor of compression support is that it reduces the "punishment factor" for redundancy in Ignition configs.

Sure, but I'm not sure that's actually solving a problem. When userdata is limited, compression can be counterproductive in the way I mentioned, and when userdata is unlimited, configs usually aren't large enough in absolute terms for the savings to be substantial.

@crawford (Contributor) commented Oct 1, 2020

I think a much more interesting direction would be to add a compression field as a sibling to source and verification. Combining that with data URLs would allow for an outer config to contain an inner, compressed config which would provide some amount of space savings and yet still remain explicit and usable in other use cases. To double-check that compression plus base64 encoding would actually yield a smaller config (vs. the base64 encoding completely cancelling out the effects of compression), I tried it with a generated OpenShift bootstrap config:

$ wc --bytes < bootstrap.ign
313370

$ gzip --stdout < bootstrap.ign | base64 --wrap=0 | wc --bytes
180456

I haven't seen anything in here that has swayed me from my original standpoint. @paulnivin thank you for the pull request, but this particular change doesn't align with our long-term trajectory.

@crawford crawford closed this Oct 1, 2020
@bgilbert (Contributor) commented Oct 1, 2020

I think a much more interesting direction would be to add a compression field as a sibling to source and verification. Combining that with data URLs would allow for an outer config to contain an inner, compressed config which would provide some amount of space savings and yet still remain explicit and usable in other use cases.

We already have such a compression field, and as of Ignition 2.3.0 it can be used with data URLs. So this should work today.
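A sketch of that approach, wrapping an inner config as a gzipped data URL inside a minimal outer config (file names are placeholders; the replace/source/compression layout follows the fields discussed above):

# Compress the inner config and embed it as a base64 data URL; the
# "compression": "gzip" field tells Ignition to decompress it.
payload="$(gzip --stdout < inner.ign | base64 --wrap=0)"
cat > outer.ign <<EOF
{"ignition":{"version":"3.1.0","config":{"replace":{"source":"data:;base64,${payload}","compression":"gzip"}}}}
EOF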

@paulnivin (Author)

I haven't seen anything in here that has swayed me from my original standpoint. @paulnivin thank you for the pull request, but this particular change doesn't align with our long-term trajectory.

Is #1104 a reasonable path forward for Lyft's reliability requirements that aligns with your long-term trajectory?

@paulnivin (Author)

We already have such a compression field, and as of Ignition 2.3.0 it can be used with data URLs. So this should work today.

This works fine; however, the outer uncompressed config is verbose and the inner compressed data URLs are base64. fcct will also automatically compress larger snippets but not smaller ones. I haven't looked into the logic there, but that doesn't seem particularly useful given the size of the outer config and base64 blocks.

@bgilbert (Contributor) commented Oct 2, 2020

fcct will also automatically compress larger snippets but not smaller ones. I haven't looked into the logic there

It picks the smallest of 1) compressed, with the "compression": "gzip" attribute, 2) uncompressed base64, and 3) URL-encoded.

@crawford (Contributor) commented Oct 2, 2020

We already have such a compression field, and as of Ignition 2.3.0 it can be used with data URLs. So this should work today.

Ha! I could have sworn that was the case, but I only looked at https://coreos.github.io/ignition/configuration-v3_0/ and didn't dig any deeper.
