Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ec2MetadataClient IMDSv2 401 Unauthorized HTTP Status Code Exceptions #5764

Open
1 task
Petersoj opened this issue Dec 28, 2024 · 2 comments
Open
1 task
Assignees
Labels
bug This issue is a bug. needs-review This issue or PR needs review from the team. p2 This is a standard priority issue

Comments

@Petersoj
Copy link

Petersoj commented Dec 28, 2024

Describe the bug

Using Ec2MetadataClient to query the IMDSv2 service on an EC2 instance will eventually throw an SdkClientException for a 401 Unauthorized HTTP status code as shown here.

Regression Issue

  • Select this option if this issue appears to be a regression.

Expected Behavior

This reproduction example should not print any errors. The Ec2MetadataClient should not use stale tokens, which could be the cause of the 401 Unauthorized response.

Current Behavior

Let it run long enough and the reproduction example produces the following log:

DEBUG 2024-12-28 18:40:33.451 UTC [main] software.amazon.awssdk.core.internal.http.loader.ClasspathSdkHttpServiceProvider - The HTTP implementation loaded is software.amazon.awssdk.http.apache.ApacheSdkHttpService@4b45a2f5
INFO  2024-12-28 18:40:33.802 UTC [main] App - Started.
DEBUG 2024-12-28 18:40:33.804 UTC [pool-1-thread-1] software.amazon.awssdk.utils.cache.CachedSupplier - (Ec2MetadataClient()) Cached value is stale and will be refreshed.
DEBUG 2024-12-28 18:40:33.805 UTC [pool-1-thread-1] software.amazon.awssdk.utils.cache.CachedSupplier - (Ec2MetadataClient()) Refreshing cached value.
DEBUG 2024-12-28 18:40:33.886 UTC [pool-1-thread-1] software.amazon.awssdk.utils.cache.CachedSupplier - (Ec2MetadataClient()) Successfully refreshed cached value. Next Prefetch Time: +1000000000-12-31T23:59:59.999999999Z. Next Stale Time: 2024-12-28T18:41:33.886487462Z
INFO  2024-12-28 18:40:33.912 UTC [pool-1-thread-1] App - MetadataResponse(body=InService)
INFO  2024-12-28 18:40:48.803 UTC [pool-1-thread-1] App - MetadataResponse(body=InService)
INFO  2024-12-28 18:41:03.803 UTC [pool-1-thread-1] App - MetadataResponse(body=InService)
INFO  2024-12-28 18:41:18.803 UTC [pool-1-thread-1] App - MetadataResponse(body=InService)
DEBUG 2024-12-28 18:41:33.807 UTC [pool-1-thread-1] software.amazon.awssdk.imds.internal.DefaultEc2MetadataClient - Metadata request response body:
DEBUG 2024-12-28 18:41:33.811 UTC [pool-1-thread-1] software.amazon.awssdk.imds.internal.DefaultEc2MetadataClient - Error while executing EC2Metadata request. Total attempts: 1. The requested metadata at path '/latest/meta-data/autoscaling/target-lifecycle-state' returned Http code 401
ERROR 2024-12-28 18:41:33.812 UTC [pool-1-thread-1] java.lang.Throwable - software.amazon.awssdk.core.exception.SdkClientException: The requested metadata at path '/latest/meta-data/autoscaling/target-lifecycle-state' returned Http code 401
ERROR 2024-12-28 18:41:33.812 UTC [pool-1-thread-1] java.lang.Throwable -       at software.amazon.awssdk.core.exception.SdkClientException$BuilderImpl.build(SdkClientException.java:111)
ERROR 2024-12-28 18:41:33.812 UTC [pool-1-thread-1] java.lang.Throwable -       at software.amazon.awssdk.imds.internal.DefaultEc2MetadataClient.sendRequest(DefaultEc2MetadataClient.java:178)
ERROR 2024-12-28 18:41:33.812 UTC [pool-1-thread-1] java.lang.Throwable -       at software.amazon.awssdk.imds.internal.DefaultEc2MetadataClient.get(DefaultEc2MetadataClient.java:123)
ERROR 2024-12-28 18:41:33.812 UTC [pool-1-thread-1] java.lang.Throwable -       at App.lambda$new$0(AppManager.java:58)
ERROR 2024-12-28 18:41:33.812 UTC [pool-1-thread-1] java.lang.Throwable -       at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:572)
ERROR 2024-12-28 18:41:33.812 UTC [pool-1-thread-1] java.lang.Throwable -       at java.base/java.util.concurrent.FutureTask.runAndReset(FutureTask.java:358)
ERROR 2024-12-28 18:41:33.812 UTC [pool-1-thread-1] java.lang.Throwable -       at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:305)
ERROR 2024-12-28 18:41:33.812 UTC [pool-1-thread-1] java.lang.Throwable -       at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
ERROR 2024-12-28 18:41:33.812 UTC [pool-1-thread-1] java.lang.Throwable -       at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
ERROR 2024-12-28 18:41:33.813 UTC [pool-1-thread-1] java.lang.Throwable -       at java.base/java.lang.Thread.run(Thread.java:1583)
DEBUG 2024-12-28 18:41:48.800 UTC [pool-1-thread-1] software.amazon.awssdk.utils.cache.CachedSupplier - (Ec2MetadataClient()) Cached value is stale and will be refreshed.
DEBUG 2024-12-28 18:41:48.801 UTC [pool-1-thread-1] software.amazon.awssdk.utils.cache.CachedSupplier - (Ec2MetadataClient()) Refreshing cached value.
DEBUG 2024-12-28 18:41:48.804 UTC [pool-1-thread-1] software.amazon.awssdk.utils.cache.CachedSupplier - (Ec2MetadataClient()) Successfully refreshed cached value. Next Prefetch Time: +1000000000-12-31T23:59:59.999999999Z. Next Stale Time: 2024-12-28T18:42:48.803994713Z
INFO  2024-12-28 18:41:48.806 UTC [pool-1-thread-1] App - MetadataResponse(body=InService)
INFO  2024-12-28 18:42:03.803 UTC [pool-1-thread-1] App - MetadataResponse(body=InService)

Reproduction Steps

This reproduction example uses a short token TTL, but longer TTLs produce the same errors.

Ec2MetadataClient ec2Metadata = Ec2MetadataClient.builder().tokenTtl(ofMinutes(1)).build();
newSingleThreadScheduledExecutor().scheduleAtFixedRate(() -> {
    try {
        System.out.println(ec2Metadata.get("/latest/meta-data/autoscaling/target-lifecycle-state"));
    } catch (Exception exception) {
        exception.printStackTrace();
    }
}, 0, 15, SECONDS);

Possible Solution

This line checks if the current IMDSv2 token is expired before sending the request to the IMDS service, but there appears to be some sort of issue that causes a stale token to be used for the IMDS request. The subsequent call after a 401 Unauthorized exception will refresh the token before the IMDS request, so I suspect a potential race condition (there may need to be some buffer time between the IMDS token TTL and the JVM-cached TTL) or some other timing issue.

Additional Information/Context

No response

AWS Java SDK version used

2.29.43

JDK version used

OpenJDK 64-Bit Server VM Corretto-21.0.5.11.1

Operating System and version

Ubuntu 24.04.1 LTS

@Petersoj Petersoj added bug This issue is a bug. needs-triage This issue or PR still needs to be triaged. labels Dec 28, 2024
@debora-ito debora-ito added p2 This is a standard priority issue and removed needs-triage This issue or PR still needs to be triaged. labels Dec 31, 2024
@debora-ito debora-ito self-assigned this Dec 31, 2024
@debora-ito
Copy link
Member

According to the logs, the token should be valid until 2024-12-28T18:41:33.886487462Z:

DEBUG 2024-12-28 18:40:33.886 UTC [pool-1-thread-1] software.amazon.awssdk.utils.cache.CachedSupplier - (Ec2MetadataClient()) Successfully refreshed cached value. Next Prefetch Time: +1000000000-12-31T23:59:59.999999999Z. Next Stale Time: 2024-12-28T18:41:33.886487462Z

The timestamp of the error is 2024-12-28 18:41:33.811 UTC, which is before the stale time by 75 milliseconds, but still before:

DEBUG 2024-12-28 18:41:33.811 UTC [pool-1-thread-1] software.amazon.awssdk.imds.internal.DefaultEc2MetadataClient - Error while executing EC2Metadata request. Total attempts: 1. The requested metadata at path '/latest/meta-data/autoscaling/target-lifecycle-state' returned Http code 401

But at 18:41:33.811 the token is not actually valid anymore. So I agree, there's some timing issue in the way the token expiration is being checked. We'll investigate.

@debora-ito debora-ito added the needs-review This issue or PR needs review from the team. label Dec 31, 2024
@Petersoj
Copy link
Author

Petersoj commented Jan 2, 2025

Not sure if this would work, but a potential (but hacky) solution would be to change this line from:

return Instant.now().isAfter(createdTime.plus(ttl));

to:

// Add some buffer time to the expiration check.
return Instant.now().plusSeconds(60).isAfter(createdTime.plus(ttl));

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug This issue is a bug. needs-review This issue or PR needs review from the team. p2 This is a standard priority issue
Projects
None yet
Development

No branches or pull requests

2 participants