
Bedrock: Agent construct fails with Claude 3.5 v2 & Haiku 3.5 #796

Closed
1 task done
mccauleyp opened this issue Nov 12, 2024 · 7 comments
Labels
backlog bug Something isn't working

Comments

@mccauleyp

mccauleyp commented Nov 12, 2024

Describe the bug

Attempting to use Claude Sonnet 3.5 v2 or Haiku 3.5 with the Agent construct produces a successful deployment but a broken agent that returns "Internal server error" responses. These models must be invoked through an inference profile, but the construct provisions them in "on-demand" mode, which they don't support.

Expected Behavior

Should be able to deploy agents using these models.

Current Behavior

Agent deployment succeeds but produces "Internal server error" responses.

Reproduction Steps

Create an agent using Sonnet 3.5 v2 or Haiku 3.5, e.g.:

bedrock.BedrockFoundationModel.ANTHROPIC_CLAUDE_3_5_SONNET_V2_0

Possible Solution

I am working around the issue by using the CDK escape hatch to override the CloudFormation foundation model property, which might provide some hints as to how the construct could be modified:

from aws_cdk import Stack, aws_bedrock, aws_iam
from cdklabs.generative_ai_cdk_constructs import bedrock

AGENT_MODEL = bedrock.BedrockFoundationModel.ANTHROPIC_CLAUDE_3_5_SONNET_V2_0
AGENT_INSTRUCTION = "You are a dog, always respond with 'woof woof'."
AGENT_ALIAS_VERSION = "1"


class BedrockResources:
    def __init__(self, scope: Stack) -> None:
        stage_name = get_stage_name(scope)

        agent_name = "my-agent"
        self.agent = bedrock.Agent(
            scope,
            "Agent",
            name=agent_name,
            instruction=AGENT_INSTRUCTION,
            foundation_model=AGENT_MODEL,
        )
        self._enable_inference_profile(
            scope=scope, agent=self.agent, model=AGENT_MODEL
        )

        self.agent_alias = self.agent.add_alias(
            alias_name=f"{agent_name}-v{AGENT_ALIAS_VERSION}"
        )

    @staticmethod
    def _enable_inference_profile(
        scope: Stack, agent: bedrock.Agent, model: bedrock.BedrockFoundationModel
    ) -> None:
        """Enable models that require or support inference profiles.

        Inference profiles are used for cross-region inference, which improves
        performance by enabling load balancing of requests across regions. Certain
        models like Claude Sonnet 3.5 v2 and Haiku 3.5 must use inference profiles

        This is not yet supported by the Agent CDK construct, so we can override the
        configuration on underlying CloudFormation property.
        """
        model_str = model.to_string()
        inference_profile_arn = f"arn:aws:bedrock:{scope.region}:{scope.account}:inference-profile/us.{model_str}"  # noqa: E501
        foundation_model_arn = f"arn:aws:bedrock:*::foundation-model/{model_str}"

        invoke_inference_profile_policy = aws_iam.Policy(
            scope,
            f"InferenceProfilePolicy{agent.name}",
            statements=[
                aws_iam.PolicyStatement(
                    actions=["bedrock:InvokeModel*", "bedrock:GetInferenceProfile"],
                    resources=[foundation_model_arn, inference_profile_arn],
                )
            ],
            roles=[agent.role],
        )

        cfn_agent: aws_bedrock.CfnAgent = agent.node.find_child("Agent")  # type:ignore[assignment]
        cfn_agent.foundation_model = inference_profile_arn
        cfn_agent.node.add_dependency(invoke_inference_profile_policy)
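For reference, the inference-profile ARN built in the escape hatch above is just the on-demand model ID with a geographic prefix, scoped to the deploying account. Here's a standalone sketch of that derivation (the account ID is a placeholder, the `us.` prefix matches the us-east-1 deployment in this issue, and the model ID literal is what I believe `model.to_string()` returns for Sonnet 3.5 v2):

```python
def inference_profile_arn(region: str, account: str, model_id: str) -> str:
    # Cross-region inference profiles reuse the on-demand model ID,
    # prefixed with a geographic prefix ("us." for US-region profiles).
    return f"arn:aws:bedrock:{region}:{account}:inference-profile/us.{model_id}"


arn = inference_profile_arn(
    "us-east-1", "123456789012", "anthropic.claude-3-5-sonnet-20241022-v2:0"
)
print(arn)
# -> arn:aws:bedrock:us-east-1:123456789012:inference-profile/us.anthropic.claude-3-5-sonnet-20241022-v2:0
```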

Additional Information/Context

No response

CDK CLI Version

2.166.0

Framework Version

0.1.279

Node.js Version

v20.11.0

OS

OSX

Language

Python

Language Version

3.12

Region experiencing the issue

us-east-1

Code modification

No

Other information

No response

Service quota

  • I have reviewed the service quotas for this construct
@krokoko
Collaborator

krokoko commented Nov 12, 2024

Thanks for reporting this issue @mccauleyp, this should be fixed when #683 is implemented

@krokoko krokoko added backlog and removed needs-triage This issue or PR still needs to be triaged. labels Nov 12, 2024
@krokoko
Collaborator

krokoko commented Nov 19, 2024

@mccauleyp are you able to perform model invocations with the permissions you provided in your code snippet above?

@mccauleyp
Author

mccauleyp commented Nov 19, 2024

@mccauleyp are you able to perform model invocations with the permissions you provided in your code snippet above?

Yep! But note that it's not just permissions that complicate using Sonnet 3.5 v2 and Haiku 3.5. If you have an Action Group, the OpenAPI schema must declare the operationId field for each endpoint, the operationId must start with an HTTP verb prefix (e.g. get_ or post_), and it must be 18 characters or fewer.

I spent a few hours yesterday debugging to find the 18-character limit; I can't find it documented anywhere in the AWS or Anthropic docs. When I opened this ticket last week, I had it working for two agents, but the actions I was using happened to have operationIds of at most 17 characters. When I tried upgrading another agent yesterday, the deployment succeeded but the agent replied with "Internal Service Error" until I brought all the operationIds within 18 characters.
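To make the constraints concrete, here's a small standalone check (a hypothetical helper, not part of any AWS library; get_ and post_ are the prefixes I verified, the other verbs are my guess):

```python
# Constraints observed empirically in this thread, not officially documented.
VERB_PREFIXES = ("get_", "post_", "put_", "delete_", "patch_")
MAX_LEN = 18  # undocumented limit, found by trial and error


def operation_id_ok(op_id: str) -> bool:
    """Return True if op_id satisfies the observed Action Group constraints."""
    return op_id.startswith(VERB_PREFIXES) and len(op_id) <= MAX_LEN


print(operation_id_ok("get_dog_status"))       # 14 chars, verb prefix -> True
print(operation_id_ok("get_dog_status_full"))  # 19 chars -> False
```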

I assume the service team should be alerted to this, but I'm not sure where best to submit a ticket. Could you communicate the character-limit issue to them?

@krokoko
Collaborator

krokoko commented Nov 19, 2024

Thanks @mccauleyp! I was asking because there is currently a bug in the console: if you create an agent using a model with CRIS and select the option to generate a new role, the generated permissions are not sufficient to invoke the model through the agent (I was using the console as a reference for adding support in this lib). I reported this issue to the service team.

Working on adding support for CRIS in #800. With the current changes I am able to support CRIS for Agents and Prompts. Note: application inference profiles are not yet supported in CloudFormation, but the code will be there and ready on our end.

Thanks for the note on the operationId, I will contact the service team and post here as soon as I have an update.

@mccauleyp
Author

Ah, yeah, I noticed the console bug too and referred to another docs page for my snippet. One other note for the service or CloudFormation team: if the Action Group schema doesn't include HTTP-verb-prefixed operationIds, the deployment fails with no meaningful error message. I figured that out by creating an agent from the console, which does tell you that the operationId doesn't meet a validation schema. Unfortunately, the validation at that point doesn't catch the 18-character limit; I got there by blind trial and error.

@krokoko
Collaborator

krokoko commented Nov 20, 2024

Hi @mccauleyp, closing this ticket as v0.1.283 (https://github.com/awslabs/generative-ai-cdk-constructs/releases/tag/v0.1.283), just released, adds support for inference profiles. The documentation has an example of how to use CRIS with an agent. For the other points you mentioned, I opened an issue with the service team and will update you as soon as I have an answer. Thank you!

@krokoko
Collaborator

krokoko commented Dec 26, 2024

@mccauleyp the service team mentioned that the issue has been fixed (deploying the two models with CRIS for agents). If you face any issues please let us know! Thank you !
