Feat Support image URLs in tool outputs for Langchain::Assistant #894

Open. Wants to merge 5 commits into base: main

Eth3rnit3 commented Dec 3, 2024

PR Description: Enhanced Tool Message Support with Image URLs

This PR updates the Langchain::Assistant module to allow tool messages to include optional image URLs alongside text content. Key changes include:

  1. New image_url parameter:
    • The submit_tool_output method now supports an image_url parameter for richer tool outputs.

  2. Backend adjustments:
    • Adaptations were made to ensure compatibility with various backends (Anthropic, Mistral AI, OpenAI).
    • Structured content formatting is now used consistently across implementations.

  3. Expanded test coverage:
    • Tests were added to validate tool message handling with and without image URLs.

This enhancement enables more flexible and expressive outputs for tools, paving the way for improved user interactions.
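
To make the "structured content" idea concrete, here is a minimal sketch of how a tool output with an optional image URL might be turned into a list of content parts. The helper name `build_tool_content` and the exact part shapes are assumptions for illustration only; they are not taken from the PR diff, and each backend may need its own shape.

```ruby
# Hypothetical helper (not from the PR): builds a list of content parts
# from a tool's text output and an optional image URL.
def build_tool_content(content:, image_url: nil)
  parts = []
  # Only add a text part when there is something to say.
  parts << { type: "text", text: content.to_s } unless content.to_s.empty?
  # The image part is optional, mirroring the optional image_url parameter.
  parts << { type: "image_url", image_url: { url: image_url } } if image_url
  parts
end

text_only  = build_tool_content(content: "Sunny, 21C")
with_image = build_tool_content(content: "Sales chart", image_url: "https://example.com/chart.png")
```

With this shape, callers that never pass `image_url` get a single text part, so existing tools would keep working unchanged.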

andreibondarev (Collaborator) commented Dec 4, 2024

@Eth3rnit3 Thank you for your PR. What are your thoughts on the below?

What if we modified this to:

content, image_url = tool_instance.send(method_name, **tool_arguments)

# Rename the method parameter from output: to content:
submit_tool_output(tool_call_id: tool_call_id, content: content, image_url: image_url)
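
A quick sketch of how Ruby's multiple assignment would behave with this pattern, since it determines what existing tools need to return. The wrapper `split_tool_result` is a hypothetical name used only to make the behaviour easy to inspect.

```ruby
# Hypothetical wrapper around the destructuring proposed above.
def split_tool_result(result)
  content, image_url = result
  [content, image_url]
end

# A tool returning a plain String: image_url is simply nil.
split_tool_result("plain text")

# A tool returning [content, image_url]: both values are picked up.
split_tool_result(["a chart", "https://example.com/chart.png"])

# A Hash is not split, because Hash does not respond to to_ary.
split_tool_result({ temperature: 21 })

# Caveat: a tool whose content is itself an Array gets split,
# so "row 2" would be treated as the image URL here.
split_tool_result(["row 1", "row 2"])
```

The last case suggests that tools returning arrays as content would need to wrap them, for example as `[["row 1", "row 2"], nil]`.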

Something else to think about is that we might want to support a base64-encoded image representation in the future, something like:

        {
          "type": "image",
          "source": {
            "type": "base64",
            "media_type": "image/jpeg",
            "data": "/9j/4AAQSkZJRg..."
          }
        }
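
As a rough sketch, a block of that shape could be produced from raw image bytes as follows. The helper name `base64_image_block` and its keyword argument are assumptions, not part of the PR; `Array#pack("m0")` is used for Base64 encoding without line breaks so that no extra `require` is needed.

```ruby
# Hypothetical helper (not from the PR): wraps raw image bytes in a
# base64 image block shaped like the JSON above.
def base64_image_block(bytes, media_type: "image/jpeg")
  {
    type: "image",
    source: {
      type: "base64",
      media_type: media_type,
      # "m0" encodes as Base64 with no newlines inserted.
      data: [bytes].pack("m0")
    }
  }
end

# Usage: read the file in binary mode before encoding.
# block = base64_image_block(File.binread("photo.jpg"))
```

JPEG files begin with the bytes FF D8 FF, which is why the encoded data in the example above starts with `/9j/`.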

@Eth3rnit3 (Author)

Yes, you're right @andreibondarev, it's better this way: it's more implicit and avoids parsing the tool's output.

  • Works with Anthropic
  • Works with Mistral

Here's the error message for OpenAI, which doesn't support image_url in a message with role: "tool":

OpenAI HTTP Error (spotted in ruby-openai 7.3.1): {"error"=>{"message"=>"Invalid 'messages[3]'. Image URLs are only allowed for messages with role 'user', but this message with role 'tool' contains an image URL.", "type"=>"invalid_request_error", "param"=>"messages[3]", "code"=>"invalid_value"}}
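
One possible workaround, sketched here purely as an assumption and not as what this PR implements: keep the tool message text-only and attach the image in a follow-up message with role "user", which the error message says is the only role allowed to carry image URLs. The method name `openai_tool_messages` is hypothetical.

```ruby
# Hypothetical workaround (not from the PR): for OpenAI, split a tool
# output into a text-only tool message plus a user message with the image.
def openai_tool_messages(tool_call_id:, content:, image_url: nil)
  messages = [
    { role: "tool", tool_call_id: tool_call_id, content: content.to_s }
  ]

  if image_url
    messages << {
      role: "user",
      content: [
        { type: "text", text: "Image returned by tool call #{tool_call_id}" },
        { type: "image_url", image_url: { url: image_url } }
      ]
    }
  end

  messages
end

messages = openai_tool_messages(
  tool_call_id: "call_123",
  content: "Chart generated",
  image_url: "https://example.com/chart.png"
)
```

The trade-off is that the image is no longer part of the tool result itself, so the model only sees the association through the accompanying text.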
