FR: allow multimodal input / vision / images #429

Closed
thiswillbeyourgithub opened this issue Apr 23, 2024 · 1 comment · May be fixed by #430

@thiswillbeyourgithub (Contributor)

It would be simple to make it so that paths/URLs to images in the prompt text are replaced by an image call.
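
Roughly what I have in mind, as a minimal Python sketch: scan the prompt for tokens that look like image paths/URLs and split it into text and image parts. The regex and the OpenAI-style content layout below are assumptions for illustration, not this project's actual API.

```python
import re

# Tokens ending in a common image extension are treated as image references.
IMAGE_TOKEN = re.compile(r"\S+\.(?:png|jpe?g|gif|webp)\b", re.IGNORECASE)

def prompt_to_content(prompt: str) -> list[dict]:
    """Split a prompt string into OpenAI-style text/image_url content parts."""
    parts, last = [], 0
    for match in IMAGE_TOKEN.finditer(prompt):
        text = prompt[last:match.start()].strip()
        if text:
            parts.append({"type": "text", "text": text})
        parts.append({"type": "image_url", "image_url": {"url": match.group(0)}})
        last = match.end()
    tail = prompt[last:].strip()
    if tail:
        parts.append({"type": "text", "text": tail})
    return parts

# prompt_to_content("What's in this image? https://example.com/smile.png")
# -> [{"type": "text", "text": "What's in this image?"},
#     {"type": "image_url", "image_url": {"url": "https://example.com/smile.png"}}]
```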

I could then, for example, add a shortcut so that an image in my clipboard is saved to /tmp and its path added to the prompt automatically.
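
For the clipboard part, something like this would do (a sketch assuming Pillow is available; the helper name is made up):

```python
import tempfile
from PIL import ImageGrab  # Pillow; clipboard grabbing works on Windows/macOS and recent Linux builds

def clipboard_image_to_tmp() -> str | None:
    """Save the clipboard image (if any) under /tmp and return its path."""
    img = ImageGrab.grabclipboard()
    if img is None or isinstance(img, list):  # a list means file paths were copied, not pixel data
        return None
    tmp = tempfile.NamedTemporaryFile(suffix=".png", delete=False, dir="/tmp")
    img.save(tmp.name, format="PNG")
    return tmp.name
```

The returned path could then be inserted into the prompt by whatever keybinding triggers the shortcut.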

See the kind of workflow implemented in ollama:

What's in this image? /Users/jmorgan/Desktop/smile.png
The image features a yellow smiley face, which is likely the central focus of the picture.

Somewhat related to:

Edit:
Oh, I see that there's already partial support for this: #332

It should be:

  • enabled for the other GPT-4 models that support it
  • mentioned in the docs
  • extended to support local files (see the sketch below)

I'll see about making a PR.
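
For local files, the OpenAI vision endpoint accepts base64 data URLs, so a local path found in the prompt could be converted before sending. A minimal sketch (the helper name is hypothetical):

```python
import base64
import mimetypes

def local_image_to_data_url(path: str) -> str:
    """Encode a local image file as a base64 data URL usable in an image_url part."""
    mime, _ = mimetypes.guess_type(path)
    with open(path, "rb") as f:
        encoded = base64.b64encode(f.read()).decode("ascii")
    return f"data:{mime or 'image/png'};base64,{encoded}"
```

http(s) URLs would be passed through unchanged; only local paths would go through this conversion.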
@thiswillbeyourgithub (Contributor, Author)

For anyone interested, I added a patch file and a demo showcasing the vision feature in this PR.
