Bug: moondream2 inference not correct (severe quality degradation compared to reference) #8037

cmp-nct · 2024-06-20T15:01:07Z

What happened?

Moondream2 is a superb vision model, but on llama.cpp its output quality is below that of vanilla llava-1.
@vikhyat maybe you'd like to take a look?

I compared results on the same images using Python and llama.cpp, both in fp16 format.
moondream2 roughly recognizes images, and the language part also seems to work, but the quality is totally off through llama.cpp.
When asked about spatial information (like the lower left corner), it tends to name anything from the left side or even a random object.
In Python, the response is precise and surprisingly accurate.

I looked a bit deeper (https://github.com/vikhyat/moondream/blob/main/moondream/vision_encoder.py) and the reference vision encoder appears to support multiple resolutions, while on llama.cpp it runs in llava-1.5 mode.

However, for my test image llama.cpp creates 729 input embeddings for the image, and Python did the same.
So it's not just the input embedding count; something deeper is going wrong. My guess is that the sampling/patches are mixed up somehow.
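
To illustrate why a patch mix-up could produce exactly this kind of spatial confusion while leaving the embedding count unchanged, here is a minimal sketch. It assumes the 729 embeddings form a square 27 x 27 grid (27 * 27 = 729) and that one side reads the grid in a different order than the other; both are assumptions for illustration, not a confirmed diagnosis, and all function names are hypothetical.

```python
GRID = 27  # assumption: 729 embeddings laid out as a square 27 x 27 patch grid


def row_major_index(row: int, col: int, grid: int = GRID) -> int:
    """Flat index of a patch when the grid is stored row by row."""
    return row * grid + col


def read_as_column_major(index: int, grid: int = GRID) -> tuple:
    """(row, col) a consumer would infer if it wrongly assumed column-by-column storage."""
    col, row = divmod(index, grid)
    return row, col


def region(row: int, col: int, grid: int = GRID) -> str:
    """Coarse image region that a patch position falls into."""
    vertical = "upper" if row < grid / 2 else "lower"
    horizontal = "left" if col < grid / 2 else "right"
    return f"{vertical} {horizontal}"


# The patch in the lower left corner, written in row-major order...
lower_left = row_major_index(GRID - 1, 0)
print(lower_left, region(GRID - 1, 0))

# ...lands in a different corner if the reader assumes the transposed order.
print(region(*read_as_column_major(lower_left)))
```

The count stays at 729 in both cases, which is why matching embedding counts alone cannot rule out an ordering problem.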

For reference: moondream2 support was merged here: #6899

Name and Version

abd894a

What operating system are you seeing the problem on?

No response

Relevant log output

Below is an example image:

Prompt:<image>\n\nQuestion: What is in the lower left corner?\n\nAnswer:
Answer on python: "In the lower left corner, there is a green sticky note pad."
Answer on llava-cli: "A cup of coffee is in the lower left corner."
(I used the official supplied gguf files)


ElhamAhmedian · 2024-07-23T05:03:09Z

Has this been resolved?

cmp-nct · 2024-07-23T14:29:24Z

I think we should temporarily remove "moondream" from the supported list. Can someone else confirm my findings?

EliEron · 2024-07-24T03:38:06Z

I can back up your findings. Using your example image and prompt I'm seeing the same behavior: the Transformers model gives the same answer as in your post, whereas the GGUF gives riveting answers like "Desk", "A brown table.", "A gray surface", and so on.

Testing other images, I also see large discrepancies on some of them, though it isn't entirely consistent. There are some cases where both perform about the same, but most of the time the GGUF is substantially worse.

Note that I used the same GGUF as you did, so it's possible the issue is in the GGUF itself.
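
One way to narrow down whether the image embeddings themselves differ (as opposed to the language side) would be to compare the per-patch embeddings from both runtimes. The sketch below assumes the embeddings have already been exported from each side as lists of float vectors; how to dump them from Transformers or llama.cpp is not shown, and the helper names are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def per_patch_similarity(ref, test):
    """Similarity of patch i in the reference to patch i in the test output."""
    if len(ref) != len(test):
        raise ValueError("both sides must have the same number of patches")
    return [cosine(r, t) for r, t in zip(ref, test)]


def best_match(ref, test):
    """For each reference patch, the index of the most similar test patch.

    A result that differs from [0, 1, 2, ...] hints at reordered patches
    rather than wrong values. Quadratic in the patch count, so this is
    only practical for small inputs as written.
    """
    return [max(range(len(test)), key=lambda j: cosine(r, test[j])) for r in ref]


# Toy data: the "test" side holds the same vectors in a rotated order.
ref = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
test = [ref[2], ref[0], ref[1]]
print(per_patch_similarity(ref, test))
print(best_match(ref, test))
```

If the per-patch similarities are low but every reference patch has a near-perfect match at some other index, that would point toward an ordering issue; if no good matches exist at all, the preprocessing or the weights would be the more likely suspects.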

ElhamAhmedian · 2024-07-24T07:16:02Z

@vikhyat can you please share the Python code you used for this? Thanks

vikhyat · 2024-07-26T12:09:00Z

> @vikhyat can you please share the Python code you used for this? Thanks

Python code for inference? It's here: https://github.com/vikhyat/moondream

ElhamAhmedian · 2024-07-28T12:55:58Z

I tested moondream2; it does not work with the old llama.cpp version that supported VLMs.

github-actions · 2024-09-11T01:28:13Z

This issue was closed because it has been inactive for 14 days since being marked as stale.

github-actions · 2024-10-27T01:09:58Z

This issue was closed because it has been inactive for 14 days since being marked as stale.

cmp-nct added the bug-unconfirmed and medium severity labels Jun 20, 2024

cmp-nct changed the title ~~Bug: moondream2 inference not correct~~ Bug: moondream2 inference not correct (severe quality degradation compared to reference) Jun 20, 2024

cmp-nct mentioned this issue Jun 28, 2024

How to run on llama.cpp vikhyat/moondream#96

Open

github-actions bot added the stale label Jul 21, 2024

github-actions bot removed the stale label Jul 24, 2024

github-actions bot added the stale label Aug 28, 2024

github-actions bot closed this as completed Sep 11, 2024

HanClinto reopened this Sep 11, 2024

github-actions bot removed the stale label Sep 13, 2024

github-actions bot added the stale label Oct 13, 2024

github-actions bot closed this as completed Oct 27, 2024
