
Llava api #563

Merged 20 commits into SciSharp:master on Mar 13, 2024

Conversation

@SignalRT (Collaborator) commented Mar 3, 2024

This is still a draft pull request. This is a summary of the issues I need to solve:

  • The changes to loading seem to work. It's close to a hack, done to avoid changing the whole way the llama library is loaded. I have concerns about CUDA.
  • To run all the tests I updated all the binaries from the "update binaries" job. With these binaries the embedding test fails; I haven't reviewed the reason, I just skip this test so I can check the llava API. The binaries used predate "Allow for user specified embedding pooling type" (ggerganov/llama.cpp#5849).
  • I introduced a minimal test of the llava API that passes on Mac and Windows. It seems to fail on Linux, but I'm not sure this is related to llava; I need to check it on a Linux machine.
  • ModelParameters.Threads seems to assign a thread count to the llama parameters when the property is null, but does not preserve that value when you read the property back from the context. In my view it would be cleaner to ensure that if we apply a default value on assignment, that same value is what the property returns (see the sketch below).
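
For illustration, a minimal sketch of the behaviour suggested in the last point, using a hypothetical property rather than the actual LLamaSharp code; the idea is that the default is stored at assignment time, so the getter returns the same value that is handed to the native parameters:

using System;

// Hypothetical sketch only, not the real ModelParams implementation.
public class ModelParamsSketch
{
    private uint? _threads;

    public uint? Threads
    {
        get => _threads;
        // When null is assigned, store the computed default instead of null,
        // so reading the property later returns the value actually in use.
        // Half the logical cores is just an illustrative default.
        set => _threads = value ?? (uint)Math.Max(Environment.ProcessorCount / 2, 1);
    }
}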

@SignalRT (Collaborator, Author) commented Mar 5, 2024

It would be great if someone could review this.

I added the remaining binaries to the binary build (I cannot test this).

SignalRT marked this pull request as ready for review on March 5, 2024 at 13:55
@AsakusaRinne (Collaborator) commented:

@IntptrMax Maybe you would be interested in this PR, which does similar things to yours.

@AsakusaRinne (Collaborator) left a review comment:

Great work! Overall it looks good to me, and it would be even better if you could add an example of using it to LLama.Examples.

How is the llava_shared library expected to be published? Is there already a preferred way?

LLama/Native/LLamaRopeType.cs (review comment outdated, resolved)
@SignalRT (Collaborator, Author) commented Mar 5, 2024

Great work! Overall it looks good to me, and it would be even better if you could add an example of using it to LLama.Examples.

How is the llava_shared library expected to be published? Is there already a preferred way?

At the moment the binaries are in the runtime directory of LLamaSharp. Llava is not yet completely integrated into llama.cpp and it doesn't support GPU. I suppose the easiest way is to add these libraries to each corresponding backend, but that is just my opinion. I expect changes in the llama.cpp project in the future, so maybe it's better not to overthink this now.

This PR depends on @martindevans' #565. To answer your question about the examples and executors: as discussed with Martin, I will make a later PR for those. In this PR I only introduce the binaries, the API, and the tests of the API.

@martindevans (Member) commented:

@SignalRT I've just merged #565. Could you rebase this one onto master? That'll get rid of all my changes from the diff in this PR and make it a lot cleaner to review.

@SignalRT (Collaborator, Author) commented Mar 6, 2024

@martindevans, done.

if (ctxContext == IntPtr.Zero)
throw new RuntimeError($"Failed to load LLaVa model {modelPath}.");

return new SafeLlavaModelHandle(ctxContext);
@martindevans (Member) commented on this code:

You can modify clip_model_load to directly return SafeLlavaModelHandle. That way you never have to handle the pointer directly. See here for an example.
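
A sketch of that suggestion; the entry point and parameter names here are assumptions and the real binding in the PR may differ. Declaring the P/Invoke return type as the SafeHandle lets the marshaller construct it, so the raw IntPtr is never exposed (this requires SafeLlavaModelHandle to have a parameterless constructor):

// Sketch only; the clip_model_load signature below is an assumption.
[DllImport(llavaLibraryName, EntryPoint = "clip_model_load", CallingConvention = CallingConvention.Cdecl)]
private static extern SafeLlavaModelHandle clip_model_load(string mmProjPath, int verbosity);

public static SafeLlavaModelHandle LoadFromFile(string modelPath, int verbosity)
{
    // The marshaller wraps the returned native pointer in the SafeHandle for us.
    var handle = clip_model_load(modelPath, verbosity);
    if (handle.IsInvalid)
        throw new RuntimeError($"Failed to load LLaVa model {modelPath}.");
    return handle;
}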

@martindevans (Member) commented:

Various nitpick comments, overall looks like a good foundation though 👍

@SignalRT (Collaborator, Author) commented Mar 7, 2024

@martindevans, I've addressed the suggested changes.

@@ -88,7 +62,7 @@ public struct llava_image_embed
 /// <param name="embed"></param>
 /// <returns></returns>
 [DllImport(llavaLibraryName, EntryPoint = "llava_image_embed_free", CallingConvention = CallingConvention.Cdecl)]
-public static extern llava_image_embed* llava_image_embed_free(llava_image_embed* embed);
+public static extern LLavaImageEmbed* llava_image_embed_free(LLavaImageEmbed* embed);
@martindevans (Member) commented on this diff:

If a LLavaImageEmbed is being allocated in one method and freed in another, it should be handled with a SafeHandle to absolutely ensure it is disposed properly.

Sorry I didn't spot this in my last review!

@martindevans (Member) added:

Unless it's extremely short lived, in which case it can be handled with try/finally everywhere it's used. But a SafeHandle is probably easier and safer.
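
For illustration, a minimal sketch of the SafeHandle approach for the embed pointer, under the assumption that llava_image_embed_free (shown in the diff above) is the matching native free; the IntPtr-taking call used here is a hypothetical overload:

using System;
using Microsoft.Win32.SafeHandles;

// Sketch only: wrap the native llava_image_embed* so it is released exactly
// once, even if an exception is thrown between allocation and free.
public sealed class SafeLlavaImageEmbedHandle : SafeHandleZeroOrMinusOneIsInvalid
{
    public SafeLlavaImageEmbedHandle() : base(ownsHandle: true) { }

    protected override bool ReleaseHandle()
    {
        // Hypothetical IntPtr overload of the binding shown in the diff above.
        NativeApi.llava_image_embed_free(handle);
        return true;
    }
}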

@zsogitbe (Contributor) commented Mar 8, 2024

Looks good, and the strategy fits well into the current library. I have found only one issue: the CUDA llava DLL is missing.
We will still have to see how memory is freed when we have a working example.

@zsogitbe (Contributor) commented Mar 8, 2024

If I may make a suggestion for this API, and also for the Llama part:

Save the handle of the DLL when loading the library and provide the possibility to Unload (free) the library completely. This will terminate all processes and free all memory (system or CUDA) definitively regardless of any current and future bugs upstream.

@AsakusaRinne (Collaborator) commented:

@zsogitbe Could you please explain how to unload the native library? That's something I once searched for. If it can be done reliably, I'd like to work on it in another issue (it's out of scope for this PR).

@martindevans (Member) commented:

NativeLibrary.Free.

I'm not actually sure if doing that would free up memory though, or if it would just be a huge memory leak. e.g. does unloading the DLL (without tearing down the whole process) actually free memory that was allocated but never freed, or does it just leak until the process is ended?
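
For reference, a minimal sketch of keeping the handle from NativeLibrary.Load so the library can be freed later (class and method names are assumptions; requires .NET Core 3.0+). Whether allocations made by the native code are actually reclaimed on unload is exactly the open question above:

using System;
using System.Runtime.InteropServices;

// Sketch only: remember the handle returned by NativeLibrary.Load so it can be freed later.
static class NativeLibraryLifetime
{
    private static IntPtr _handle = IntPtr.Zero;

    public static void Load(string libraryPath)
    {
        _handle = NativeLibrary.Load(libraryPath);
    }

    public static void Unload()
    {
        if (_handle == IntPtr.Zero)
            return;

        // Drops the OS reference count on the library; memory it allocated but
        // never freed may still leak until the process exits.
        NativeLibrary.Free(_handle);
        _handle = IntPtr.Zero;
    }
}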

@zsogitbe (Contributor) commented Mar 8, 2024 via email

@martindevans (Member) commented:

Windows CI seems to be broken due to #2 on this list: https://github.com/actions/upload-artifact?tab=readme-ov-file#breaking-changes. I suggest either:

  • Set overwrite: true on the upload-artifact steps
  • Revert to v3 and upgrade to v4 in a separate PR

@IntptrMax commented:

@IntptrMax Maybe you would be interested in this PR, which does similar things to yours.

It looks good! The coding style is much more consistent with LLamaSharp.

@SignalRT (Collaborator, Author) commented:

@AsakusaRinne, @martindevans, I would like to clarify the approach in relation to #594.

My original approach was to integrate the llava API first and then make another PR with at least an executor and an example. But right now I don't know whether this is the best approach.

@martindevans (Member) commented:

I think that's still a good plan. If we need to change how the binaries are distributed/loaded, that can always be done in follow-up PRs.

@SignalRT (Collaborator, Author) commented:

OK, I will be working on the next PR.

@martindevans (Member) commented:

Shall we merge this one now?

martindevans merged commit 3b2836e into SciSharp:master on Mar 13, 2024
3 checks passed
SignalRT deleted the LlavaAPI branch on March 26, 2024 at 22:59