Llava api #563
Conversation
If someone can review this, that would be perfect. I added the remaining binaries to the binary build (I cannot test this).
@IntptrMax Maybe you would be interested in this PR, which does similar things to yours.
Great work! Overall it looks good to me, and it would be even better if you could add an example to LLama.Examples showing how to use it.
How is the llava_shared library expected to be published? Is there already a preferred way?
At this moment the binaries are in the runtime directory of LLamaSharp. Llava is not yet completely integrated in llama.cpp and it doesn't support GPU. I suppose the easiest way is to add these libraries to each corresponding backend, but that is just my opinion. I would expect changes in the llama.cpp project in the future, so maybe it's better not to overthink this now. This PR depends on @martindevans #565. Regarding your question about the examples and executors: as discussed with Martin, I will make a later PR for those. This PR only introduces the binaries, the API and the tests of the API.
Preliminary
Test Threads default value to ensure it doesn't produce problems.
This reverts commit 264e176.
@martindevans, done.
LLama/Native/SafeLlavaModelHandle.cs (Outdated)
if (ctxContext == IntPtr.Zero)
    throw new RuntimeError($"Failed to load LLaVa model {modelPath}.");

return new SafeLlavaModelHandle(ctxContext);
You can modify clip_model_load to directly return SafeLlavaModelHandle. That way you never have to directly handle the pointer. See here for an example.
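A minimal sketch of that pattern, with simplified signatures rather than the exact LLamaSharp declarations: when the P/Invoke return type is a SafeHandle subclass with a parameterless constructor, the marshaller constructs the handle itself, so a raw IntPtr never surfaces in managed code.

```csharp
using System;
using System.Runtime.InteropServices;
using Microsoft.Win32.SafeHandles;

// Hypothetical handle type; the real SafeLlavaModelHandle in this PR may differ.
public class SafeLlavaModelHandle : SafeHandleZeroOrMinusOneIsInvalid
{
    // A parameterless constructor is required so the P/Invoke marshaller can create the handle.
    public SafeLlavaModelHandle() : base(ownsHandle: true) { }

    protected override bool ReleaseHandle()
    {
        NativeApi.clip_free(handle);   // clip_free is assumed to be the matching native release function
        return true;
    }
}

public static class NativeApi
{
    private const string llavaLibraryName = "llava_shared";   // assumed library name

    // Declaring the return type as the SafeHandle subclass removes the manual IntPtr.Zero check.
    [DllImport(llavaLibraryName, EntryPoint = "clip_model_load", CallingConvention = CallingConvention.Cdecl)]
    public static extern SafeLlavaModelHandle clip_model_load(string mmProjPath, int verbosity);

    [DllImport(llavaLibraryName, EntryPoint = "clip_free", CallingConvention = CallingConvention.Cdecl)]
    public static extern void clip_free(IntPtr ctx);
}
```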
Various nitpick comments, overall looks like a good foundation though 👍
@martindevans, I have reviewed the suggested changes.
LLama/Native/NativeApi.LLava.cs (Outdated)
@@ -88,7 +62,7 @@ public struct llava_image_embed
 /// <param name="embed"></param>
 /// <returns></returns>
 [DllImport(llavaLibraryName, EntryPoint = "llava_image_embed_free", CallingConvention = CallingConvention.Cdecl)]
-public static extern llava_image_embed* llava_image_embed_free(llava_image_embed* embed);
+public static extern LLavaImageEmbed* llava_image_embed_free(LLavaImageEmbed* embed);
If a LLavaImageEmbed is being allocated in some methods and freed in others, it should be handled with a SafeHandle to absolutely ensure it is disposed properly.
Sorry I didn't spot this in my last review!
Unless it's extremely short lived, in which case it can be handled with try/finally everywhere it's used. But a SafeHandle is probably easier and safer.
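For illustration, a minimal sketch of that suggestion, assuming an IntPtr-based overload of the native free function (the names are illustrative, not the exact declarations from this PR):

```csharp
using System;
using System.Runtime.InteropServices;
using Microsoft.Win32.SafeHandles;

public static class NativeApi
{
    private const string llavaLibraryName = "llava_shared";   // assumed library name

    // Assumed IntPtr-based declaration used only by the handle below.
    [DllImport(llavaLibraryName, EntryPoint = "llava_image_embed_free", CallingConvention = CallingConvention.Cdecl)]
    public static extern void llava_image_embed_free(IntPtr embed);
}

// Owning the pointer in a SafeHandle guarantees the native free runs exactly once,
// even if the consumer forgets to dispose or an exception unwinds the stack.
public sealed class SafeLlavaImageEmbedHandle : SafeHandleZeroOrMinusOneIsInvalid
{
    public SafeLlavaImageEmbedHandle() : base(ownsHandle: true) { }

    protected override bool ReleaseHandle()
    {
        NativeApi.llava_image_embed_free(handle);
        return true;
    }
}
```

The try/finally alternative also works for a pointer that never escapes the method that allocated it, but the SafeHandle additionally covers the finalizer path.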
Looks good and the strategy fits well into the current library. I have found only one issue: the CUDA llava DLL is missing.
If I may make a suggestion for this API, and also for the Llama part: save the handle of the DLL when loading the library and provide the possibility to unload (free) the library completely. This will terminate all processes and free all memory (system or CUDA) definitively, regardless of any current and future bugs upstream.
@zsogitbe Could you please explain how to unload the native library? That's something I once searched for. If it can be done reliably, I'd like to work on it in another issue (it's out of the scope of this PR).
I'm not actually sure if doing that would free up memory though, or if it would just be a huge memory leak. e.g. does unloading the DLL (without tearing down the whole process) actually free memory that was allocated but never freed, or does it just leak until the process is ended?
No leak is expected if you do it well! The aim is to force freeing all memory (CUDA, system) allocated by C++. If the C++ process stops, all memory is released. We of course need to do it in a clever way by freeing everything we can first. I find force unloading the library a valuable help if bugs remain in the C++ allocation code, which is to be expected...
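To illustrate the suggestion (not part of this PR), a hedged sketch of keeping the OS handle from NativeLibrary.Load so the whole library can later be released with NativeLibrary.Free; whether that actually reclaims memory the native code leaked is exactly the open question above. The class and path names are assumptions.

```csharp
using System;
using System.Runtime.InteropServices;

// Illustrative only: not LLamaSharp API.
public static class LlavaLibraryLoader
{
    private static IntPtr _handle = IntPtr.Zero;

    public static void Load(string libraryPath)
    {
        if (_handle == IntPtr.Zero)
            _handle = NativeLibrary.Load(libraryPath);   // throws if the library cannot be found/loaded
    }

    public static void Unload()
    {
        if (_handle == IntPtr.Zero)
            return;

        NativeLibrary.Free(_handle);   // decrements the OS load count; the module is unmapped at zero
        _handle = IntPtr.Zero;
    }
}
```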
Windows CI seems to be broken due to #2 on this list: https://github.com/actions/upload-artifact?tab=readme-ov-file#breaking-changes. I suggest either:
It looks good! The coding style is much more consistent with LLamaSharp.
@AsakusaRinne, @martindevans, I would like to clarify the approach in relation to #594. My original approach was to integrate the LLava API first and after that make another PR with at least an executor and an example. But right now I don't know if this is the best approach.
I think that's still a good plan; if we need to change how binaries are distributed/loaded, that can always be done in follow-up PRs.
OK, I will be working on the next PR.
Shall we merge this one now?
This is still a draft pull request. This is a summary of the issues that I need to solve: