Llava api #563
Conversation
If someone can review this, that would be perfect. I added the remaining binaries to the binary build (I cannot test this).
@IntptrMax Maybe you would be interested in this PR, which does similar things to yours.
Great work! Overall it looks good to me, and it would be even better if you could add an example to LLama.Examples showing how to use it.
How is the llava_shared library expected to be published? Is there already a preferred way?
At this moment the binaries are in the runtime directory of LLamaSharp. Llava is not yet completely integrated in llama.cpp and it doesn't support GPU. I suppose the easiest way is to add these libraries to each corresponding backend, but that is just my opinion. I would expect changes in the llama.cpp project in the future, so maybe it's better not to overthink this now. This PR depends on @martindevans #565. Regarding your question about the examples and executors: as discussed with Martin, I will make a later PR for those. This PR only introduces the binaries, the API and the tests of the API.
Preliminary
Test Threads default value to ensure it doesn't produce problems.
This reverts commit 264e176.
@martindevans, done.
LLama/Native/SafeLlavaModelHandle.cs (Outdated)
if (ctxContext == IntPtr.Zero)
    throw new RuntimeError($"Failed to load LLaVa model {modelPath}.");

return new SafeLlavaModelHandle(ctxContext);
You can modify clip_model_load to directly return SafeLlavaModelHandle. That way you never have to directly handle the pointer. See here for an example.
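A minimal sketch of that pattern, with simplified signatures rather than the exact LLamaSharp declarations: when the P/Invoke return type is a SafeHandle subclass with a parameterless constructor, the marshaller constructs the handle itself, so a raw IntPtr never surfaces in managed code.

```csharp
using System;
using System.Runtime.InteropServices;
using Microsoft.Win32.SafeHandles;

// Hypothetical handle type; the real SafeLlavaModelHandle in this PR may differ.
public class SafeLlavaModelHandle : SafeHandleZeroOrMinusOneIsInvalid
{
    // A parameterless constructor is required so the P/Invoke marshaller can create the handle.
    public SafeLlavaModelHandle() : base(ownsHandle: true) { }

    protected override bool ReleaseHandle()
    {
        NativeApi.clip_free(handle);   // clip_free is assumed to be the matching native release function
        return true;
    }
}

public static class NativeApi
{
    private const string llavaLibraryName = "llava_shared";   // assumed library name

    // Declaring the return type as the SafeHandle subclass removes the manual IntPtr.Zero check.
    [DllImport(llavaLibraryName, EntryPoint = "clip_model_load", CallingConvention = CallingConvention.Cdecl)]
    public static extern SafeLlavaModelHandle clip_model_load(string mmProjPath, int verbosity);

    [DllImport(llavaLibraryName, EntryPoint = "clip_free", CallingConvention = CallingConvention.Cdecl)]
    public static extern void clip_free(IntPtr ctx);
}
```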
Various nitpick comments, overall looks like a good foundation though 👍
@martindevans, I have reviewed the suggested changes.
LLama/Native/NativeApi.LLava.cs (Outdated)
@@ -88,7 +62,7 @@ public struct llava_image_embed
 /// <param name="embed"></param>
 /// <returns></returns>
 [DllImport(llavaLibraryName, EntryPoint = "llava_image_embed_free", CallingConvention = CallingConvention.Cdecl)]
-public static extern llava_image_embed* llava_image_embed_free(llava_image_embed* embed);
+public static extern LLavaImageEmbed* llava_image_embed_free(LLavaImageEmbed* embed);
If a LLavaImageEmbed is being allocated in some methods and freed in others, it should be handled with a SafeHandle to absolutely ensure it is disposed properly.
Sorry I didn't spot this in my last review!
Unless it's extremely short lived, in which case it can be handled with try/finally everywhere it's used. But a SafeHandle is probably easier and safer.
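For illustration, a minimal sketch of that suggestion, assuming an IntPtr-based overload of the native free function (the names are illustrative, not the exact declarations from this PR):

```csharp
using System;
using System.Runtime.InteropServices;
using Microsoft.Win32.SafeHandles;

public static class NativeApi
{
    private const string llavaLibraryName = "llava_shared";   // assumed library name

    // Assumed IntPtr-based declaration used only by the handle below.
    [DllImport(llavaLibraryName, EntryPoint = "llava_image_embed_free", CallingConvention = CallingConvention.Cdecl)]
    public static extern void llava_image_embed_free(IntPtr embed);
}

// Owning the pointer in a SafeHandle guarantees the native free runs exactly once,
// even if the consumer forgets to dispose or an exception unwinds the stack.
public sealed class SafeLlavaImageEmbedHandle : SafeHandleZeroOrMinusOneIsInvalid
{
    public SafeLlavaImageEmbedHandle() : base(ownsHandle: true) { }

    protected override bool ReleaseHandle()
    {
        NativeApi.llava_image_embed_free(handle);
        return true;
    }
}
```

The try/finally alternative also works for a pointer that never escapes the method that allocated it, but the SafeHandle additionally covers the finalizer path.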
Looks good and the strategy fits well into the current library. I have found only one issue: the CUDA llava DLL is missing.
If I may make a suggestion for this API, and also for the Llama part: save the handle of the DLL when loading the library and provide the possibility to unload (free) the library completely. This will terminate all processes and free all memory (system or CUDA) definitively, regardless of any current and future bugs upstream.
@zsogitbe Could you please explain how to unload the native library? That's something I once searched for. If it can be done reliably, I'd like to work on it in another issue (it's out of the scope of this PR).
I'm not actually sure if doing that would free up memory though, or if it would just be a huge memory leak. e.g. does unloading the DLL (without tearing down the whole process) actually free memory that was allocated but never freed, or does it just leak until the process is ended?
No leak is expected if you do it well! The aim is to force freeing all memory (CUDA, system) allocated by C++. If the C++ process stops, all memory is released. We of course need to do it in a clever way by freeing everything we can first. I find force unloading the library a valuable help if bugs remain in the C++ allocation code, which is to be expected...
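To illustrate the suggestion (not part of this PR), a hedged sketch of keeping the OS handle from NativeLibrary.Load so the whole library can later be released with NativeLibrary.Free; whether that actually reclaims memory the native code leaked is exactly the open question above. The class and path names are assumptions.

```csharp
using System;
using System.Runtime.InteropServices;

// Illustrative only: not LLamaSharp API.
public static class LlavaLibraryLoader
{
    private static IntPtr _handle = IntPtr.Zero;

    public static void Load(string libraryPath)
    {
        if (_handle == IntPtr.Zero)
            _handle = NativeLibrary.Load(libraryPath);   // throws if the library cannot be found/loaded
    }

    public static void Unload()
    {
        if (_handle == IntPtr.Zero)
            return;

        NativeLibrary.Free(_handle);   // decrements the OS load count; the module is unmapped at zero
        _handle = IntPtr.Zero;
    }
}
```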
Windows CI seems to be broken due to #2 on this list: https://github.com/actions/upload-artifact?tab=readme-ov-file#breaking-changes. I suggest either:
It looks good! The coding style is much more consistent with LLamaSharp.
@AsakusaRinne, @martindevans, I would like to clarify the approach in relation to #594. My original approach was to integrate the LLava API first and after that make another PR with at least an executor and an example. But right now I don't know if this is the best approach.
I think that's still a good plan; if we need to change how binaries are distributed/loaded, that can always be done in follow-up PRs.
OK, I will be working on the next PR.
Shall we merge this one now?
This is still a draft pull request. This is a summary of the issues that I need to solve: