-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[WIP, don't merge] unity.cpp -> ggml master #719
base: master
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi and thank you for the interesting project!
-
The
ggml
examples need to be small and I thinkunity.cpp
is not very suitable in this regard. Probably a separate repo would be better. I've been keepingwhisper.cpp
as an example, but it's a bit tedious to keep it in sync, so I'm already considering to remove it, as it does not bring much value -
If this project is migrated to it's own repo, then I think having
kaldi-native-fbank
in the code base (or even as submodule) is fine. It's not very suitable forggml
example (see 1). Regarding the c++14 standard - I guess that's OK, but it mainly depends on what theunity.cpp
repo wants to target. My personal opinion is that c++11 has all the good stuff and from then on it is mostly complications, but this is of course very subjective -
Yes, reusing what is already available is recommended. There are things to improve like supporting F16/F32 data types where missing, implementing GPU kernels (optional), etc. so any progress in this regard is welcome. Make sure to understand how the
test-backend-ops
tool works and add necessary tests to cover the new functionality - it is very valuable -
Adding Python bindings to
ggml
is not currently on the roadmap. There is actually an attempt in the examples: https://github.com/ggerganov/ggml/tree/master/examples/pythonI'm not sure however what is it's state and if it is really usable for anything. If it is easy to update and make use of it, then we can try to support it officially, but would definitely need some help from the OSS community to do that.
I think you need the bindings only to get the
ggml_type
enum, correct?
Also any comment on the best path to integrate unity.cpp with awesome ggml would be appreciated, thanks in advance!
I would say that the best option is a dedicated unity.cpp
repo with ggml
as a submodule. I guess the stable-diffusion.cpp is one of the better examples of this. Adopting the ggml-backend
API is recommended and will provide seamless GPU support in the future (given that all the GPU kernels become available)
Hope this is useful, let us know if you have further questions - would be happy to help!
Edit: didn't mean to hit "Approve" 🤷
struct ggml_tensor * a, | ||
struct ggml_tensor * b, | ||
int p0); | ||
|
||
GGML_API struct ggml_tensor * ggml_conv_1d( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is an implementation available on master
:
Lines 1499 to 1510 in 6b14d73
GGML_API struct ggml_tensor * ggml_conv_depthwise_2d( | |
struct ggml_context * ctx, | |
struct ggml_tensor * a, | |
struct ggml_tensor * b, | |
int s0, | |
int s1, | |
int p0, | |
int p1, | |
int d0, | |
int d1); | |
Thanks @ggerganov for detailed comments! For 3, thanks for the suggestion, will look at test-backend-ops and add tests for new ops. For 4, The binding is also used in our custom scripts for testing test_unity_cpp.py (will also include in separate repo, now it lives under seamless_communication) and converting fairseq2 checkpoints to ggml format ggml_convert.py. And yes the latter only uses type enum. |
@cndn Please add quantization ggml options to unity.cpp |
This is a WIP PR to sync unity.cpp from seamless_communication (https://github.com/facebookresearch/seamless_communication/tree/main/ggml) to examples/ under ggml. Sharing for visibility and looking for early feedback from community & authors. Feel free to check README for usage.
Questions about ggml changes needed with this PR:
(1) Would you prefer a separate repo for unity.cpp (Like whisper.cpp / llama.cpp we could have a standalone (https://github.com/facebookresearch/seamless_communication/tree/main/ggml/unity.cpp), or checking into examples/ once it’s polished? Or both?
(2) We use kaldi-native-fbank for feature extraction and checked the whole library into ggml as examples/kaldi-native-fbank. Do you prefer a separate installation for this KNF lib? One caveat is with this lib sticking together I needed to update CMAKE_CXX_STANDARD to 14 from 11.
(3) We added several custom operators including batch_norm, glu, and convolution related ones. Realized convolution related ones already existed on master (but didn’t when we started). One TODO item is to merge with them, wondering if there’s ongoing effort on
(a) unifying depthwise conv & im2col ops to have one single ggml_conv_1d op with groups=1 or model_dim as argument
(b) supporting fp32 for im2col. Currently I ran into some issues when using im2col for our model which uses fp32 all the way down, so likely fp32 <-> fp16 cast related, still investigating.
(4) In order to convert fairseq2 checkpoints to ggml format, our script ggml_convert.py rely on ggml-python third party library https://github.com/abetlen/ggml-python (We have ggml.py copy in our folder). Just wondering if there’s a plan to add the python bindings to ggml repo, so we could make sure they are in sync.
Also any comment on the best path to integrate unity.cpp with awesome ggml would be appreciated, thanks in advance!