Why divert from the default GGML versions? #5
That originated from the first Falcon demo examples; I did not want to break compatibility with existing models. I agree that the magic and versioning of ggml binaries should be used more, but it is exactly the same situation in llama.cpp, which uses the layer count, not the magic, as the model indicator. So ggllm.cpp and llama.cpp both use the layer number as the primary indicator of the model. Regarding mmap and ggjt support: we already have that.
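For illustration, a minimal sketch of what "layer count as model indicator" means in practice (not code from either repo; the layer counts are the real Falcon hyperparameters, 32 for 7B and 60 for 40B, everything else here is made up):

```cpp
#include <cstdint>

// Hypothetical helper: pick the Falcon variant from n_layer in the
// file's hyperparameters, the way llama.cpp infers its model sizes,
// instead of from the file version field.
enum falcon_model_type { FALCON_7B, FALCON_40B, FALCON_UNKNOWN };

static falcon_model_type falcon_type_from_n_layer(int32_t n_layer) {
    switch (n_layer) {
        case 32: return FALCON_7B;   // Falcon-7B has 32 decoder layers
        case 60: return FALCON_40B;  // Falcon-40B has 60 decoder layers
        default: return FALCON_UNKNOWN;
    }
}
```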
I get your point about not needing the version number to determine the model type. However, in my project, where I'm integrating Falcon support into rustformers/llm, the unusual version numbers cause issues. We have a universal GGML/GGJT loader in place that handles all loading, built to work with the ggml and llama.cpp repos. With this setup, version numbers like 7 and 40 aren't recognized as valid. I could create and quantize my own Falcon models in the v3 GGJT format, but that would leave various models online that are only compatible with certain libraries, which I'd rather avoid. Maybe the new V4 file format will get implemented soon, and we can sidestep the issue of a fragmented ecosystem.
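To illustrate where it breaks (a hypothetical sketch, written in C++ for consistency with this repo even though our loader is Rust; the names here are made up): the loader validates the version field against the known GGJT container versions before reading anything else, so 7 and 40 are rejected immediately:

```cpp
#include <cstdint>
#include <stdexcept>
#include <string>

// Hypothetical validation step in a generic GGML/GGJT loader: the
// version field is checked against known container versions before any
// hyperparameters are parsed, so files that store the model size
// (7, 40) in that field fail right here.
static void check_ggjt_version(uint32_t version) {
    if (version < 1 || version > 3) {  // GGJT v1..v3 are the known versions
        throw std::invalid_argument(
            "unsupported GGJT version " + std::to_string(version));
    }
}
```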
Introducing extra version numbers (7 and 40) adds complexity to the ggml ecosystem.
Basically, this could also be solved by simply reading the file magic: if it's a GGML file, don't read a version and disable mmap; if the magic is GGJT, read the version (as this format is versioned) and enable mmap.
This would also allow the creation of Falcon-7B ggjt files with mmap support.
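As a rough sketch of that dispatch (hypothetical C++; the magic values are the ones llama.cpp defines, 'ggml' = 0x67676d6c and 'ggjt' = 0x67676a74, the rest is made up):

```cpp
#include <cstdint>
#include <cstdio>

struct file_format {
    bool     versioned; // GGJT carries a version field, plain GGML does not
    bool     use_mmap;  // only GGJT's aligned layout is mmap-friendly
    uint32_t version;   // meaningful only when versioned
};

// Hypothetical dispatch on the file magic: plain GGML -> unversioned,
// mmap off; GGJT -> read the version, mmap on.
static file_format detect_format(FILE * f) {
    uint32_t magic = 0;
    if (fread(&magic, sizeof(magic), 1, f) != 1) return {false, false, 0};

    if (magic == 0x67676d6cu) {                 // 'ggml'
        return {false, false, 0};
    }
    if (magic == 0x67676a74u) {                 // 'ggjt'
        uint32_t version = 0;
        if (fread(&version, sizeof(version), 1, f) != 1) version = 0;
        return {true, true, version};
    }
    return {false, false, 0};                   // other magics (e.g. ggmf) elided
}
```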