diff --git a/docs/docs/integrations/llms/llamacpp.ipynb b/docs/docs/integrations/llms/llamacpp.ipynb
index cf9fa21bb5640..17ccba078596d 100644
--- a/docs/docs/integrations/llms/llamacpp.ipynb
+++ b/docs/docs/integrations/llms/llamacpp.ipynb
@@ -6,9 +6,9 @@
    "source": [
     "# Llama.cpp\n",
     "\n",
-    "[llama-cpp-python](https://github.com/abetlen/llama-cpp-python) is a Python binding for [llama.cpp](https://github.com/ggerganov/llama.cpp). \n",
+    "[llama-cpp-python](https://github.com/abetlen/llama-cpp-python) is a Python binding for [llama.cpp](https://github.com/ggerganov/llama.cpp).\n",
     "\n",
-    "It supports inference for [many LLMs](https://github.com/ggerganov/llama.cpp), which can be accessed on [HuggingFace](https://huggingface.co/TheBloke).\n",
+    "It supports inference for [many LLMs](https://github.com/ggerganov/llama.cpp#description), which can be accessed on [HuggingFace](https://huggingface.co/TheBloke).\n",
     "\n",
     "This notebook goes over how to run `llama-cpp-python` within LangChain.\n",
     "\n",
@@ -54,7 +54,7 @@
    "source": [
     "### Installation with OpenBLAS / cuBLAS / CLBlast\n",
     "\n",
-    "`lama.cpp` supports multiple BLAS backends for faster processing. Use the `FORCE_CMAKE=1` environment variable to force the use of cmake and install the pip package for the desired BLAS backend ([source](https://github.com/abetlen/llama-cpp-python#installation-with-openblas--cublas--clblast)).\n",
+    "`llama.cpp` supports multiple BLAS backends for faster processing. Use the `FORCE_CMAKE=1` environment variable to force the use of cmake and install the pip package for the desired BLAS backend ([source](https://github.com/abetlen/llama-cpp-python#installation-with-openblas--cublas--clblast)).\n",
     "\n",
     "Example installation with cuBLAS backend:"
    ]
   },
@@ -177,7 +177,11 @@
     "\n",
     "You don't need an `API_TOKEN` as you will run the LLM locally.\n",
     "\n",
-    "It is worth understanding which models are suitable to be used on the desired machine."
+    "It is worth understanding which models are suitable to be used on the desired machine.\n",
+    "\n",
+    "[TheBloke's](https://huggingface.co/TheBloke) Hugging Face models have a `Provided files` section that lists the RAM required to run models of different quantisation sizes and methods (e.g. [Llama2-7B-Chat-GGUF](https://huggingface.co/TheBloke/Llama-2-7b-Chat-GGUF#provided-files)).\n",
+    "\n",
+    "This [GitHub issue](https://github.com/facebookresearch/llama/issues/425) is also relevant for finding the right model for your machine."
    ]
   },
   {
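
For context on the workflow this notebook documents (install `llama-cpp-python`, download a GGUF model sized for your machine, then load it through LangChain), the sketch below shows the loading step with LangChain's `LlamaCpp` wrapper. It is not part of this diff: the model path is a hypothetical placeholder and the parameter values are assumptions to be tuned for your hardware.

```python
# Minimal sketch (not part of this PR): load a locally downloaded GGUF model
# with LangChain's LlamaCpp wrapper and run a single prompt.
from langchain_community.llms import LlamaCpp

llm = LlamaCpp(
    model_path="./llama-2-7b-chat.Q4_K_M.gguf",  # hypothetical path to a GGUF file you downloaded
    n_ctx=2048,       # context window size
    n_gpu_layers=0,   # raise this if llama-cpp-python was built with a GPU BLAS backend
    temperature=0.75,
    verbose=True,
)

print(llm.invoke("Name three things to check before choosing a local model."))
```

The quantisation level chosen (e.g. Q4 vs Q8) trades RAM and speed against output quality, which is why the `Provided files` tables referenced in the added docs are useful when picking a file.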