Update doc
bofenghuang committed Oct 20, 2023
1 parent 72b96b5 commit eb1a666
Showing 3 changed files with 12 additions and 10 deletions.
6 changes: 1 addition & 5 deletions README.md
@@ -125,11 +125,7 @@ More information can be found in [vigogne/data](docs/data.md).

## Training

For efficient LLM fine-tuning, we utilize a technique called [low-rank adaptation (LoRA)](https://arxiv.org/abs/2106.09685) from 🤗 Hugging Face's [PEFT](https://github.com/huggingface/peft) library. This approach involves freezing the base model's weights and introducing a small number of learnable parameters.

Additionally, for practitioners without access to GPUs with ample memory, it's advisable to consider quantizing certain computations to either 8-bit or 4-bit precision using [LLM.int8()](https://arxiv.org/abs/2208.07339) or [QLoRA](https://arxiv.org/abs/2305.14314). Be aware that this might lead to a minor reduction in speed compared to fp16 or bf16 versions.

We highly recommend the utilization of tools such as [DeepSpeed](https://github.com/microsoft/DeepSpeed) or [FSDP](https://pytorch.org/blog/introducing-pytorch-fully-sharded-data-parallel-api), particularly when engaged in distributed learning scenarios. When dealing with long sequences, [FlashAttention](https://arxiv.org/abs/2307.08691) becomes crucial to speed up training and reduce memory usage.
The Vigogne models were mostly instruction fine-tuned from other foundation models.

More information can be found in [vigogne/training](docs/training.md).

8 changes: 7 additions & 1 deletion docs/inference.md
@@ -2,7 +2,13 @@

This repository offers multiple options for inference and deployment, including Google Colab notebooks, Gradio demos, [FastChat](https://github.com/lm-sys/FastChat), and [vLLM](https://vllm.ai). It also offers guidance on conducting experiments using [llama.cpp](https://github.com/ggerganov/llama.cpp) on your personal computer.

Thanks to the contributions by [TheBloke](https://huggingface.co/TheBloke), some of the Vigogne models have been quantized to [GGML](https://github.com/ggerganov/ggml) format (compatible with [llama.cpp](https://github.com/ggerganov/llama.cpp), [text-generation-webui](https://github.com/oobabooga/text-generation-webui), [ctransformers](https://github.com/marella/ctransformers), etc.) and [GPTQ](https://github.com/IST-DASLab/gptq) format (compatible with [GPTQ-for-LLaMa](https://github.com/qwopqwop200/GPTQ-for-LLaMa), [AutoGPTQ](https://github.com/PanQiWei/AutoGPTQ)). These formats facilitate testing and development. You can find these models on the [Hugging Face Hub](https://huggingface.co/models?sort=trending&search=TheBloke+vigogne).
## Quantized Models

Quantized versions of several Vigogne models are generously provided by [TheBloke](https://huggingface.co/TheBloke)!

These versions facilitate testing and development with various popular frameworks, including [AutoAWQ](https://github.com/casper-hansen/AutoAWQ), [vLLM](https://github.com/vllm-project/vllm), [AutoGPTQ](https://github.com/PanQiWei/AutoGPTQ), [GPTQ-for-LLaMa](https://github.com/qwopqwop200/GPTQ-for-LLaMa), [llama.cpp](https://github.com/ggerganov/llama.cpp), [text-generation-webui](https://github.com/oobabooga/text-generation-webui), and more.

You can find these models on the [Hugging Face Hub](https://huggingface.co/models?search=TheBloke/vigo).
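
As a quick smoke test, a GPTQ-quantized checkpoint can be loaded directly with 🤗 Transformers. The following is a minimal sketch: the repository id is illustrative, and the `auto-gptq` and `optimum` packages are assumed to be installed.

```python
# Minimal sketch: load a GPTQ-quantized Vigogne checkpoint from the Hub and generate a reply.
# The repo id below is illustrative; pick any of TheBloke's GPTQ repos.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "TheBloke/Vigogne-2-7B-Instruct-GPTQ"  # illustrative repository id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

prompt = "Expliquez la différence entre LoRA et QLoRA."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```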

## Google Colab Notebook

8 changes: 4 additions & 4 deletions docs/training.md
@@ -2,12 +2,12 @@

## Supervised Fine-tuning

For efficient LLM fine-tuning, we utilize a technique called [low-rank adaptation (LoRA)](https://arxiv.org/abs/2106.09685) from 🤗 Hugging Face's [PEFT](https://github.com/huggingface/peft) library. This approach involves freezing the base model's weights and introducing a small number of learnable parameters.
For efficient LLM fine-tuning, we use [low-rank adaptation (LoRA)](https://arxiv.org/abs/2106.09685) from 🤗 Hugging Face's [PEFT](https://github.com/huggingface/peft) library. This involves freezing the base model's parameters and introducing a small number of learnable parameters.
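
As a rough illustration, wrapping a base model with LoRA adapters via PEFT looks like the sketch below; the base model and hyperparameters are placeholders, not the exact values used for Vigogne.

```python
# Minimal sketch: attach LoRA adapters to a frozen base model with PEFT.
# Model name and hyperparameters are illustrative only.
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

base_model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")

lora_config = LoraConfig(
    r=16,                                 # rank of the low-rank update matrices
    lora_alpha=32,                        # scaling applied to the LoRA update
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # attention projections to adapt
    task_type="CAUSAL_LM",
)

model = get_peft_model(base_model, lora_config)
model.print_trainable_parameters()  # only the LoRA parameters remain trainable
```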

Additionally, for practitioners without access to GPUs with ample memory, it's advisable to consider quantizing certain computations to either 8-bit or 4-bit precision using [LLM.int8()](https://arxiv.org/abs/2208.07339) or [QLoRA](https://arxiv.org/abs/2305.14314). Be aware that this might lead to a minor reduction in speed compared to fp16 or bf16 versions.
For those with limited GPU memory, it's recommended to quantize certain computations to 8-bit or 4-bit precision using [LLM.int8()](https://arxiv.org/abs/2208.07339) or [QLoRA](https://arxiv.org/abs/2305.14314). Note that this might result in a slight training slowdown compared to the fp16 or bf16 versions.
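
A minimal QLoRA-style sketch of 4-bit loading with `bitsandbytes` is shown below; the base model and settings are illustrative, not necessarily the configuration used here.

```python
# Minimal sketch: load the base model in 4-bit (NF4) before attaching LoRA adapters.
# Requires the bitsandbytes package; model name and settings are illustrative.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",              # NormalFloat4 quantization from the QLoRA paper
    bnb_4bit_compute_dtype=torch.bfloat16,  # dtype used for the de-quantized matmuls
    bnb_4bit_use_double_quant=True,
)

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",             # illustrative base model
    quantization_config=bnb_config,
    device_map="auto",
)
```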

We highly recommend the utilization of tools such as [DeepSpeed](https://github.com/microsoft/DeepSpeed) or [FSDP](https://pytorch.org/blog/introducing-pytorch-fully-sharded-data-parallel-api), particularly when engaged in distributed learning scenarios. When dealing with long sequences, [FlashAttention](https://arxiv.org/abs/2307.08691) becomes crucial to speed up training and reduce memory usage.
Tools like [DeepSpeed](https://github.com/microsoft/DeepSpeed) or [FSDP](https://pytorch.org/blog/introducing-pytorch-fully-sharded-data-parallel-api) are highly recommended for distributed learning. [FlashAttention](https://arxiv.org/abs/2307.08691) is essential for speeding up training and reducing memory usage with long sequences.
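
For long sequences, FlashAttention can typically be enabled when loading the model; a minimal sketch follows (the exact argument name depends on your `transformers` version, and the base model is a placeholder).

```python
# Minimal sketch: load a model with FlashAttention-2 enabled.
# Requires the flash-attn package and a supported GPU; argument name varies across transformers versions.
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",               # illustrative base model
    torch_dtype=torch.bfloat16,
    attn_implementation="flash_attention_2",  # older versions use use_flash_attention_2=True
)
```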

More examples can be found in [examples](https://github.com/bofenghuang/vigogne/blob/main/examples/train).

Since version 2.2, I've refactored the training code, integrating specific elements inspired by the excellent training framework [Axolotl](https://github.com/OpenAccess-AI-Collective/axolotl). Thanks to the Axolotl team for their contributions to the open-source community! The primary motivation behind maintaining my own framework is to have full control over the entire training process and customize it to my specific needs. I highly recommend using Axolotl for additional features.
*Since version 2.2, I've refactored the training code, integrating specific elements inspired by the excellent training framework [Axolotl](https://github.com/OpenAccess-AI-Collective/axolotl). Thanks to the Axolotl team for their contributions to the open-source community! The primary motivation behind maintaining my own framework is to have full control over the entire training process and customize it to my specific needs. I highly recommend using Axolotl for additional features.*
