
finetuning - Out of Memory Error #208

Open
Xiaoshan-jun opened this issue Mar 15, 2023 · 2 comments
@Xiaoshan-jun

I tried to run the finetuning with

$ python train.py config/finetune_shakespeare.py

It seems like finetuning requires a lot of memory. Is there any way to lower the memory requirement? The batch size is already 1.

torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 100.00 MiB (GPU 0; 23.68 GiB total capacity; 21.67 GiB already allocated; 85.06 MiB free; 21.92 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
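
The error message itself points at one knob: setting max_split_size_mb via PYTORCH_CUDA_ALLOC_CONF to reduce fragmentation. A minimal sketch combining that with nanoGPT's command-line overrides; the 128 MiB split size and the override values below are assumptions to tune, not settings taken from this thread:

$ PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:128 python train.py config/finetune_shakespeare.py --block_size=512 --gradient_accumulation_steps=8

block_size and gradient_accumulation_steps are globals in train.py, so the configurator accepts them as --key=value overrides; a shorter context window trades some finetuning fidelity for a smaller activation footprint.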

@judyhappy

Your GPU memory is too small; I suggest switching to the CPU.
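
For reference, train.py exposes device and compile as config globals, so a CPU run needs no code changes. A minimal sketch, noting that torch.compile is best disabled on CPU and that training a GPT-2-class model there will be very slow:

$ python train.py config/finetune_shakespeare.py --device=cpu --compile=False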

@ramiil

ramiil commented Mar 20, 2023

For testing, you could use a smaller model (gpt2-large).
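
The stock config/finetune_shakespeare.py initializes from gpt2-xl, so the model choice comes down to overriding init_from. A minimal sketch, where gpt2-medium or gpt2 are the next steps down if gpt2-large still does not fit:

$ python train.py config/finetune_shakespeare.py --init_from=gpt2-large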

gkielian added a commit to gkielian/ReaLLMASIC_nanogpt that referenced this issue Aug 3, 2024
gkielian added a commit to gkielian/ReaLLMASIC_nanogpt that referenced this issue Sep 5, 2024