use warmup steps for lr scheduler, ban steps == -1 #99
Conversation
train_configs/debug_model.toml (Outdated)
```diff
@@ -26,7 +26,7 @@ lr = 8e-4
 [training]
 batch_size = 8
 seq_len = 2048
-warmup_pct = 0.20  # lr scheduler warm up
+warmup_steps = 5  # lr scheduler warm up
```
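For context, a minimal sketch of how a `warmup_steps` value like the one above typically feeds a linear warmup schedule. The function name and step counts below are illustrative assumptions, not torchtitan's actual scheduler code:

```python
import torch

model = torch.nn.Linear(8, 8)
optimizer = torch.optim.AdamW(model.parameters(), lr=8e-4)

warmup_steps = 5      # from [training] warmup_steps in the TOML above
training_steps = 10   # debug-model default discussed in this thread

def warmup_then_linear_decay(step: int) -> float:
    # Ramp the lr multiplier from ~0 to 1 over warmup_steps,
    # then decay linearly to 0 over the remaining steps.
    if step < warmup_steps:
        return (step + 1) / warmup_steps
    return max(0.0, (training_steps - step) / (training_steps - warmup_steps))

scheduler = torch.optim.lr_scheduler.LambdaLR(
    optimizer, lr_lambda=warmup_then_linear_decay
)
```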
nit - don't we want 2 here if the default steps is 10 (i.e. 20% rather than 50%)?
Shall we add a comment here suggesting the warmup should be around 20% of total steps? A user might have no idea how many steps to warm up when changing the total number of training steps.
Sounds good, I'll update. Is 20% already the rule of thumb here? I thought this was more of a tuning thing?
It's definitely a tuning thing - I did 20% to try to be safe, because we are doing such short runs by default (10 steps!) while still showing some reasonable learning gains.
A more rigorous approach would be 2 / (1 − β₂) training iterations of warmup with the current linear scheduling, which is 2000 iterations assuming the default AdamW β₂ (see https://arxiv.org/abs/1910.04209).
The issue here is that we want to show a reasonable curve in the short term, and doing that in 10 steps or even 100 is very different from the more rigorous rule above (which simplifies to 2k warmup iterations), or even from plugging it into the lr directly if we wanted to get fancy and be more exact.
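To make that comparison concrete, a quick back-of-the-envelope check of the rule of thumb against the 20% heuristic (β₂ = 0.999 is the AdamW default; the numbers are just an illustration):

```python
beta2 = 0.999                        # AdamW default
rule_of_thumb = 2 / (1 - beta2)      # ≈ 2000 warmup iterations
print(round(rule_of_thumb))          # 2000

# The 20% heuristic at different run lengths:
for total_steps in (10, 100, 10_000):
    print(total_steps, max(1, int(0.20 * total_steps)))  # 2, 20, 2000
```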
Please fix the CI errors:
- apply pre-commit
- modify the default training_steps in the test file
- the GPU test failed because HF is down and the dataset cannot be downloaded ... I guess we don't expect this to happen a lot? But for tests it may make sense to use a fake dataset (rough sketch below).
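One possible shape for that fake dataset, purely as a sketch and not an actual torchtitan test fixture: an `IterableDataset` that yields random token ids locally, so CI never touches the HF hub.

```python
import torch
from torch.utils.data import IterableDataset

class RandomTokenDataset(IterableDataset):
    """Yields (input, label) pairs of random token ids; no network needed."""

    def __init__(self, vocab_size: int = 32000, seq_len: int = 2048, num_samples: int = 64):
        self.vocab_size = vocab_size
        self.seq_len = seq_len
        self.num_samples = num_samples

    def __iter__(self):
        for _ in range(self.num_samples):
            tokens = torch.randint(0, self.vocab_size, (self.seq_len + 1,))
            yield tokens[:-1], tokens[1:]  # next-token prediction targets
```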
As titled, we don't want to allow the steps == -1 case, as it would blow up the lr scheduler.
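A minimal sketch of the kind of guard this describes (the function and field names are assumptions, not the repo's actual config code):

```python
def validate_training_config(steps: int, warmup_steps: int) -> None:
    # steps == -1 ("train forever") leaves the scheduler with no finite
    # horizon to compute its warmup/decay fractions, so reject it up front.
    if steps == -1:
        raise ValueError(
            "training.steps == -1 is not supported; "
            "the lr scheduler needs a finite step count."
        )
    if warmup_steps > steps:
        raise ValueError(f"warmup_steps ({warmup_steps}) cannot exceed steps ({steps})")
```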