[speculator training] Speculator training #35
Conversation
Plan is to move the …
I've pulled all the …
some initial comments on this PR
fms_fsdp/utils/dataloader_utils.py
Outdated
# Split line into input and target for the CLM task.
data = Preprocess_Dataset(data, causal_lm)
# Apply desired postprocessing steps in sequence
data = Preprocess_Dataset(data, torch.IntTensor)
is this just wrapping with IntTensor?
Yes - turn list outputs into torch tensors before applying any user-specified preprocess functions
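To illustrate the pattern being discussed, here is a hypothetical minimal sketch of a Preprocess_Dataset-style wrapper (the name and behavior are inferred from the snippet above, not taken from the repo's actual implementation): it lazily applies a function to every item of an underlying iterable dataset.

```python
# Hypothetical minimal sketch of a Preprocess_Dataset-style wrapper: it lazily
# applies a function to every item of an underlying iterable dataset.
class PreprocessDataset:
    def __init__(self, dataset, fn):
        self.dataset = dataset
        self.fn = fn

    def __iter__(self):
        # Apply the transform on the fly, one item at a time
        for item in self.dataset:
            yield self.fn(item)

# In the PR the second wrap uses torch.IntTensor; tuple stands in here so the
# sketch has no torch dependency.
data = [[1, 2, 3], [4, 5]]
data = PreprocessDataset(data, tuple)
print(list(data))  # [(1, 2, 3), (4, 5)]
```

Wrapping with torch.IntTensor in the same way converts each list of token ids into a tensor before any user-specified preprocess functions run, which is exactly the ordering described in the comment above.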
Hi! What's the status on this PR? I'd like to train a few speculator models, but I'm not sure how to get started, due to a lack of documentation...
Hi @AlpinDale Working on getting the documentation and code ready for this. Planning to have something ready in the next 3 weeks. Will keep you updated if we get this done sooner.
Thanks for the reply, @JRosenkranz I'd love to wait, but I have access to a large cluster of H100s for a limited time, so I wanted to make the most of it by training as many MLPSpeculator models as possible on various popular models. If it's doable, I'd love some basic instructions on how to get this PR running and start train runs; I can figure out the rest. Different story if the PR itself isn't ready, however 😅
Hi, adding a +1 to @AlpinDale. We are interested in experimenting with the MLP speculator, specifically on the latest Llama3.1 models. Excellent work overall @JRosenkranz!
PR35 is outdated. We expect to release a stable code version in about 3 weeks. We understand @AlpinDale's urgency and are trying to put this PR in shape so that you can use it in the interim. We hit issues running it against the main branches of foundation-model-stack and fms-extras, and are working on resolving them. If that doesn't work out, we can point you to the specific branches of the foundation-model-stack, fms-extras, and fms-fsdp repos in the meanwhile so that you can train custom speculators while we polish them and merge them into their respective mains. There are already a bunch of speculators available here and here, in case there is any overlap with your requirements. For example, the llama3-70b speculator works for llama3.1-70b as well, as mentioned here (and so llama3-8b might also work for llama3.1-8b).
@AlpinDale @vdabravolski
Is this expected to be merged soon?
@philschmid We are expecting to have speculator training merged sometime in the next 2 weeks.
@philschmid This has been finished and merged in #114. The speculator training implementation is now available in main. Please let us know if you have any feedback or questions.
Closing in favor of #114 |
Add support for speculator training, piggybacking off the existing training utilities.
Training script and speculator-specific utilities live inside the new speculator subfolder. Uses distributed setup, checkpointing, and dataloaders from this repo. Adds speculator-specific fields to the training config file (to be ignored during non-speculator training). It might make more sense to pull these new fields out into a separate config subclass under the speculator utilities - open to suggestions.
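The "separate config subclass" idea mentioned above could look roughly like the following hedged sketch. The field names (n_speculator_heads, speculator_width) are purely illustrative placeholders, not the PR's actual config keys:

```python
# Hypothetical sketch of layering speculator-specific fields on top of an
# existing training config via a subclass; field names are illustrative only.
from dataclasses import dataclass

@dataclass
class TrainConfig:
    # Base fields shared by all training runs
    learning_rate: float = 3e-4
    batch_size: int = 8

@dataclass
class SpeculatorTrainConfig(TrainConfig):
    # Speculator-only fields; ignored entirely during non-speculator training
    n_speculator_heads: int = 3
    speculator_width: int = 4096

cfg = SpeculatorTrainConfig()
print(cfg.learning_rate, cfg.n_speculator_heads)
```

A subclass like this keeps the base config untouched for ordinary training while still letting the speculator script consume a single config object.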
Uses speculator architecture from fms-extras.
Uses an altered Llama-7b and generate() function from base fms, allowing the speculator to access embedding vectors, not just logits/token predictions. Do not merge this until that issue can be resolved.
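To show why generate() needed altering, here is a hypothetical toy sketch (not the fms implementation): a speculator conditions on the base model's last hidden state as well as the sampled token, so generation must optionally return both. The toy_model below is a made-up stand-in for illustration.

```python
# Hypothetical sketch of a generate() that exposes hidden states (embeddings)
# alongside sampled tokens, since a speculator consumes both.
def generate(model, prompt, steps=3, include_embeds=False):
    tokens = list(prompt)
    embeds = []
    for _ in range(steps):
        hidden, logits = model(tokens)          # hidden state + next-token logits
        next_tok = max(range(len(logits)), key=logits.__getitem__)  # greedy pick
        tokens.append(next_tok)
        embeds.append(hidden)                   # kept for the speculator's input
    return (tokens, embeds) if include_embeds else tokens

# Toy stand-in model: "hidden state" is the token sum, logits favour (sum % 4)
def toy_model(tokens):
    s = sum(tokens)
    logits = [1.0 if i == s % 4 else 0.0 for i in range(4)]
    return s, logits

toks, embeds = generate(toy_model, [1, 2], steps=2, include_embeds=True)
print(toks, embeds)  # [1, 2, 3, 2] [3, 6]
```

The point of the flag is that a plain generate() discards hidden states after sampling; the speculator-aware variant keeps them so the MLP speculator can predict several tokens ahead from richer inputs than logits alone.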