Added Example 31 for GPT-2 transformer model #664
Description
New example demonstrating GPT-2

This pull request adds a new example that runs the GPT-2-based distilgpt2 transformer on the wikitext-2-raw-v1 dataset using AIHWKit. The example demonstrates how to convert the model to analog, run training and inference, and visualize the performance metrics using TensorBoard.

Details
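Perplexity, the language-modeling metric this example tracks, is the exponential of the mean per-token cross-entropy loss. A minimal stdlib sketch of that relationship (the helper name is illustrative, not part of the PR):

```python
import math

def perplexity(mean_cross_entropy: float) -> float:
    """Perplexity is the exponential of the mean per-token
    cross-entropy loss (in nats) over the evaluation set."""
    return math.exp(mean_cross_entropy)

# A lower evaluation loss yields a lower (better) perplexity.
print(perplexity(3.0))  # e**3, roughly 20.09
```

Because the mapping is monotonic, any increase in evaluation loss caused by analog weight drift shows up directly as a higher perplexity curve in TensorBoard.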
Key Changes and Additions
3. Logging and Monitoring:
README

Example 31: 31_gpt2_on_wikitext.py
This example is adapted from https://github.com/huggingface/notebooks/blob/main/examples/language_modeling.ipynb
The example loads a pre-trained GPT-2 model trained on the wikitext dataset. It then applies
convert_to_analog()
to examine the effects of drift_analog_weights()
on inference performance at different weight noise levels. TensorBoard is used to display the perplexity metrics evaluated using the model at various times after training completed.

Command-line arguments can be used to control certain options. For example:
python /path/to/aihwkit/examples/31_gpt2_on_wikitext.py -n 0.1 -r "run 1" -l 0.0005 -t
to set the weight noise to 0.1, name the run in TensorBoard "run 1", set the learning rate to 0.0005, and perform hardware-aware training.
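The flags in the invocation above could be parsed with argparse roughly as follows. This is a hypothetical reconstruction, not the example's exact definitions: the short flags match the command shown, but the long names, defaults, and help strings are assumptions.

```python
import argparse

def build_parser() -> argparse.ArgumentParser:
    # Hypothetical CLI matching the invocation shown above;
    # long option names and defaults are illustrative assumptions.
    parser = argparse.ArgumentParser(
        description="Run distilgpt2 on wikitext-2-raw-v1 with AIHWKit")
    parser.add_argument("-n", "--noise", type=float, default=0.1,
                        help="weight noise level for the analog tiles")
    parser.add_argument("-r", "--run_name", type=str, default="run",
                        help="name of the run shown in TensorBoard")
    parser.add_argument("-l", "--learning_rate", type=float, default=0.0005,
                        help="learning rate used for training")
    parser.add_argument("-t", "--training", action="store_true",
                        help="perform hardware-aware training")
    return parser

# Parse the same arguments as the example invocation above.
args = build_parser().parse_args(["-n", "0.1", "-r", "run 1", "-l", "0.0005", "-t"])
print(args.noise, args.run_name, args.learning_rate, args.training)
```

Note that `-t` is a boolean switch (`store_true`): it takes no value and simply enables hardware-aware training when present.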