Added Example 31 for GPT-2 transformer model #664
Description
New example demonstrating GPT-2

This pull request adds a new example that runs the GPT-2-based distilgpt2 transformer on the wikitext-2-raw-v1 dataset using AIHWKit. The example demonstrates how to convert the model to analog, run training and inference, and visualize the performance metrics using TensorBoard.

Details
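Perplexity, the language-modeling metric this example tracks, is the exponential of the mean per-token cross-entropy loss. A minimal stdlib sketch of that relationship (the helper name is illustrative, not part of the PR):

```python
import math

def perplexity(mean_cross_entropy: float) -> float:
    """Perplexity is the exponential of the mean per-token
    cross-entropy loss (in nats) over the evaluation set."""
    return math.exp(mean_cross_entropy)

# A lower evaluation loss yields a lower (better) perplexity.
print(perplexity(3.0))  # e**3, roughly 20.09
```

Because the mapping is monotonic, any increase in evaluation loss caused by analog weight drift shows up directly as a higher perplexity curve in TensorBoard.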
Key Changes and Additions
3. Logging and Monitoring:
README

Example 31: 31_gpt2_on_wikitext.py
This example is adapted from https://github.com/huggingface/notebooks/blob/main/examples/language_modeling.ipynb
The example loads a pre-trained GPT-2 model trained on the wikitext dataset. It then applies
convert_to_analog()
to examine the effects of drift_analog_weights()
on inference performance at different weight noise levels. TensorBoard is used to display the perplexity metrics evaluated using the model at various times after training completed.

Command-line arguments can be used to control certain options. For example:
python /path/to/aihwkit/examples/31_gpt2_on_wikitext.py -n 0.1 -r "run 1" -l 0.0005 -t
to set the weight noise to 0.1, name the run in TensorBoard "run 1", set the learning rate to 0.0005, and perform hardware-aware training.
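The flags in the invocation above could be parsed with argparse roughly as follows. This is a hypothetical reconstruction, not the example's exact definitions: the short flags match the command shown, but the long names, defaults, and help strings are assumptions.

```python
import argparse

def build_parser() -> argparse.ArgumentParser:
    # Hypothetical CLI matching the invocation shown above;
    # long option names and defaults are illustrative assumptions.
    parser = argparse.ArgumentParser(
        description="Run distilgpt2 on wikitext-2-raw-v1 with AIHWKit")
    parser.add_argument("-n", "--noise", type=float, default=0.1,
                        help="weight noise level for the analog tiles")
    parser.add_argument("-r", "--run_name", type=str, default="run",
                        help="name of the run shown in TensorBoard")
    parser.add_argument("-l", "--learning_rate", type=float, default=0.0005,
                        help="learning rate used for training")
    parser.add_argument("-t", "--training", action="store_true",
                        help="perform hardware-aware training")
    return parser

# Parse the same arguments as the example invocation above.
args = build_parser().parse_args(["-n", "0.1", "-r", "run 1", "-l", "0.0005", "-t"])
print(args.noise, args.run_name, args.learning_rate, args.training)
```

Note that `-t` is a boolean switch (`store_true`): it takes no value and simply enables hardware-aware training when present.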