
Added Example 31 for GPT-2 transformer model #664

Open
wants to merge 1 commit into base: master
Conversation

@gyulab commented Jul 2, 2024

Description

New example added demonstrating GPT-2

This pull request adds a new example that runs the GPT-2-based distilgpt2 transformer model on the wikitext-2-raw-v1 dataset using AIHWKit. The example demonstrates how to convert the model to analog, run training and inference, and visualize the performance metrics with TensorBoard.
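A minimal sketch (not the PR's exact code) of the overall flow described here: load distilgpt2, build an inference-oriented RPU configuration, and convert the network to analog with AIHWKit. The PCM-like noise model and its `g_max` value are illustrative assumptions, not settings taken from the example.

```python
# Minimal sketch: load distilgpt2 and convert it to analog for inference.
# The noise model and g_max value below are illustrative, not the PR's settings.
from transformers import AutoModelForCausalLM, AutoTokenizer

from aihwkit.nn.conversion import convert_to_analog
from aihwkit.simulator.configs import InferenceRPUConfig
from aihwkit.inference import PCMLikeNoiseModel

tokenizer = AutoTokenizer.from_pretrained("distilgpt2")
model = AutoModelForCausalLM.from_pretrained("distilgpt2")

# Inference-oriented analog configuration with a PCM-like statistical noise model.
rpu_config = InferenceRPUConfig()
rpu_config.noise_model = PCMLikeNoiseModel(g_max=25.0)

# Convert supported digital layers to their analog counterparts.
analog_model = convert_to_analog(model, rpu_config)
```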

Details

Key Changes and Additions

  1. Model and Dataset:
  • Implemented an example using the smallest GPT-2 model (distilgpt2).
  • Utilized the wikitext-2-raw-v1 dataset for training and validation, which is smaller and faster to process than openwebtext.
  2. Training and Inference Setup:
  • Configured the model to use analog inference with specified noise levels.
  • Added support for digital inference as an option.
  • Implemented preprocessing functions to handle dataset tokenization.
  • Provided functionality to train the model and save/load checkpoints.
  3. Logging and Monitoring:
  • Integrated TensorBoard for logging training and validation metrics.
  • Added TensorBoardCallback to the Trainer for seamless logging.
  • Configured the script to save logs in a specific directory and visualize them using TensorBoard (a sketch of the preprocessing, Trainer, and TensorBoard setup follows this list).
  4. Performance Metrics:
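A minimal sketch, under stated assumptions, of the preprocessing and Trainer wiring summarized in items 2 and 3. The dataset and field names follow the Hugging Face language-modeling notebook this example is adapted from; block grouping, output paths, and hyperparameters are illustrative, and `tokenizer`/`analog_model` come from the conversion sketch above.

```python
# Sketch of tokenization, training and TensorBoard logging (illustrative values).
from datasets import load_dataset
from transformers import (DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)
from transformers.integrations import TensorBoardCallback

raw_datasets = load_dataset("wikitext", "wikitext-2-raw-v1")
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 has no pad token by default

def tokenize_function(examples):
    # Tokenize each raw text line; grouping into fixed-length blocks is omitted here.
    return tokenizer(examples["text"])

tokenized = raw_datasets.map(tokenize_function, batched=True, remove_columns=["text"])

training_args = TrainingArguments(
    output_dir="./checkpoints",      # checkpoints are saved to / loaded from here
    logging_dir="./runs/run1",       # TensorBoard reads logs from this directory
    evaluation_strategy="epoch",
    learning_rate=5e-4,
    num_train_epochs=1,
)

trainer = Trainer(
    model=analog_model,
    args=training_args,
    train_dataset=tokenized["train"],
    eval_dataset=tokenized["validation"],
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
    callbacks=[TensorBoardCallback()],
)
# For hardware-aware training, an analog optimizer (e.g. aihwkit.optim.AnalogSGD)
# would typically be passed to the Trainer via its `optimizers` argument.
trainer.train()
trainer.save_model("./checkpoints/final")
```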

README

Example 31: ['31_gpt2_on_wikitext.py']
This example is adapted from https://github.com/huggingface/notebooks/blob/main/examples/language_modeling.ipynb

The example loads a pre-trained GPT-2 model trained on the wikitext dataset. It then applies convert_to_analog() to examine the effects of drift_analog_weights() on inference performance at different weight noise levels. TensorBoard is used to display the perplexity metrics evaluated at various times after training has completed.
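A hedged sketch of the drift evaluation described above, assuming the converted model exposes program_analog_weights() and drift_analog_weights() as in other aihwkit inference examples; the inference times and the use of trainer.evaluate() for perplexity are illustrative, not the example's actual values.

```python
import math

# Evaluate perplexity at several times after the analog weights are programmed.
t_inferences = [0.0, 3600.0, 86400.0]  # seconds: right after programming, 1 hour, 1 day

analog_model.eval()
analog_model.program_analog_weights()                # apply programming noise once
for t_inference in t_inferences:
    analog_model.drift_analog_weights(t_inference)   # apply drift up to t_inference
    metrics = trainer.evaluate()                     # returns an "eval_loss" entry
    perplexity = math.exp(metrics["eval_loss"])
    print(f"t={t_inference:>8.0f}s  perplexity={perplexity:.2f}")
```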

Command-line arguments can be used to control certain options. For example, `python /path/to/aihwkit/examples/31_gpt2_on_wikitext.py -n 0.1 -r "run 1" -l 0.0005 -t` sets the weight noise to 0.1, names the run "run 1" in TensorBoard, sets the learning rate to 0.0005, and enables hardware-aware training.
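For illustration, a parser matching these flags might look like the following; the long option names and defaults are assumptions, only `-n` / `-r` / `-l` / `-t` come from the README.

```python
import argparse

# Illustrative parser for the flags mentioned above; only the short options
# -n / -r / -l / -t are taken from the description, the rest is assumed.
parser = argparse.ArgumentParser(description="GPT-2 on wikitext with aihwkit")
parser.add_argument("-n", "--noise", type=float, default=0.1,
                    help="weight noise level used for analog inference")
parser.add_argument("-r", "--run_name", type=str, default="run 1",
                    help="name of the run shown in TensorBoard")
parser.add_argument("-l", "--learning_rate", type=float, default=0.0005,
                    help="learning rate for (hardware-aware) training")
parser.add_argument("-t", "--training", action="store_true",
                    help="enable hardware-aware training before inference")
args = parser.parse_args()
```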

@kaoutar55
Collaborator

Thank you @gyulab for your nice contribution. We will review this and get back to you. Can you please test this with the latest aihwkit master branch? Thanks!
