# Update evaluation.md #1442

Merged: 7 commits, Jan 6, 2025
`torchchat/utils/docs/evaluation.md` (19 additions & 7 deletions)
The evaluation mode of the `torchchat.py` script can be used to evaluate your language model.
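At a high level, every evaluation run follows the same shape. The sketch below is illustrative only; the model alias, task names, and sample limit are placeholders, not required values:
```
# Illustrative general form: pick a model, one or more tasks, and an optional sample limit
python3 torchchat.py eval <model-name> --tasks <task1> <task2> --limit <num-samples>
```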

## Examples

### Evaluation example with model in Python environment

Running wikitext for 10 iterations
```
python3 torchchat.py eval stories15M --tasks wikitext --limit 10
```

Running wikitext with torch.compile for 10 iterations
```
python3 torchchat.py eval stories15M --compile --tasks wikitext --limit 10
```

Running multiple tasks with torch.compile for evaluation and prefill:
```
python3 torchchat.py eval stories15M --compile --compile-prefill --tasks wikitext hellaswag
```

### Evaluation with model exported to PTE with ExecuTorch

Running an exported model with ExecuTorch (as PTE). Because torchchat can load an exported PTE model back into the Python environment, you can run evaluation directly on the exported model!
```
python3 torchchat.py export stories15M --output-pte-path stories15M.pte
python3 torchchat.py eval stories15M --pte-path stories15M.pte
```
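For a quick sanity check before a full run, you can combine the PTE path with the `--limit` flag shown earlier so that only a few samples are scored. A minimal sketch, simply combining flags already used above:
```
python3 torchchat.py eval stories15M --pte-path stories15M.pte --tasks wikitext --limit 10
```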

Running multiple tasks directly on the created PTE mobile model:
```
python3 torchchat.py eval stories15M --pte-path stories15M.pte --tasks wikitext hellaswag
```

Now let's measure the effect of quantization on evaluation results by exporting with `--quantize` and an example quantization configuration:
```
python3 torchchat.py export stories15M --output-pte-path stories15M.pte --quantize torchchat/quant_config/mobile.json
python3 torchchat.py eval stories15M --pte-path stories15M.pte --tasks wikitext hellaswag
```

Now try your own export options to explore the trade-offs between model size, evaluation speed, and accuracy that model quantization offers!
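For example, an export with your own quantization config might look like the sketch below; the config file name and output path are illustrative placeholders, not files shipped with torchchat:
```
python3 torchchat.py export stories15M --output-pte-path stories15M_custom.pte --quantize my_quant_config.json
python3 torchchat.py eval stories15M --pte-path stories15M_custom.pte --tasks wikitext hellaswag
```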

### Evaluation with model exported to DSO with AOT Inductor (AOTI)

Running an exported model with AOT Inductor (DSO model). Because torchchat can load an exported DSO model back into the Python environment, you can run evaluation directly on the exported model!
```
python3 torchchat.py export stories15M --dtype fast16 --output-dso-path stories15M.so
python3 torchchat.py eval stories15M --dtype fast16 --dso-path stories15M.so
```

Running multiple tasks with AOTI:
```
python3 torchchat.py eval stories15M --dso-path stories15M.so --tasks wikitext hellaswag
```