Learn how to fine-tune a Llama-2 model using the Azure Machine Learning (AML) Studio UI dashboard.
- Learn what fine-tuning is, why it's useful, and when to use it.
- An Azure subscription.
- Access to AML Service.
- An AML resource created.
- Prepare training and validation datasets:
- At least 50 high-quality samples (preferably thousands) are required.
- Data must be formatted as a JSON Lines (JSONL) document with UTF-8 encoding.
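The dataset requirements above can be sketched in a few lines of Python. The sample rows, file name, and `prompt`/`completion` column names are illustrative (the column names match the fields you map later in the fine-tuning blade):

```python
import json

# Illustrative samples only; real fine-tuning data should contain at
# least 50 high-quality rows, preferably thousands.
samples = [
    {"prompt": "Summarize: AML lets you fine-tune foundation models.",
     "completion": "AML supports fine-tuning of foundation models."},
    {"prompt": "Translate to French: Hello",
     "completion": "Bonjour"},
]

# JSONL means one JSON object per line, written with UTF-8 encoding.
with open("train.jsonl", "w", encoding="utf-8") as f:
    for row in samples:
        f.write(json.dumps(row, ensure_ascii=False) + "\n")

# Sanity check: every line must parse back as a JSON object.
with open("train.jsonl", encoding="utf-8") as f:
    rows = [json.loads(line) for line in f]
print(len(rows))  # 2
```

A quick round-trip check like this catches malformed lines before you upload the file to the service.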
- Open Azure Machine Learning Studio at https://ml.azure.com/ and sign in with credentials that have access to the AML resource. During the sign-in workflow, select the appropriate directory, Azure subscription, and AML resource.
- In AML Studio, browse to the Model catalog pane.
- In the search box, type llama2.
Assume that you want to fine-tune the llama-2-7b model for a text generation task (the process is similar for chat-completion tasks).
- The first step is to click the Fine-tune button to start the fine-tuning process.
- The Fine-tune Llama-2-7b blade lets you specify the task type (choose Text generation in our case), training data, validation data (optional), test data (optional), and an Azure ML compute cluster.
To run the fine-tuning job, an AML compute cluster needs to be created (if you haven't created one before).
- The + New button at the bottom of the blade opens the Create compute cluster pane, where you need to specify the Location (e.g., West Europe), Virtual machine tier (Dedicated), Virtual machine type (GPU), and Virtual machine size.
- Note that only NVIDIA ND40 and ND96 VMs are supported for fine-tuning at the moment. If you can't find one in the list, try choosing another Location or request quota accordingly.
- Give the compute a name, and specify the minimum (usually 0) and maximum (1 for testing purposes) number of nodes.
- Click Next to start the creation process. This may take a couple of minutes.
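If you prefer scripting over the UI, the same cluster can be created with the Azure CLI (ml extension, v2). This is a sketch: the cluster name and VM size below are examples, and the size must be one of the ND SKUs available in your region with sufficient quota.

```shell
# Assumes a default resource group and workspace are already configured
# via `az configure`; otherwise add --resource-group / --workspace-name.
az ml compute create \
  --name llama-ft-cluster \
  --type AmlCompute \
  --size Standard_ND40rs_v2 \
  --min-instances 0 \
  --max-instances 1
```

Setting `--min-instances 0` lets the cluster scale to zero when idle, so you only pay while jobs run.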
The next step is to select your training data, either from a previously uploaded dataset or by uploading a new one.
You also need to specify the 'prompt' (i.e. input) and the 'completion' (i.e. output) columns to guide the fine-tuning process.
You can select your validation data by following a similar procedure as for the training data, or leave the default setting (an automatic split of the training data will be used for validation).
You can select your test data by following a similar procedure as for the training data, or leave the default setting (an automatic split of the training data will be used for testing).
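The idea behind the automatic split can be sketched as follows. The 20% validation fraction and fixed seed here are assumptions for illustration; the fraction the service actually uses is internal to AML:

```python
import random

def split_rows(rows, validation_fraction=0.2, seed=0):
    """Shuffle and split rows into (train, validation).

    Illustrative only: the fraction AML's automatic split uses is
    service-defined, not this parameter.
    """
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_val = max(1, int(len(shuffled) * validation_fraction))
    return shuffled[n_val:], shuffled[:n_val]

rows = [{"prompt": f"q{i}", "completion": f"a{i}"} for i in range(10)]
train, val = split_rows(rows)
print(len(train), len(val))  # 8 2
```

Holding out validation (and test) rows that the model never trains on is what makes the reported metrics meaningful.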
Now you are ready: click the Finish button at the bottom of the Fine-tune Llama-2-7b blade to trigger the actual fine-tuning process. Depending on the size of your training data, this can take from minutes to hours.
After the fine-tuning job finishes, its Status becomes Completed.
Before deploying the model, you first need to register it.
Go to Assets > Models pane, select the newly fine-tuned model, and click + Register.
After that, click the + Deploy button to invoke the Deployment blade, where you need to specify the Virtual machine (preferably an NVIDIA NC or ND VM series), Instance count, Endpoint name, and Deployment name.
Click the Deploy button at the bottom to start the actual deployment process.
This may take a moment, until you see both Provisioning states become Succeeded.
You can directly test the deployed model via the handy test playground.
You can also consume the API using a popular programming language such as Python.
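A minimal Python sketch of calling the deployed endpoint is shown below. The endpoint URL, key, and request payload schema are assumptions typical of a managed online endpoint serving a text-generation model; check the Consume tab of your deployment for the exact URL, key, and request contract:

```python
import json
import os
import urllib.request

# Assumed to be set from the deployment's Consume tab; placeholders here.
ENDPOINT_URL = os.environ.get("AML_ENDPOINT_URL", "")
API_KEY = os.environ.get("AML_ENDPOINT_KEY", "")

def build_request(prompt, max_new_tokens=64):
    # Payload schema is an assumption; verify it against your
    # deployment's sample request before relying on it.
    body = json.dumps({
        "input_data": {
            "input_string": [prompt],
            "parameters": {"max_new_tokens": max_new_tokens},
        }
    }).encode("utf-8")
    headers = {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {API_KEY}",
    }
    url = ENDPOINT_URL or "http://localhost/score"  # placeholder URL
    return urllib.request.Request(url, data=body, headers=headers)

# Only send the request when a real endpoint is configured.
if ENDPOINT_URL and API_KEY:
    with urllib.request.urlopen(build_request("Tell me a joke.")) as resp:
        print(json.loads(resp.read()))
```

Keeping the key in an environment variable rather than in source code avoids accidentally committing credentials.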
When you're done with your custom model, you can delete the deployed endpoint, model, and the compute cluster.
You can also delete the training (and validation and test) files you uploaded to the service, if needed.