Using Azure Machine Learning with Huggingface Transformers


The purpose of these examples is to demonstrate how to train Huggingface models on Azure ML, as well as to demonstrate some “real-world” scenarios, such as:

  • Using Huggingface libraries to take pretrained models and finetune them on GLUE benchmarking tasks

  • Comparing the training times between different Azure VM SKUs

  • Performing automatic hyperparameter optimization with Azure ML’s HyperDrive library


This is not meant to be an introduction to the Huggingface libraries. In fact, we borrowed liberally from their example notebooks. You may want to do the same!


We provide the following examples:

  • Submit a single GLUE finetuning script to Azure ML. This script forms the basis for all other examples.

  • Run an experiment comparing training times across different VM SKUs.

  • Submit a HyperDrive experiment for automated hyperparameter optimization.

Run these as follows:



Make sure you run these from an environment with azureml-sdk installed (pip install azureml-sdk).

Optionally, provide the GLUE task and model checkpoint on the command line:

# finetune bert-base-cased on rte task
python --glue_task rte --model_checkpoint bert-base-cased

# compare training times with different VMs
python --glue_task cola --model_checkpoint gpt2

# hyperparameter optimization with HyperDrive
python --glue_task mnli --model_checkpoint distilroberta-base
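
As a rough illustration of what a HyperDrive sweep over a finetuning script could look like, here is a minimal sketch using the azureml-sdk (v1) HyperDrive classes. The search ranges, the argument names `--learning_rate` and `--per_device_train_batch_size`, and the metric name `eval_accuracy` are assumptions chosen for illustration; they are not taken from the example scripts. The `run_config` parameter is expected to be a ScriptRunConfig for the training script.

```python
import math

# Hypothetical search ranges; adjust them for your task and model.
LEARNING_RATE_RANGE = (1e-6, 1e-4)
BATCH_SIZES = [8, 16, 32]


def build_hyperdrive_config(run_config, max_total_runs=8):
    """Wrap a ScriptRunConfig in a random-search HyperDrive sweep."""
    # Imported lazily so this module can be loaded without azureml-sdk.
    from azureml.train.hyperdrive import (
        HyperDriveConfig,
        PrimaryMetricGoal,
        RandomParameterSampling,
        choice,
        loguniform,
    )

    low, high = LEARNING_RATE_RANGE
    sampling = RandomParameterSampling(
        {
            # loguniform takes the log of the bounds and samples exp(uniform(...)).
            "--learning_rate": loguniform(math.log(low), math.log(high)),
            "--per_device_train_batch_size": choice(*BATCH_SIZES),
        }
    )
    return HyperDriveConfig(
        run_config=run_config,
        hyperparameter_sampling=sampling,
        # Assumed metric name; it must match a metric the training run logs.
        primary_metric_name="eval_accuracy",
        primary_metric_goal=PrimaryMetricGoal.MAXIMIZE,
        max_total_runs=max_total_runs,
    )
```

The resulting object can be passed to `Experiment.submit` in the same way as a single-run configuration.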


Your first run will kick off an image build: Azure ML builds a Docker image with the packages from requirements.txt installed. This can take 10+ minutes. Later runs will be much faster because the image is cached and reused.
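
To make the submission step concrete, the sketch below shows one way a script could be sent to Azure ML with the azureml-sdk (v1) classes, building the environment from requirements.txt. The script name `train.py`, the `src` directory layout, the compute target name `gpu-cluster`, and the experiment and environment names are all hypothetical placeholders, not the names used by these examples.

```python
def build_script_arguments(glue_task, model_checkpoint):
    """Return the command-line arguments forwarded to the training script."""
    return ["--glue_task", glue_task, "--model_checkpoint", model_checkpoint]


def submit_finetune_job(
    glue_task="cola",
    model_checkpoint="distilbert-base-uncased",
    compute_target="gpu-cluster",  # hypothetical compute target name
):
    """Submit a finetuning run and return the resulting Run object."""
    # Imported lazily so this module can be loaded without azureml-sdk.
    from azureml.core import Environment, Experiment, ScriptRunConfig, Workspace

    # Reads workspace details from a local config.json.
    workspace = Workspace.from_config()

    # Azure ML builds (and later reuses) an image for this environment.
    environment = Environment.from_pip_requirements(
        name="transformers-glue",  # hypothetical environment name
        file_path="requirements.txt",
    )

    config = ScriptRunConfig(
        source_directory="src",
        script="train.py",  # hypothetical script name
        arguments=build_script_arguments(glue_task, model_checkpoint),
        compute_target=compute_target,
        environment=environment,
    )
    return Experiment(workspace, "glue-finetune").submit(config)
```

Calling `submit_finetune_job` once per compute target, each backed by a different VM SKU, is one simple way to compare training times.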


These examples make use of the Huggingface Transformers library. Some aspects we make use of here include:

  • Pretrained Models: We make use of AutoModelForSequenceClassification.from_pretrained to load various pretrained models.

  • Tokenizers: We make use of the AutoTokenizer.from_pretrained method to download a pretrained tokenizer used to prepare inputs to the model.

  • GLUE Datasets and Metrics: The GLUE benchmarking tasks are available through the Huggingface Datasets library, which provides simple APIs that allow us to get the GLUE datasets and metrics.
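
The pieces listed above might fit together roughly as follows. This is a minimal sketch rather than the code used in these examples: the helper names are made up for illustration, the task-to-column mapping follows the layout of the GLUE datasets on the Hugging Face Hub, and metric loading is left out.

```python
# Column names holding the input sentence(s) for each GLUE task.
GLUE_TASK_TO_KEYS = {
    "cola": ("sentence", None),
    "mnli": ("premise", "hypothesis"),
    "mrpc": ("sentence1", "sentence2"),
    "qnli": ("question", "sentence"),
    "qqp": ("question1", "question2"),
    "rte": ("sentence1", "sentence2"),
    "sst2": ("sentence", None),
    "stsb": ("sentence1", "sentence2"),
    "wnli": ("sentence1", "sentence2"),
}


def num_labels_for(task):
    """STS-B is a regression task, MNLI has three classes, the rest are binary."""
    if task == "stsb":
        return 1
    if task == "mnli":
        return 3
    return 2


def load_glue_components(task, checkpoint):
    """Return (model, tokenizer, tokenized dataset) for a GLUE task."""
    # Imported lazily; these calls download data and weights on first use.
    from datasets import load_dataset
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    dataset = load_dataset("glue", task)
    tokenizer = AutoTokenizer.from_pretrained(checkpoint)
    model = AutoModelForSequenceClassification.from_pretrained(
        checkpoint, num_labels=num_labels_for(task)
    )
    key1, key2 = GLUE_TASK_TO_KEYS[task]

    def tokenize(batch):
        # Single-sentence tasks pass one text column, pair tasks pass two.
        if key2 is None:
            return tokenizer(batch[key1], truncation=True)
        return tokenizer(batch[key1], batch[key2], truncation=True)

    return model, tokenizer, dataset.map(tokenize, batched=True)
```

Some checkpoints (for example ones without a padding token) may need extra tokenizer configuration before they can be used for batched classification.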

Azure ML Callback

Callbacks are a mechanism for customizing behavior within the training loop. Specifically, we make use of the existing AzureMLCallback, which sends training logs to Azure ML. This allows us to visualize metrics in Azure ML Studio.
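
As a sketch of the idea, the snippet below shows a small helper that forwards numeric log values to an Azure ML run, which is roughly what such a callback does on each logging step, followed by one way to attach AzureMLCallback to a Trainer. The helper `forward_numeric_logs` and the function `build_trainer` are illustrative names, not part of either library.

```python
def forward_numeric_logs(run, logs):
    """Send every numeric entry in `logs` to `run.log`; return the names sent."""
    sent = []
    for name, value in (logs or {}).items():
        # Non-numeric entries (strings, nested dicts) are skipped.
        if isinstance(value, (int, float)):
            run.log(name, value)
            sent.append(name)
    return sent


def build_trainer(model, training_args, train_dataset, eval_dataset):
    """Create a Trainer that reports its logs to Azure ML."""
    # Imported lazily so this module can be loaded without transformers.
    from transformers import Trainer
    from transformers.integrations import AzureMLCallback

    return Trainer(
        model=model,
        args=training_args,
        train_dataset=train_dataset,
        eval_dataset=eval_dataset,
        callbacks=[AzureMLCallback],
    )
```

Depending on the installed Transformers version and the `report_to` setting, the Trainer may already register this callback on its own when the Azure ML SDK is available, in which case passing it explicitly is unnecessary.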


We also make use of HfArgumentParser, in particular its parse_args_into_dataclasses() method. The Trainer class accepts a TrainingArguments dataclass that packages many of the arguments used during training, e.g. learning rates and batch sizes. By using HfArgumentParser we can override fields in TrainingArguments from the command line while also specifying our own command-line arguments in the standard argparse format.
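
A minimal sketch of this pattern is shown below. The `ScriptArguments` dataclass and its defaults are assumptions for illustration; only the two flag names, `--glue_task` and `--model_checkpoint`, come from the commands in this document.

```python
from dataclasses import dataclass, field


@dataclass
class ScriptArguments:
    """Script-specific options, parsed alongside TrainingArguments."""

    glue_task: str = field(
        default="cola",
        metadata={"help": "Name of the GLUE task, e.g. cola, rte or mnli."},
    )
    model_checkpoint: str = field(
        default="distilbert-base-uncased",
        metadata={"help": "Pretrained model checkpoint to finetune."},
    )


def parse_arguments(argv=None):
    """Parse argv (or sys.argv when None) into both dataclasses."""
    # Imported lazily so this module can be loaded without transformers.
    from transformers import HfArgumentParser, TrainingArguments

    parser = HfArgumentParser((ScriptArguments, TrainingArguments))
    script_args, training_args = parser.parse_args_into_dataclasses(args=argv)
    return script_args, training_args
```

With this in place, a call such as `parse_arguments(["--glue_task", "rte", "--output_dir", "outputs", "--learning_rate", "3e-5"])` fills `ScriptArguments` from the first flag and `TrainingArguments` from the remaining ones.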

Local Training

To test the Transformers training script locally, create a virtual environment and run:

pip install -r requirements.txt
cd src
python --glue_task "cola" --model_checkpoint "distilbert-base-uncased"
