Fine-tune Transformers in PyTorch Using Hugging Face Transformers

Source Node: 747727

fine-tune transformers

This notebook is designed to use a pretrained transformers model and fine-tune it on a classification task. The focus of this tutorial will be on the code itself and how to adjust it to your needs.

This notebook is using the AutoClasses from transformer by Hugging Face functionality. This functionality can guess a model’s configuration, tokenizer and architecture just by passing in the model’s name. This allows for code reusability on a large number of transformers models!

If this in-depth educational content is useful for you, you can subscribe to our AI research mailing list to be alerted when we release new material. 

What should I know for this notebook?

I provided enough instructions and comments to be able to follow along with minimum Python coding knowledge.

Since I am using PyTorch to fine-tune our transformers models any knowledge on PyTorch is very useful. Knowing a little bit about the transformers library helps too.

How to use this notebook?

I built this notebook with reusability in mind. The way I load the dataset into the PyTorch Dataset class is pretty standard and can be easily reused for any other dataset.

The only modifications needed to use your own dataset will be in reading in the dataset inside the MovieReviewsDataset class which uses PyTorch Dataset. The DataLoader will return a dictionary of batch inputs format so that it can be fed straight to the model using the statement: outputs = model(**batch)As long as this statement holds, the rest of the code will work!

What transformers models work with this notebook?

There are rare cases where I use a different model than Bert when dealing with classification from text data. When there is a need to run a different transformer model architecture, which one would work with this code?

Since the name of the notebooks is finetune_transformers it should work with more than one type of transformers.

I ran this notebook across all the pretrained models found on Hugging Face Transformer. This way you know ahead of time if the model you plan to use works with this code without any modifications.

The list of pretrained transformers models that work with this notebook can be found here. There are 73 models that worked ? and 33 models that failed to work ? with this notebook.

Dataset

This notebook will cover fine-tune transformers for binary classification task. I will use the well known movies reviews positive — negative labeled Large Movie Review Dataset.

The description provided on the Stanford website:

This is a dataset for binary sentiment classification containing substantially more data than previous benchmark datasets. We provide a set of 25,000 highly polar movie reviews for training, and 25,000 for testing. There is additional unlabeled data for use as well. Raw text and already processed bag of words formats are provided. See the README file contained in the release for more details.

Why this dataset? I believe is an easy to understand and use dataset for classification. I think sentiment data is always fun to work with.

Coding

Now let’s do some coding! We will go through each coding cell in the notebook and describe what it does, what’s the code, and when is relevant — show the output.

I made this format to be easy to follow if you decide to run each code cell in your own python notebook.

When I learn from a tutorial I always try to replicate the results. I believe it’s easy to follow along if you have the code next to the explanations.

Downloads

Download the Large Movie Review Dataset and unzip it locally.

# Download the dataset.
!wget -q -nc http://ai.stanford.edu/~amaas/data/sentiment/aclImdb_v1.tar.gz
# Unzip the dataset.
!tar -zxf /content/aclImdb_v1.tar.gz

Installs

  • transformers library needs to be installed to use all the awesome code from Hugging Face. To get the latest version I will install it straight from GitHub.
  • ml_things library used for various machine learning related tasks. I created this library to reduce the amount of code I need to write for each machine learning project. Give it a try!
# Install transformers library.
!pip install -q git+https://github.com/huggingface/transformers.git
# Install helper functions.
!pip install -q git+https://github.com/gmihaila/ml_things.git
Installing build dependencies ... done
Getting requirements to build wheel ... done
Preparing wheel metadata ... done |████████████████████████████████| 2.9MB 6.7MB/s |████████████████████████████████| 890kB 48.9MB/s |████████████████████████████████| 1.1MB 49.0MB/s Building wheel for transformers (PEP 517) ... done
Building wheel for sacremoses (setup.py) ... done |████████████████████████████████| 71kB 5.2MB/s Building wheel for ml-things (setup.py) ... done
Building wheel for ftfy (setup.py) ... done

Imports

Import all needed libraries for this notebook.

Declare parameters used for this notebook:

  • set_seed(123) â€“ Always good to set a fixed seed for reproducibility.
  • epochs â€“ Number of training epochs (authors recommend between 2 and 4).
  • batch_size â€“ Number of batches – depending on the max sequence length and GPU memory. For 512 sequence length a batch of 10 USUALY works without cuda memory issues. For small sequence length can try batch of 32 or higher.
  • max_length â€“ Pad or truncate text sequences to a specific length. I will set it to 60 tokens to speed up training.
  • device â€“ Look for gpu to use. I will use cpu by default if no gpu found.
  • model_name_or_path â€“ Name of transformers model – will use already pretrained model. Path of transformer model – will load your own model from local disk. I always like to start off with bert-base-cased12-layer, 768-hidden, 12-heads, 109M parameters. Trained on cased English text.
  • labels_ids â€“ Dictionary of labels and their id – this will be used to convert string labels to numbers.
  • n_labels â€“ How many labels are we using in this dataset. This is used to decide size of classification head.
import io
import os
import torch
from tqdm.notebook import tqdm
from torch.utils.data import Dataset, DataLoader
from ml_things import plot_dict, plot_confusion_matrix, fix_text
from sklearn.metrics import classification_report, accuracy_score
from transformers import (AutoConfig, AutoModelForSequenceClassification, AutoTokenizer, AdamW, get_linear_schedule_with_warmup, set_seed, ) # Set seed for reproducibility,
set_seed(123) # Number of training epochs (authors recommend between 2 and 4)
epochs = 4 # Number of batches - depending on the max sequence length and GPU memory.
# For 512 sequence length batch of 10 works without cuda memory issues.
# For small sequence length can try batch of 32 or higher.
batches = 32 # Pad or truncate text sequences to a specific length
# if `None` it will use maximum sequence of word piece tokens allowed by model.
max_length = 60 # Look for gpu to use. Will use `cpu` by default if no gpu found.
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu') # Name of transformers model - will use already pretrained model.
# Path of transformer model - will load your own model from local disk.
model_name_or_path = 'bert-base-cased' # Dicitonary of labels and their id - this will be used to convert.
# String labels to number ids.
labels_ids = {'neg': 0, 'pos': 1} # How many labels are we using in training.
# This is used to decide size of classification head.
n_labels = len(labels_ids)

Helper Functions

I like to keep all Classes and functions that I will use in this notebook under this section to help maintain a clean look of the notebook:

MovieReviewsDataset

If you worked with PyTorch before this is pretty standard. We need this class to read in our dataset, parse it, use tokenizer that transforms text into numbers and get it into a nice format to be fed to the model.

Lucky for use, Hugging Face thought of everything and made the tokenizer do all the heavy lifting (split text into tokens, padding, truncating, encode text into numbers) and is very easy to use!

In this class I only need to read in the content of each file, use fix_text to fix any Unicode problems and keep track of positive and negative sentiments.

I will append all texts and labels in lists that later I will feed to the tokenizer and to the label ids to transform everything into numbers.

There are three main parts of this PyTorch Dataset class:

  • init() where we read in the dataset and transform text and labels into numbers.
  • __len__() where we need to return the number of examples we read in. This is used when calling len(MovieReviewsDataset()) .
  • __getitem__() always takes as an input an int valuet hat represents which example from our examples to return from our dataset. If a value of 3 is passed, we will return the example form our dataset at position 3. It needs to return an object with the format that can be fed to our model. Luckily our tokenizer does that for us and returns a dictionary of variables ready to be fed to the model in this way:model(**inputs).
class MovieReviewsDataset(Dataset): r"""PyTorch Dataset class for loading data. This is where the data parsing happens and where the text gets encoded using loaded tokenizer. This class is built with reusability in mind: it can be used as is as long as the `dataloader` outputs a batch in dictionary format that can be passed straight into the model - `model(**batch)`. Arguments: path (:obj:`str`): Path to the data partition. use_tokenizer (:obj:`transformers.tokenization_?`): Transformer type tokenizer used to process raw text into numbers. labels_ids (:obj:`dict`): Dictionary to encode any labels names into numbers. Keys map to labels names and Values map to number associated to those labels. max_sequence_len (:obj:`int`, `optional`) Value to indicate the maximum desired sequence to truncate or pad text sequences. If no value is passed it will used maximum sequence size supported by the tokenizer and model. """ def __init__(self, path, use_tokenizer, labels_ids, max_sequence_len=None): # Check if path exists. if not os.path.isdir(path): # Raise error if path is invalid. raise ValueError('Invalid `path` variable! Needs to be a directory') # Check max sequence length. max_sequence_len = use_tokenizer.max_len if max_sequence_len is None else max_sequence_len texts = [] labels = [] print('Reading partitions...') # Since the labels are defined by folders with data we loop # through each label. for label, label_id, in tqdm(labels_ids.items()): sentiment_path = os.path.join(path, label) # Get all files from path. files_names = os.listdir(sentiment_path)#[:10] # Sample for debugging. print('Reading %s files...' % label) # Go through each file and read its content. for file_name in tqdm(files_names): file_path = os.path.join(sentiment_path, file_name) # Read content. content = io.open(file_path, mode='r', encoding='utf-8').read() # Fix any unicode issues. content = fix_text(content) # Save content. texts.append(content) # Save encode labels. labels.append(label_id) # Number of exmaples. self.n_examples = len(labels) # Use tokenizer on texts. This can take a while. print('Using tokenizer on all texts. This can take a while...') self.inputs = use_tokenizer(texts, add_special_tokens=True, truncation=True, padding=True, return_tensors='pt', max_length=max_sequence_len) # Get maximum sequence length. self.sequence_len = self.inputs['input_ids'].shape[-1] print('Texts padded or truncated to %d length!' % self.sequence_len) # Add labels. self.inputs.update({'labels':torch.tensor(labels)}) print('Finished!n') return def __len__(self): r"""When used `len` return the number of examples. """ return self.n_examples def __getitem__(self, item): r"""Given an index return an example from the position. Arguments: item (:obj:`int`): Index position to pick an example to return. Returns: :obj:`Dict[str, object]`: Dictionary of inputs that feed into the model. It holddes the statement `model(**Returned Dictionary)`. """ return {key: self.inputs[key][item] for key in self.inputs.keys()}

train(dataloader, optimizer_, scheduler_, device_)

I created this function to perform a full pass through the DataLoader object (the DataLoader object is created from our Dataset type object using the MovieReviewsDataset class). This is basically one epoch train through the entire dataset.

The dataloader is created from PyTorch DataLoader which takes the object created from MovieReviewsDataset class and puts each example in batches. This way we can feed our model batches of data!

The optimizer_ and scheduler_ are very common in PyTorch. They are required to update the parameters of our model and update our learning rate during training. There is a lot more than that but I won’t go into details. This can actually be a huge rabbit hole since A LOT happens behind these functions that we don’t need to worry. Thank you PyTorch!

In the process we keep track of the actual labels and the predicted labels along with the loss.

def train(dataloader, optimizer_, scheduler_, device_): r""" Train pytorch model on a single pass through the data loader. It will use the global variable `model` which is the transformer model loaded on `_device` that we want to train on. This function is built with reusability in mind: it can be used as is as long as the `dataloader` outputs a batch in dictionary format that can be passed straight into the model - `model(**batch)`. Arguments: dataloader (:obj:`torch.utils.data.dataloader.DataLoader`): Parsed data into batches of tensors. optimizer_ (:obj:`transformers.optimization.AdamW`): Optimizer used for training. scheduler_ (:obj:`torch.optim.lr_scheduler.LambdaLR`): PyTorch scheduler. device_ (:obj:`torch.device`): Device used to load tensors before feeding to model. Returns: :obj:`List[List[int], List[int], float]`: List of [True Labels, Predicted Labels, Train Average Loss]. """ # Use global variable for model. global model # Tracking variables. predictions_labels = [] true_labels = [] # Total loss for this epoch. total_loss = 0 # Put the model into training mode. model.train() # For each batch of training data... for batch in tqdm(dataloader, total=len(dataloader)): # Add original labels - use later for evaluation. true_labels += batch['labels'].numpy().flatten().tolist() # move batch to device batch = {k:v.type(torch.long).to(device_) for k,v in batch.items()} # Always clear any previously calculated gradients before performing a # backward pass. model.zero_grad() # Perform a forward pass (evaluate the model on this training batch). # This will return the loss (rather than the model output) because we # have provided the `labels`. # The documentation for this a bert model function is here: # https://huggingface.co/transformers/v2.2.0/model_doc/bert.html#transformers.BertForSequenceClassification outputs = model(**batch) # The call to `model` always returns a tuple, so we need to pull the # loss value out of the tuple along with the logits. We will use logits # later to calculate training accuracy. loss, logits = outputs[:2] # Accumulate the training loss over all of the batches so that we can # calculate the average loss at the end. `loss` is a Tensor containing a # single value; the `.item()` function just returns the Python value # from the tensor. total_loss += loss.item() # Perform a backward pass to calculate the gradients. loss.backward() # Clip the norm of the gradients to 1.0. # This is to help prevent the "exploding gradients" problem. torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0) # Update parameters and take a step using the computed gradient. # The optimizer dictates the "update rule"--how the parameters are # modified based on their gradients, the learning rate, etc. optimizer.step() # Update the learning rate. scheduler.step() # Move logits and labels to CPU logits = logits.detach().cpu().numpy() # Convert these logits to list of predicted labels values. predictions_labels += logits.argmax(axis=-1).flatten().tolist() # Calculate the average loss over the training data. avg_epoch_loss = total_loss / len(dataloader) # Return all true labels and prediction for future evaluations. return true_labels, predictions_labels, avg_epoch_loss

validation(dataloader, device_)

I implemented this function in a very similar way as train but without the parameters update, backward pass and gradient decent part. We don’t need to do all of those VERY computationally intensive tasks because we only care about our model’s predictions.

I use the DataLoader in a similar way as in train to get out batches to feed to our model.

In the process I keep track of the actual labels and the predicted labels along with the loss.

def validation(dataloader, device_): r"""Validation function to evaluate model performance on a separate set of data. This function will return the true and predicted labels so we can use later to evaluate the model's performance. This function is built with reusability in mind: it can be used as is as long as the `dataloader` outputs a batch in dictionary format that can be passed straight into the model - `model(**batch)`. Arguments: dataloader (:obj:`torch.utils.data.dataloader.DataLoader`): Parsed data into batches of tensors. device_ (:obj:`torch.device`): Device used to load tensors before feeding to model. Returns: :obj:`List[List[int], List[int], float]`: List of [True Labels, Predicted Labels, Train Average Loss] """ # Use global variable for model. global model # Tracking variables predictions_labels = [] true_labels = [] #total loss for this epoch. total_loss = 0 # Put the model in evaluation mode--the dropout layers behave differently # during evaluation. model.eval() # Evaluate data for one epoch for batch in tqdm(dataloader, total=len(dataloader)): # add original labels true_labels += batch['labels'].numpy().flatten().tolist() # move batch to device batch = {k:v.type(torch.long).to(device_) for k,v in batch.items()} # Telling the model not to compute or store gradients, saving memory and # speeding up validation with torch.no_grad(): # Forward pass, calculate logit predictions. # This will return the logits rather than the loss because we have # not provided labels. # token_type_ids is the same as the "segment ids", which # differentiates sentence 1 and 2 in 2-sentence tasks. # The documentation for this `model` function is here: # https://huggingface.co/transformers/v2.2.0/model_doc/bert.html#transformers.BertForSequenceClassification outputs = model(**batch) # The call to `model` always returns a tuple, so we need to pull the # loss value out of the tuple along with the logits. We will use logits # later to to calculate training accuracy. loss, logits = outputs[:2] # Move logits and labels to CPU logits = logits.detach().cpu().numpy() # Accumulate the training loss over all of the batches so that we can # calculate the average loss at the end. `loss` is a Tensor containing a # single value; the `.item()` function just returns the Python value # from the tensor. total_loss += loss.item() # get predicitons to list predict_content = logits.argmax(axis=-1).flatten().tolist() # update list predictions_labels += predict_content # Calculate the average loss over the training data. avg_epoch_loss = total_loss / len(dataloader) # Return all true labels and prediciton for future evaluations. return true_labels, predictions_labels, avg_epoch_loss

Load Model and Tokenizer

Loading the three essential parts of the pretrained transformers: configurationtokenizer and model. I also need to load the model on the device I’m planning to use (GPU / CPU).

Since I use the AutoClass functionality from Hugging Face I only need to worry about the model’s name as input and the rest is handled by the transformers library.

# Get model configuration.
print('Loading configuraiton...')
model_config = AutoConfig.from_pretrained(pretrained_model_name_or_path=model_name_or_path, num_labels=n_labels) # Get model's tokenizer.
print('Loading tokenizer...')
tokenizer = AutoTokenizer.from_pretrained(pretrained_model_name_or_path=model_name_or_path) # Get the actual model.
print('Loading model...')
model = AutoModelForSequenceClassification.from_pretrained(pretrained_model_name_or_path=model_name_or_path, config=model_config) # Load model to defined device.
model.to(device)
print('Model loaded to `%s`'%device)
Loading configuraiton...
Loading tokenizer...
Loading model...
Some weights of the model checkpoint at bert-base-cased were not used when initializing BertForSequenceClassification: ['cls.predictions.bias', 'cls.predictions.transform.dense.weight', 'cls.predictions.transform.dense.bias', 'cls.predictions.decoder.weight', 'cls.seq_relationship.weight', 'cls.seq_relationship.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.LayerNorm.bias']
- This IS expected if you are initializing BertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPretraining model).
- This IS NOT expected if you are initializing BertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-cased and are newly initialized: ['classifier.weight', 'classifier.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Model loaded to `cuda`

Dataset and DataLoader

This is where I create the PyTorch Dataset and DataLoader objects that will be used to feed data into our model.

This is where I use the MovieReviewsDataset class and create the dataset variables. Since data is partitioned for both train and test I will create a PyTorch Dataset and PyTorch DataLoader object for train and test. ONLY for simplicity I will use the test as validation. In practice NEVER USE THE TEST DATA FOR VALIDATION!

print('Dealing with Train...')
# Create pytorch dataset.
train_dataset = MovieReviewsDataset(path='/content/aclImdb/train', use_tokenizer=tokenizer, labels_ids=labels_ids, max_sequence_len=max_length)
print('Created `train_dataset` with %d examples!'%len(train_dataset)) # Move pytorch dataset into dataloader.
train_dataloader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True)
print('Created `train_dataloader` with %d batches!'%len(train_dataloader)) print() print('Dealing with ...')
# Create pytorch dataset.
valid_dataset = MovieReviewsDataset(path='/content/aclImdb/test', use_tokenizer=tokenizer, labels_ids=labels_ids, max_sequence_len=max_length)
print('Created `valid_dataset` with %d examples!'%len(valid_dataset)) # Move pytorch dataset into dataloader.
valid_dataloader = DataLoader(valid_dataset, batch_size=batch_size, shuffle=False)
print('Created `eval_dataloader` with %d batches!'%len(valid_dataloader))
Dealing with Train...
Reading partitions...
100%|████████████████████████████████|2/2 [00:34<00:00, 17.28s/it]
Reading neg files...
100%|████████████████████████████████|12500/12500 [00:34<00:00, 362.01it/s] Reading pos files...
100%|████████████████████████████████|12500/12500 [00:23<00:00, 534.34it/s] Using tokenizer on all texts. This can take a while...
Texts padded or truncated to 40 length!
Finished! Created `train_dataset` with 25000 examples!
Created `train_dataloader` with 25000 batches! Dealing with ...
Reading partitions...
100%|████████████████████████████████|2/2 [01:28<00:00, 44.13s/it]
Reading neg files...
100%|████████████████████████████████|12500/12500 [01:28<00:00, 141.71it/s] Reading pos files...
100%|████████████████████████████████|12500/12500 [01:17<00:00, 161.60it/s] Using tokenizer on all texts. This can take a while...
Texts padded or truncated to 40 length!
Finished! Created `valid_dataset` with 25000 examples!
Created `eval_dataloader` with 25000 batches!

Train

I create an optimizer and scheduler that will be used by PyTorch in training.

I loop through the number of defined epochs and call the train and validation functions.

I will output similar info after each epoch as in Keras: train_loss: — val_loss: — train_acc: — valid_acc.

After training, I plot the train and validation loss and accuracy curves to check how the training went.

# Note: AdamW is a class from the huggingface library (as opposed to pytorch) # I believe the 'W' stands for 'Weight Decay fix"
optimizer = AdamW(model.parameters(), lr = 2e-5, # args.learning_rate - default is 5e-5, our notebook had 2e-5 eps = 1e-8 # args.adam_epsilon - default is 1e-8. ) # Total number of training steps is number of batches * number of epochs.
# `train_dataloader` contains batched data so `len(train_dataloader)` gives # us the number of batches.
total_steps = len(train_dataloader) * epochs # Create the learning rate scheduler.
scheduler = get_linear_schedule_with_warmup(optimizer, num_warmup_steps = 0, # Default value in run_glue.py num_training_steps = total_steps) # Store the average loss after each epoch so we can plot them.
all_loss = {'train_loss':[], 'val_loss':[]}
all_acc = {'train_acc':[], 'val_acc':[]} # Loop through each epoch.
print('Epoch')
for epoch in tqdm(range(epochs)): print() print('Training on batches...') # Perform one full pass over the training set. train_labels, train_predict, train_loss = train(train_dataloader, optimizer, scheduler, device) train_acc = accuracy_score(train_labels, train_predict) # Get prediction form model on validation data. print('Validation on batches...') valid_labels, valid_predict, val_loss = validation(valid_dataloader, device) val_acc = accuracy_score(valid_labels, valid_predict) # Print loss and accuracy values to see how training evolves. print(" train_loss: %.5f - val_loss: %.5f - train_acc: %.5f - valid_acc: %.5f"%(train_loss, val_loss, train_acc, val_acc)) print() # Store the loss value for plotting the learning curve. all_loss['train_loss'].append(train_loss) all_loss['val_loss'].append(val_loss) all_acc['train_acc'].append(train_acc) all_acc['val_acc'].append(val_acc) # Plot loss curves.
plot_dict(all_loss, use_xlabel='Epochs', use_ylabel='Value', use_linestyles=['-', '--']) # Plot accuracy curves.
plot_dict(all_acc, use_xlabel='Epochs', use_ylabel='Value', use_linestyles=['-', '--'])
Epoch 100%|████████████████████████████████|4/4[13:49<00:00, 207.37s/it] Training on batches... 100%|████████████████████████████████|782/782[02:40<00:00,4.86it/s] Validation on batches... 100%|████████████████████████████████|782/782[00:46<00:00,16.80it/s] train_loss: 0.44816 - val_loss: 0.38655 - train_acc: 0.78372 - valid_acc: 0.81892 Training on batches... 100%|████████████████████████████████|782/782[02:40<00:00,4.86it/s] Validation on batches... 100%|████████████████████████████████|782/782 [02:13<00:00,5.88it/s] train_loss: 0.29504 - val_loss: 0.43493 - train_acc: 0.87352 -valid_acc: 0.82360 Training on batches... 100%|████████████████████████████████|782/782[02:40<00:00, 4.87it/s] Validation on batches... 100%|████████████████████████████████|782/782[01:43<00:00,7.58it/s] train_loss: 0.16901 - val_loss: 0.48433 - train_acc: 0.93544 -valid_acc: 0.82624 Training on batches... 100%|████████████████████████████████|782/782[02:40<00:00, 4.87it/s] Validation on batches... 100%|████████████████████████████████|782/782[00:46<00:00,16.79it/s]
train_loss: 0.09816 - val_loss: 0.73001 - train_acc: 0.96936 - valid_acc: 0.82144

It looks like a little over one epoch is enough training for this model and dataset.

Evaluate

When dealing with classification it’s useful to look at precision, recall and f1 score. Another good thing to look at when evaluating the model is the confusion matrix.

# Get prediction form model on validation data. This is where you should use
# your test data.
true_labels, predictions_labels, avg_epoch_loss = validation(valid_dataloader, device) # Create the evaluation report.
evaluation_report = classification_report(true_labels, predictions_labels, labels=list(labels_ids.values()), target_names=list(labels_ids.keys()))
# Show the evaluation report.
print(evaluation_report) # Plot confusion matrix.
plot_confusion_matrix(y_true=true_labels, y_pred=predictions_labels, classes=list(labels_ids.keys()), normalize=True, magnify=3, );
Outputs: 100%|████████████████████████████████|782/782[00:46<00:00,16.77it/s]
precision recall f1-score support neg 0.83 0.81 0.82 12500 pos 0.81 0.83 0.82 12500 accuracy 0.82 25000 macro avg 0.82 0.82 0.82 25000 weighted avg 0.82 0.82 0.82 25000 

Results are not great, but for this tutorial we are not interested in performance.

Final Note

If you made it this far Congrats! ? and Thank you! ? for your interest in my tutorial!

I’ve been using this code for a while now and I feel it got to a point where is nicely documented and easy to follow.

Of course is easy for me to follow because I built it. That is why any feedback is welcome and it helps me improve my future tutorials!

If you see something wrong please let me know by opening an issue on my ml_things GitHub repository!

A lot of tutorials out there are mostly a one-time thing and are not being maintained. I plan on keeping my tutorials up to date as much as I can.

This article was originally published on George Mihaila’s personal website and re-published to TOPBOTS with permission from the author.

Enjoy this article? Sign up for more AI research updates.

We’ll let you know when we release more technical education.

Source: https://www.topbots.com/fine-tune-transformers-in-pytorch/

Time Stamp:

More from TOPBOTS