Fine-Tune Transformers in PyTorch Using Hugging Face Transformers


Fine-Tune Transformers

This notebook is designed to use a pretrained transformers model and fine-tune it on a classification task. The focus of this tutorial is on the code itself and how to adjust it to your needs.

This notebook uses the AutoClasses functionality from the transformers library by Hugging Face. This functionality can guess a model's configuration, tokenizer and architecture just by passing in the model's name. It allows the code to be reused across a large number of transformers models!
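As a quick sketch of what that means in practice (using distilbert-base-uncased purely as an example checkpoint; any supported model name works the same way), switching architectures only requires changing one string:

# A minimal sketch of the AutoClasses idea: swap the checkpoint name and the
# configuration, tokenizer and architecture are resolved automatically.
from transformers import AutoConfig, AutoModelForSequenceClassification, AutoTokenizer

checkpoint = 'distilbert-base-uncased'  # example only; any supported checkpoint name works

config = AutoConfig.from_pretrained(checkpoint, num_labels=2)
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSequenceClassification.from_pretrained(checkpoint, config=config)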


What should I know for this notebook?

I provide enough instructions and comments to be able to follow along with minimal Python coding knowledge.

Since I am using PyTorch to fine-tune our transformers models, any knowledge of PyTorch is very useful. Knowing a little bit about the transformers library also helps.

How should I use this notebook?

I built this notebook with reusability in mind. The way I load the dataset into the PyTorch Dataset class is pretty standard and can easily be reused for any other dataset.

The only modification needed to use your own dataset is in how the MovieReviewsDataset class, built on PyTorch's Dataset, reads the data in. The DataLoader will return a dictionary of batch inputs formatted so that it can be fed straight into the model using the statement: outputs = model(**batch). As long as this statement holds, the rest of the code will work!
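To make that contract concrete, here is a minimal sketch of a custom dataset (the class name and fields below are hypothetical, and it assumes your texts and labels already sit in Python lists); as long as each example is a dictionary, model(**batch) keeps working:

import torch
from torch.utils.data import Dataset, DataLoader

class MyTextDataset(Dataset):
    # Hypothetical example: any dataset works as long as __getitem__
    # returns a dictionary that can be unpacked with `model(**batch)`.
    def __init__(self, texts, labels, tokenizer, max_length=60):
        # Tokenize all texts up front; returns `input_ids`, `attention_mask`, etc.
        self.encodings = tokenizer(texts, truncation=True, padding=True,
                                   max_length=max_length, return_tensors='pt')
        self.labels = torch.tensor(labels)

    def __len__(self):
        return len(self.labels)

    def __getitem__(self, idx):
        item = {key: tensor[idx] for key, tensor in self.encodings.items()}
        item['labels'] = self.labels[idx]
        return item

# Each batch from the DataLoader is a dictionary, so `model(**batch)` just works.
# loader = DataLoader(MyTextDataset(texts, labels, tokenizer), batch_size=32)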

Which transformers models work with this notebook?

There are rare occasions when I use a model other than Bert for text data classification. So when there is a need to run a different transformers model architecture, which ones would work with this code?

Since the name of the notebook is Fine-Tune Transformers, it should work with more than one type of transformer.

I ran this notebook across all the pretrained models found on Hugging Face Transformers. This way you know ahead of time whether the model you plan to use works with this code without any modifications.

The list of pretrained transformers models that work with this notebook can be found here. There are 73 models that worked and 33 models that failed to work with this notebook.

Dataset

This notebook will cover fine-tuning transformers for a binary classification task. I will use the well-known movie reviews positive-negative labeled Large Movie Review Dataset.

The description provided on the Stanford website:

This is a dataset for binary sentiment classification containing substantially more data than previous benchmark datasets. We provide a set of 25,000 highly polar movie reviews for training, and 25,000 for testing. There is additional unlabeled data for use as well. Raw text and already processed bag of words formats are provided. See the README file contained in the release for more details.

Why this dataset? I believe it is an easy-to-understand and easy-to-use dataset for classification. I also think sentiment data is always fun to work with.

Coding

Now let's do some coding! We will go through each coding cell in the notebook, describe what it does, show the code, and, when relevant, show the output.

I made this format easy to follow in case you decide to run each code cell in your own Python notebook.

When I learn from a tutorial, I always try to replicate the results. I believe it's easy to follow along when you have the code next to the explanations.

Downloads

Download the Large Movie Review Dataset and unzip it locally.

# Download the dataset.
!wget -q -nc http://ai.stanford.edu/~amaas/data/sentiment/aclImdb_v1.tar.gz
# Unzip the dataset.
!tar -zxf /content/aclImdb_v1.tar.gz

Installs

  • transformers library needs to be installed to use all the awesome code from Hugging Face. To get the latest version I will install it straight from GitHub.
  • ml_things library used for various machine learning related tasks. I created this library to reduce the amount of code I need to write for each machine learning project. Give it a try!
# Install transformers library.
!pip install -q git+https://github.com/huggingface/transformers.git
# Install helper functions.
!pip install -q git+https://github.com/gmihaila/ml_things.git
Installing build dependencies ... done
Getting requirements to build wheel ... done
Preparing wheel metadata ... done
Building wheel for transformers (PEP 517) ... done
Building wheel for sacremoses (setup.py) ... done
Building wheel for ml-things (setup.py) ... done
Building wheel for ftfy (setup.py) ... done

Imports

Import all needed libraries for this notebook.

Declare the parameters used for this notebook:

  • set_seed(123) – Always good to set a fixed seed for reproducibility.
  • epochs – Number of training epochs (authors recommend between 2 and 4).
  • batch_size – Number of examples per batch – depends on the max sequence length and GPU memory. For a 512 sequence length, a batch of 10 usually works without cuda memory issues. For small sequence lengths you can try a batch of 32 or higher.
  • max_length – Pad or truncate text sequences to a specific length. I will set it to 60 tokens to speed up training.
  • device – Look for gpu to use. Will use cpu by default if no gpu is found.
  • model_name_or_path – Name of a transformers model – will use an already pretrained model. Path of a transformers model – will load your own model from local disk. I always like to start off with bert-base-cased: 12-layer, 768-hidden, 12-heads, 109M parameters. Trained on cased English text.
  • labels_ids – Dictionary of labels and their ids – will be used to convert string labels to numbers.
  • n_labels – How many labels we are using in this dataset. This is used to decide the size of the classification head.
import io
import os
import torch
from tqdm.notebook import tqdm
from torch.utils.data import Dataset, DataLoader
from ml_things import plot_dict, plot_confusion_matrix, fix_text
from sklearn.metrics import classification_report, accuracy_score
from transformers import (AutoConfig,
                          AutoModelForSequenceClassification,
                          AutoTokenizer,
                          AdamW,
                          get_linear_schedule_with_warmup,
                          set_seed,
                          )

# Set seed for reproducibility.
set_seed(123)

# Number of training epochs (authors recommend between 2 and 4).
epochs = 4

# Number of examples per batch - depending on the max sequence length and GPU memory.
# For 512 sequence length a batch of 10 works without cuda memory issues.
# For small sequence length can try batch of 32 or higher.
batch_size = 32

# Pad or truncate text sequences to a specific length.
# If `None` it will use maximum sequence of word piece tokens allowed by model.
max_length = 60

# Look for gpu to use. Will use `cpu` by default if no gpu found.
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# Name of transformers model - will use already pretrained model.
# Path of transformer model - will load your own model from local disk.
model_name_or_path = 'bert-base-cased'

# Dictionary of labels and their id - this will be used to convert
# string labels to number ids.
labels_ids = {'neg': 0, 'pos': 1}

# How many labels are we using in training.
# This is used to decide size of classification head.
n_labels = len(labels_ids)

Helper Functions

I like to keep all the classes and functions used in this notebook under this section to help maintain a clean look of the notebook:

MovieReviewsDataset

If you have worked with PyTorch before, this is pretty standard. We need this class to read in our dataset, parse it, use the tokenizer that transforms text into numbers, and get it into a nice format to be fed to the model.

Luckily, Hugging Face thought of everything and made the tokenizer do all the heavy lifting (splitting text into tokens, padding, truncating, encoding text into numbers), and it's very easy to use!

In this class I only need to read in the content of each file, use fix_text to fix any Unicode issues, and keep track of positive and negative sentiments.

I append all texts and labels to lists that I later feed to the tokenizer and to the label ids to transform everything into numbers.
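As a small illustration of what that tokenizer call returns (a hypothetical snippet, not part of the notebook, assuming the bert-base-cased tokenizer loaded later on):

# A minimal sketch of the tokenizer doing the heavy lifting on two toy reviews.
toy_texts = ['A wonderful movie!', 'Terrible. I walked out.']
toy_inputs = tokenizer(toy_texts, add_special_tokens=True, truncation=True,
                       padding=True, return_tensors='pt', max_length=60)
print(toy_inputs.keys())              # dict_keys(['input_ids', 'token_type_ids', 'attention_mask'])
print(toy_inputs['input_ids'].shape)  # e.g. torch.Size([2, 9]) - both texts padded to the same length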

There are three main parts of this PyTorch Dataset class:

  • __init__() where we read in the dataset and transform text and labels into numbers.
  • __len__() where we need to return the number of examples we read in. This is used when calling len(MovieReviewsDataset()).
  • __getitem__() always takes as an input an int value that represents which example from our dataset to return. If a value of 3 is passed, we will return the example at position 3 from our dataset. It needs to return an object in a format that can be fed to our model. Luckily our tokenizer does that for us and returns a dictionary of variables ready to be fed to the model in this way: model(**inputs).
class MovieReviewsDataset(Dataset):
  r"""PyTorch Dataset class for loading data.

  This is where the data parsing happens and where the text gets encoded using
  loaded tokenizer.

  This class is built with reusability in mind: it can be used as is as long
  as the `dataloader` outputs a batch in dictionary format that can be passed
  straight into the model - `model(**batch)`.

  Arguments:

    path (:obj:`str`):
        Path to the data partition.

    use_tokenizer (:obj:`transformers.tokenization_?`):
        Transformer type tokenizer used to process raw text into numbers.

    labels_ids (:obj:`dict`):
        Dictionary to encode any labels names into numbers. Keys map to
        labels names and Values map to number associated to those labels.

    max_sequence_len (:obj:`int`, `optional`):
        Value to indicate the maximum desired sequence to truncate or pad text
        sequences. If no value is passed it will used maximum sequence size
        supported by the tokenizer and model.

  """

  def __init__(self, path, use_tokenizer, labels_ids, max_sequence_len=None):

    # Check if path exists.
    if not os.path.isdir(path):
      # Raise error if path is invalid.
      raise ValueError('Invalid `path` variable! Needs to be a directory')
    # Check max sequence length.
    max_sequence_len = use_tokenizer.max_len if max_sequence_len is None else max_sequence_len
    texts = []
    labels = []
    print('Reading partitions...')
    # Since the labels are defined by folders with data we loop
    # through each label.
    for label, label_id in tqdm(labels_ids.items()):
      sentiment_path = os.path.join(path, label)

      # Get all files from path.
      files_names = os.listdir(sentiment_path)#[:10] # Sample for debugging.
      print('Reading %s files...' % label)
      # Go through each file and read its content.
      for file_name in tqdm(files_names):
        file_path = os.path.join(sentiment_path, file_name)

        # Read content.
        content = io.open(file_path, mode='r', encoding='utf-8').read()
        # Fix any unicode issues.
        content = fix_text(content)
        # Save content.
        texts.append(content)
        # Save encoded labels.
        labels.append(label_id)

    # Number of examples.
    self.n_examples = len(labels)
    # Use tokenizer on texts. This can take a while.
    print('Using tokenizer on all texts. This can take a while...')
    self.inputs = use_tokenizer(texts, add_special_tokens=True, truncation=True, padding=True, return_tensors='pt', max_length=max_sequence_len)
    # Get maximum sequence length.
    self.sequence_len = self.inputs['input_ids'].shape[-1]
    print('Texts padded or truncated to %d length!' % self.sequence_len)
    # Add labels.
    self.inputs.update({'labels': torch.tensor(labels)})
    print('Finished!\n')

    return

  def __len__(self):
    r"""When used `len` return the number of examples."""
    return self.n_examples

  def __getitem__(self, item):
    r"""Given an index return an example from the position.

    Arguments:

      item (:obj:`int`):
          Index position to pick an example to return.

    Returns:
      :obj:`Dict[str, object]`: Dictionary of inputs that feed into the model.
      It holds the statement `model(**Returned Dictionary)`.

    """
    return {key: self.inputs[key][item] for key in self.inputs.keys()}

train(dataloader, optimizer_, scheduler_, device_)

I created this function to perform a full pass through the DataLoader object (the DataLoader object is created from our Dataset-type object built with the MovieReviewsDataset class). This is basically one epoch of training through the entire dataset.

The dataloader is created from the PyTorch DataLoader, which takes the object created with the MovieReviewsDataset class and puts each example in batches. This way we can feed our model batches of data!

The optimizer_ and scheduler_ are very common in PyTorch. They are required to update the parameters of our model and to update our learning rate during training. There is a lot more to it, but I won't go into details. This can actually be a huge rabbit hole, since a lot happens behind these functions that we don't need to worry about. Thank you, PyTorch!

In the process we keep track of the actual labels and the predicted labels along with the loss.

def train(dataloader, optimizer_, scheduler_, device_):
  r"""
  Train pytorch model on a single pass through the data loader.

  It will use the global variable `model` which is the transformer model
  loaded on `_device` that we want to train on.

  This function is built with reusability in mind: it can be used as is as long
  as the `dataloader` outputs a batch in dictionary format that can be passed
  straight into the model - `model(**batch)`.

  Arguments:

      dataloader (:obj:`torch.utils.data.dataloader.DataLoader`):
          Parsed data into batches of tensors.

      optimizer_ (:obj:`transformers.optimization.AdamW`):
          Optimizer used for training.

      scheduler_ (:obj:`torch.optim.lr_scheduler.LambdaLR`):
          PyTorch scheduler.

      device_ (:obj:`torch.device`):
          Device used to load tensors before feeding to model.

  Returns:

      :obj:`List[List[int], List[int], float]`: List of [True Labels, Predicted
        Labels, Train Average Loss].
  """

  # Use global variable for model.
  global model

  # Tracking variables.
  predictions_labels = []
  true_labels = []
  # Total loss for this epoch.
  total_loss = 0

  # Put the model into training mode.
  model.train()

  # For each batch of training data...
  for batch in tqdm(dataloader, total=len(dataloader)):

    # Add original labels - use later for evaluation.
    true_labels += batch['labels'].numpy().flatten().tolist()

    # Move batch to device.
    batch = {k: v.type(torch.long).to(device_) for k, v in batch.items()}

    # Always clear any previously calculated gradients before performing a
    # backward pass.
    model.zero_grad()

    # Perform a forward pass (evaluate the model on this training batch).
    # This will return the loss (rather than the model output) because we
    # have provided the `labels`.
    # The documentation for this bert model function is here:
    # https://huggingface.co/transformers/v2.2.0/model_doc/bert.html#transformers.BertForSequenceClassification
    outputs = model(**batch)

    # The call to `model` always returns a tuple, so we need to pull the
    # loss value out of the tuple along with the logits. We will use logits
    # later to calculate training accuracy.
    loss, logits = outputs[:2]

    # Accumulate the training loss over all of the batches so that we can
    # calculate the average loss at the end. `loss` is a Tensor containing a
    # single value; the `.item()` function just returns the Python value
    # from the tensor.
    total_loss += loss.item()

    # Perform a backward pass to calculate the gradients.
    loss.backward()

    # Clip the norm of the gradients to 1.0.
    # This is to help prevent the "exploding gradients" problem.
    torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)

    # Update parameters and take a step using the computed gradient.
    # The optimizer dictates the "update rule"--how the parameters are
    # modified based on their gradients, the learning rate, etc.
    optimizer_.step()

    # Update the learning rate.
    scheduler_.step()

    # Move logits and labels to CPU.
    logits = logits.detach().cpu().numpy()

    # Convert these logits to list of predicted labels values.
    predictions_labels += logits.argmax(axis=-1).flatten().tolist()

  # Calculate the average loss over the training data.
  avg_epoch_loss = total_loss / len(dataloader)

  # Return all true labels and predictions for future evaluations.
  return true_labels, predictions_labels, avg_epoch_loss

validation(dataloader, device_)

I implemented this function in a very similar way to train, but without the parameter updates, backward pass, and gradient descent parts. We don't need to do all of those very computationally intensive tasks because we only care about our model's predictions.

I use the dataloader in a similar way as in train to get batches out to feed to our model.

In the process I keep track of the actual labels and the predicted labels along with the loss.

def validation(dataloader, device_):
  r"""Validation function to evaluate model performance on a
  separate set of data.

  This function will return the true and predicted labels so we can use them later
  to evaluate the model's performance.

  This function is built with reusability in mind: it can be used as is as long
  as the `dataloader` outputs a batch in dictionary format that can be passed
  straight into the model - `model(**batch)`.

  Arguments:

    dataloader (:obj:`torch.utils.data.dataloader.DataLoader`):
          Parsed data into batches of tensors.

    device_ (:obj:`torch.device`):
          Device used to load tensors before feeding to model.

  Returns:

    :obj:`List[List[int], List[int], float]`: List of [True Labels, Predicted
        Labels, Train Average Loss]
  """

  # Use global variable for model.
  global model

  # Tracking variables.
  predictions_labels = []
  true_labels = []
  # Total loss for this epoch.
  total_loss = 0

  # Put the model in evaluation mode--the dropout layers behave differently
  # during evaluation.
  model.eval()

  # Evaluate data for one epoch.
  for batch in tqdm(dataloader, total=len(dataloader)):

    # Add original labels.
    true_labels += batch['labels'].numpy().flatten().tolist()

    # Move batch to device.
    batch = {k: v.type(torch.long).to(device_) for k, v in batch.items()}

    # Telling the model not to compute or store gradients, saving memory and
    # speeding up validation.
    with torch.no_grad():

        # Forward pass, calculate logit predictions.
        # Since `labels` are included in the batch, the model also returns the
        # loss along with the logits.
        # token_type_ids is the same as the "segment ids", which
        # differentiates sentence 1 and 2 in 2-sentence tasks.
        # The documentation for this `model` function is here:
        # https://huggingface.co/transformers/v2.2.0/model_doc/bert.html#transformers.BertForSequenceClassification
        outputs = model(**batch)

        # The call to `model` always returns a tuple, so we need to pull the
        # loss value out of the tuple along with the logits. We will use logits
        # later to calculate training accuracy.
        loss, logits = outputs[:2]

        # Move logits and labels to CPU.
        logits = logits.detach().cpu().numpy()

        # Accumulate the training loss over all of the batches so that we can
        # calculate the average loss at the end. `loss` is a Tensor containing a
        # single value; the `.item()` function just returns the Python value
        # from the tensor.
        total_loss += loss.item()

        # Get predictions as a list.
        predict_content = logits.argmax(axis=-1).flatten().tolist()

        # Update list.
        predictions_labels += predict_content

  # Calculate the average loss over the validation data.
  avg_epoch_loss = total_loss / len(dataloader)

  # Return all true labels and predictions for future evaluations.
  return true_labels, predictions_labels, avg_epoch_loss

Load Model and Tokenizer

Loading the three essential parts of the pretrained transformers: configuration, tokenizer and model. I also need to load the model on the device I'm planning to use (GPU / CPU).

Since I use the AutoClass functionality from Hugging Face, I only need to worry about the model's name as input, and the rest is handled by the transformers library.

# Get model configuration.
print('Loading configuration...')
model_config = AutoConfig.from_pretrained(pretrained_model_name_or_path=model_name_or_path, num_labels=n_labels)

# Get model's tokenizer.
print('Loading tokenizer...')
tokenizer = AutoTokenizer.from_pretrained(pretrained_model_name_or_path=model_name_or_path)

# Get the actual model.
print('Loading model...')
model = AutoModelForSequenceClassification.from_pretrained(pretrained_model_name_or_path=model_name_or_path, config=model_config)

# Load model to defined device.
model.to(device)
print('Model loaded to `%s`'%device)
Loading configuration...
Loading tokenizer...
Loading model...
Some weights of the model checkpoint at bert-base-cased were not used when initializing BertForSequenceClassification: ['cls.predictions.bias', 'cls.predictions.transform.dense.weight', 'cls.predictions.transform.dense.bias', 'cls.predictions.decoder.weight', 'cls.seq_relationship.weight', 'cls.seq_relationship.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.LayerNorm.bias']
- This IS expected if you are initializing BertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPretraining model).
- This IS NOT expected if you are initializing BertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-cased and are newly initialized: ['classifier.weight', 'classifier.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Model loaded to `cuda`

Dataset and DataLoader

This is where I create the PyTorch Dataset and DataLoader objects that will be used to feed data into our model.

This is where I use the MovieReviewsDataset class to create the dataset variables. Since the data is partitioned for both train and test, I will create a PyTorch Dataset and a PyTorch DataLoader object for train and test. Only for simplicity I will use the test partition as validation. In practice, never use the test data for validation! (A sketch of a proper split follows the output below.)

print('Dealing with Train...')
# Create pytorch dataset.
train_dataset = MovieReviewsDataset(path='/content/aclImdb/train', use_tokenizer=tokenizer, labels_ids=labels_ids, max_sequence_len=max_length)
print('Created `train_dataset` with %d examples!'%len(train_dataset))

# Move pytorch dataset into dataloader.
train_dataloader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True)
print('Created `train_dataloader` with %d batches!'%len(train_dataloader))

print()

print('Dealing with ...')
# Create pytorch dataset.
valid_dataset = MovieReviewsDataset(path='/content/aclImdb/test', use_tokenizer=tokenizer, labels_ids=labels_ids, max_sequence_len=max_length)
print('Created `valid_dataset` with %d examples!'%len(valid_dataset))

# Move pytorch dataset into dataloader.
valid_dataloader = DataLoader(valid_dataset, batch_size=batch_size, shuffle=False)
print('Created `eval_dataloader` with %d batches!'%len(valid_dataloader))
Dealing with Train...
Reading partitions...
100%|████████████████████████████████|2/2 [00:34<00:00, 17.28s/it]
Reading neg files...
100%|████████████████████████████████|12500/12500 [00:34<00:00, 362.01it/s]
Reading pos files...
100%|████████████████████████████████|12500/12500 [00:23<00:00, 534.34it/s]
Using tokenizer on all texts. This can take a while...
Texts padded or truncated to 40 length!
Finished!

Created `train_dataset` with 25000 examples!
Created `train_dataloader` with 25000 batches!

Dealing with ...
Reading partitions...
100%|████████████████████████████████|2/2 [01:28<00:00, 44.13s/it]
Reading neg files...
100%|████████████████████████████████|12500/12500 [01:28<00:00, 141.71it/s]
Reading pos files...
100%|████████████████████████████████|12500/12500 [01:17<00:00, 161.60it/s]
Using tokenizer on all texts. This can take a while...
Texts padded or truncated to 40 length!
Finished!

Created `valid_dataset` with 25000 examples!
Created `eval_dataloader` with 25000 batches!
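As mentioned above, reusing the test partition for validation is only for simplicity. If you want a proper validation set, a minimal sketch (not part of the original notebook) is to carve it out of the training partition with random_split; the resulting subsets still return dictionary examples, so the same DataLoader and model(**batch) pattern keeps working:

# A minimal sketch (not in the original notebook) of holding out part of the
# training data for validation instead of reusing the test set.
from torch.utils.data import random_split

# Assumes `train_dataset` was created with MovieReviewsDataset as above.
n_valid = int(0.1 * len(train_dataset))   # hold out 10% for validation
n_train = len(train_dataset) - n_valid
train_split, valid_split = random_split(train_dataset, [n_train, n_valid])

train_dataloader = DataLoader(train_split, batch_size=batch_size, shuffle=True)
valid_dataloader = DataLoader(valid_split, batch_size=batch_size, shuffle=False)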

Training

I create an optimizer and scheduler that will be used by PyTorch in training.

I loop through the defined number of epochs and call the train and validation functions.

After each epoch I output similar info as in Keras: train_loss: - val_loss: - train_acc: - valid_acc.

After training, I plot the train and validation loss and accuracy curves to check how the training went.

# Note: AdamW is a class from the huggingface library (as opposed to pytorch).
# I believe the 'W' stands for 'Weight Decay fix'.
optimizer = AdamW(model.parameters(),
                  lr = 2e-5, # args.learning_rate - default is 5e-5, our notebook had 2e-5
                  eps = 1e-8 # args.adam_epsilon - default is 1e-8.
                  )

# Total number of training steps is number of batches * number of epochs.
# `train_dataloader` contains batched data so `len(train_dataloader)` gives
# us the number of batches.
total_steps = len(train_dataloader) * epochs

# Create the learning rate scheduler.
scheduler = get_linear_schedule_with_warmup(optimizer,
                                            num_warmup_steps = 0, # Default value in run_glue.py
                                            num_training_steps = total_steps)

# Store the average loss after each epoch so we can plot them.
all_loss = {'train_loss':[], 'val_loss':[]}
all_acc = {'train_acc':[], 'val_acc':[]}

# Loop through each epoch.
print('Epoch')
for epoch in tqdm(range(epochs)):
  print()
  print('Training on batches...')
  # Perform one full pass over the training set.
  train_labels, train_predict, train_loss = train(train_dataloader, optimizer, scheduler, device)
  train_acc = accuracy_score(train_labels, train_predict)

  # Get predictions from model on validation data.
  print('Validation on batches...')
  valid_labels, valid_predict, val_loss = validation(valid_dataloader, device)
  val_acc = accuracy_score(valid_labels, valid_predict)

  # Print loss and accuracy values to see how training evolves.
  print("  train_loss: %.5f - val_loss: %.5f - train_acc: %.5f - valid_acc: %.5f"%(train_loss, val_loss, train_acc, val_acc))
  print()

  # Store the loss value for plotting the learning curve.
  all_loss['train_loss'].append(train_loss)
  all_loss['val_loss'].append(val_loss)
  all_acc['train_acc'].append(train_acc)
  all_acc['val_acc'].append(val_acc)

# Plot loss curves.
plot_dict(all_loss, use_xlabel='Epochs', use_ylabel='Value', use_linestyles=['-', '--'])

# Plot accuracy curves.
plot_dict(all_acc, use_xlabel='Epochs', use_ylabel='Value', use_linestyles=['-', '--'])
Epoch 100%|████████████████████████████████|4/4 [13:49<00:00, 207.37s/it]
Training on batches...
100%|████████████████████████████████|782/782 [02:40<00:00, 4.86it/s]
Validation on batches...
100%|████████████████████████████████|782/782 [00:46<00:00, 16.80it/s]
  train_loss: 0.44816 - val_loss: 0.38655 - train_acc: 0.78372 - valid_acc: 0.81892

Training on batches...
100%|████████████████████████████████|782/782 [02:40<00:00, 4.86it/s]
Validation on batches...
100%|████████████████████████████████|782/782 [02:13<00:00, 5.88it/s]
  train_loss: 0.29504 - val_loss: 0.43493 - train_acc: 0.87352 - valid_acc: 0.82360

Training on batches...
100%|████████████████████████████████|782/782 [02:40<00:00, 4.87it/s]
Validation on batches...
100%|████████████████████████████████|782/782 [01:43<00:00, 7.58it/s]
  train_loss: 0.16901 - val_loss: 0.48433 - train_acc: 0.93544 - valid_acc: 0.82624

Training on batches...
100%|████████████████████████████████|782/782 [02:40<00:00, 4.87it/s]
Validation on batches...
100%|████████████████████████████████|782/782 [00:46<00:00, 16.79it/s]
  train_loss: 0.09816 - val_loss: 0.73001 - train_acc: 0.96936 - valid_acc: 0.82144

It looks like a little over one epoch of training is enough for this model and dataset: the validation loss keeps increasing after the first epoch while the training loss keeps dropping, a sign of overfitting.
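If you want the training loop to act on that observation automatically, one simple option (not part of the original notebook, just a sketch spliced into the epoch loop above) is to keep a copy of the weights from the epoch with the lowest validation loss:

# A minimal sketch (not in the original notebook): keep the weights from the
# epoch with the lowest validation loss seen so far.
import copy

best_val_loss = float('inf')
best_state_dict = None

# ... inside the epoch loop, right after `val_loss` is computed:
if val_loss < best_val_loss:
    best_val_loss = val_loss
    # Keep a detached copy of the weights from the best epoch so far.
    best_state_dict = copy.deepcopy(model.state_dict())

# ... after the loop, restore the best weights before the final evaluation:
model.load_state_dict(best_state_dict)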

Evaluate

When dealing with classification, it's useful to look at precision, recall and F1 score. Another good thing to look at when evaluating the model is the confusion matrix.

# Get predictions from model on validation data. This is where you should use
# your test data.
true_labels, predictions_labels, avg_epoch_loss = validation(valid_dataloader, device)

# Create the evaluation report.
evaluation_report = classification_report(true_labels, predictions_labels, labels=list(labels_ids.values()), target_names=list(labels_ids.keys()))
# Show the evaluation report.
print(evaluation_report)

# Plot confusion matrix.
plot_confusion_matrix(y_true=true_labels, y_pred=predictions_labels, classes=list(labels_ids.keys()), normalize=True, magnify=3, );
Outputs:
100%|████████████████████████████████|782/782 [00:46<00:00, 16.77it/s]
              precision    recall  f1-score   support

         neg       0.83      0.81      0.82     12500
         pos       0.81      0.83      0.82     12500

    accuracy                           0.82     25000
   macro avg       0.82      0.82      0.82     25000
weighted avg       0.82      0.82      0.82     25000

The results are not great, but for this tutorial we are not interested in performance.

Final Note

If you made it this far: Congrats! And thank you for your interest in my tutorial!

I've been using this code for a while now, and I feel it has gotten to a point where it is nicely documented and easy to follow.

Of course it's easy for me to follow because I built it. That is why any feedback is welcome, and it helps me improve my future tutorials!

If you see something wrong, please let me know by opening an issue on my ml_things GitHub repository!

A lot of tutorials out there are mostly a one-time thing and are not maintained. I plan on keeping my tutorials up to date as much as I can.

This article was originally published on George Mihaila's personal website and re-published to TOPBOTS with permission from the author.


Source: https://www.topbots.com/fine-tune-transformers-in-pytorch/
