This package provides a way to easily optimize generic performance metrics in supervised learning settings using the Adversarial Prediction framework. The method can be integrated easily into differentiable learning pipelines. The package is a Python implementation of an AISTATS 2020 paper, "AP-Perf: Incorporating Generic Performance Metrics in Differentiable Learning", by Rizal Fathony and Zico Kolter. For the original Julia implementation, please check AdversarialPrediction.jl.
ap_perf enables easy integration of generic performance metrics (including non-decomposable metrics) into our differentiable learning pipeline. It currently supports performance metrics that are defined over binary classification problems.
Below is a code example for incorporating the F-2 score metric into a neural network training pipeline in PyTorch.
import torch
import torch.nn as nn
import torch.nn.functional as F
from ap_perf import PerformanceMetric, MetricLayer
class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(30, 100)
        self.fc2 = nn.Linear(100, 100)
        self.fc3 = nn.Linear(100, 1)

    def forward(self, x):
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x.squeeze()
# metric definition
class Fbeta(PerformanceMetric):
    def __init__(self, beta):
        self.beta = beta

    def define(self, C):
        return ((1 + self.beta ** 2) * C.tp) / ((self.beta ** 2) * C.ap + C.pp)
# initialize metric
f2_score = Fbeta(2)
f2_score.initialize()
f2_score.enforce_special_case_positive()
# create a model and criterion layer
model = Net().to(device)
criterion = MetricLayer(f2_score).to(device)
# forward pass
optimizer.zero_grad()
output = model(inputs)
objective = criterion(output, labels)
# backward pass
objective.backward()
optimizer.step()
As we can see from the code above, we can write an Fbeta class that inherits from the PerformanceMetric class and contains the formula for the F-beta score, calculated from the entities in the confusion matrix. The PerformanceMetric class provides all the computations of the adversarial prediction formulation for a given metric. We then incorporate it into our learning pipeline using the MetricLayer class (which inherits from PyTorch's nn.Module). The MetricLayer serves as the objective of the optimization, replacing the standard nn.BCEWithLogitsLoss class.
# Standard training with binary cross-entropy loss.
model = Net().to(device)
criterion = nn.BCEWithLogitsLoss().to(device)
# forward pass
optimizer.zero_grad()
output = model(inputs)
objective = criterion(output, labels)
# backward pass
objective.backward()
optimizer.step()
Note that the equation for F-beta in general is:

$$F_\beta = \frac{(1 + \beta^2)\,\mathrm{TP}}{\beta^2\,\mathrm{AP} + \mathrm{PP}}$$
ap_perf can be installed using pip:
pip install ap_perf
Some prerequisite packages will be installed automatically: numpy, scipy, and numba. It also requires PyTorch version 1.x.
Different tasks in machine learning require different metrics that align well with the tasks. For binary classification problems, many of the commonly used performance metrics are derived from the confusion matrix. A confusion matrix is a table that reports the values that relate the prediction of a classifier with the ground truth labels. The table below shows the anatomy of a confusion matrix.
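As a rough illustration of the entities involved, the sketch below counts them in plain Python from a vector of predictions and a vector of labels. The function name `confusion_counts` and the dictionary keys are chosen for this illustration only (they mirror the `tp`, `ap`, `pp`, ... names used in the metric definitions later in this document) and are not part of the ap_perf API.

```python
def confusion_counts(yhat, y):
    """Count confusion-matrix entities for binary predictions and labels (0 or 1)."""
    pairs = list(zip(yhat, y))
    tp = sum(1 for p, t in pairs if p == 1 and t == 1)  # true positives
    fp = sum(1 for p, t in pairs if p == 1 and t == 0)  # false positives
    fn = sum(1 for p, t in pairs if p == 0 and t == 1)  # false negatives
    tn = sum(1 for p, t in pairs if p == 0 and t == 0)  # true negatives
    return {
        "tp": tp, "fp": fp, "fn": fn, "tn": tn,
        "pp": tp + fp,    # predicted positive
        "ap": tp + fn,    # actual positive
        "pn": fn + tn,    # predicted negative
        "an": fp + tn,    # actual negative
        "all": len(pairs),
    }


# Hypothetical example data, chosen only for illustration.
yhat = [1, 1, 0, 0, 0]
y = [1, 1, 1, 0, 0]
counts = confusion_counts(yhat, y)
print(counts)
```

Note that the sum statistics satisfy PN = ALL - PP and AN = ALL - AP, so only PP and AP carry independent information once the number of samples is fixed.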
Some of the metrics are decomposable, which means that they can be broken down into an independent sum of another metric that depends only on a single sample. However, most of the interesting performance metrics are non-decomposable, where we need to consider all samples at once. There is a wide variety of non-decomposable performance metrics, such as the F-beta score and Cohen's kappa score used elsewhere in this document.
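ap_perf supports a family of performance metrics that can be expressed as a sum of fractions:

$$\mathrm{metric}(\hat{y}, y) = \sum_j \frac{a_j\,\mathrm{TP} + b_j\,\mathrm{TN} + f_j(\mathrm{PP}, \mathrm{AP})}{g_j(\mathrm{PP}, \mathrm{AP})}$$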
where a_j and b_j are constants, whereas f_j and g_j are functions over PP and AP. Hence, the numerator is a linear function over true positive (TP) and true negative (TN), which may also depend on sum statistics, i.e., predicted and actual positive (PP and AP) as well as their negative counterparts (predicted and actual negative (PN and AN)) and all data (ALL). Note that PN, AN, and ALL can be derived from PP and AP since ALL is just a constant, PN = ALL - PP, and AN = ALL - AP. The denominator depends only on the sum statistics (PP, AP, PN, AN, and ALL). This construction of performance metrics covers a vast range of commonly used metrics, including all metrics in the table above.
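To make the construction more concrete, here is a minimal sketch that evaluates a metric written as a list of (a_j, b_j, f_j, g_j) terms. It assumes the sum-of-fractions form reads as the sum over j of (a_j TP + b_j TN + f_j(PP, AP)) / g_j(PP, AP), which is an interpretation of the description above. The helper `sum_of_fractions` and the example counts are hypothetical and are not part of ap_perf.

```python
def sum_of_fractions(terms, tp, tn, pp, ap):
    """Evaluate sum_j (a_j*TP + b_j*TN + f_j(PP, AP)) / g_j(PP, AP)."""
    return sum((a * tp + b * tn + f(pp, ap)) / g(pp, ap) for a, b, f, g in terms)


ALL = 5      # total number of samples (a constant for a given batch)
beta = 2.0

# F-beta as a single fraction: a = 1 + beta^2, b = 0, f = 0, g = beta^2 * AP + PP
fbeta_terms = [(1 + beta**2, 0.0, lambda pp, ap: 0.0, lambda pp, ap: beta**2 * ap + pp)]

# Accuracy as a single fraction: a = 1, b = 1, f = 0, g = ALL
accuracy_terms = [(1.0, 1.0, lambda pp, ap: 0.0, lambda pp, ap: ALL)]

# Hypothetical confusion-matrix counts, chosen only for illustration.
tp, tn, pp, ap = 2, 2, 2, 3
f2 = sum_of_fractions(fbeta_terms, tp, tn, pp, ap)
acc = sum_of_fractions(accuracy_terms, tp, tn, pp, ap)
print(f2, acc)
```

Both metrics need only one fraction here; metrics such as Cohen's kappa use the functions over PP and AP more heavily.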
A performance metric can be defined in ap_perf by inheriting from the PerformanceMetric class. Inside the class, we write the definition of the metric by implementing the method define(self, C), where C contains all entities in the confusion matrix. Below is an example of the F-1 score metric definition.
class F1score(PerformanceMetric):
    def define(self, C):
        return (2 * C.tp) / (C.ap + C.pp)
Some performance metrics (e.g., precision, recall, F-score, sensitivity, and specificity) enforce special cases to avoid division by zero. For metrics that contain the true positive, the special case usually arises when the predictions or the true labels are zero for every sample. In this case, the metric is usually defined as 1 if both the predictions and the true labels are all zero; otherwise, the metric is 0. For metrics that contain the true negative, similar cases occur when the predictions or the true labels are one for every sample. Therefore, when instantiating a metric, we need to take these special cases into account, for example, in the case of the F-1 score:
f1_score = F1score()
f1_score.initialize()
f1_score.enforce_special_case_positive()
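The special-case convention for metrics that contain the true positive can be sketched in plain Python as follows. This is only an illustration of the rule described above, not the code ap_perf runs internally; the function name `f1_with_special_case` is hypothetical.

```python
def f1_with_special_case(yhat, y):
    """F-1 score for 0/1 vectors, using the positive special-case convention."""
    tp = sum(1 for p, t in zip(yhat, y) if p == 1 and t == 1)
    pp = sum(yhat)  # number of predicted positives
    ap = sum(y)     # number of actual positives
    if pp == 0 and ap == 0:
        return 1.0  # predictions and labels are both all zero
    if pp == 0 or ap == 0:
        return 0.0  # only one of them is all zero
    return 2 * tp / (ap + pp)


print(f1_with_special_case([0, 0, 0], [0, 0, 0]))              # 1.0
print(f1_with_special_case([0, 0, 0], [1, 0, 0]))              # 0.0
print(f1_with_special_case([1, 1, 0, 0, 0], [1, 1, 1, 0, 0]))  # 0.8
```

For the F-1 score the plain formula would already give 0 in the second case; the explicit branch matters for metrics such as precision (TP / PP), where an all-zero prediction would otherwise divide by zero.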
For some performance metrics, we may want to define a parametric metric, for example, the F-beta score, which depends on the value of beta. In this case, we can store the parameter as a field in our metric class, which can then be used in the computation of the metric inside the define method. For the case of the F-beta score metric, the code is:
class Fbeta(PerformanceMetric):
    def __init__(self, beta):
        self.beta = beta

    def define(self, C):
        return ((1 + self.beta ** 2) * C.tp) / ((self.beta ** 2) * C.ap + C.pp)
f1_score = Fbeta(1)
f1_score.initialize()
f1_score.enforce_special_case_positive()
f2_score = Fbeta(2)
f2_score.initialize()
f2_score.enforce_special_case_positive()
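To see what the beta parameter changes in practice, the sketch below evaluates the same F-beta expression on two sets of confusion-matrix counts. The counts are made up for illustration, and `fbeta_from_counts` is a hypothetical helper that works on plain numbers rather than on the object ap_perf passes to define.

```python
def fbeta_from_counts(beta, tp, ap, pp):
    """Same expression as in Fbeta.define, evaluated on plain numbers."""
    return ((1 + beta**2) * tp) / ((beta**2) * ap + pp)


# Classifier A (hypothetical): recall 8/10 = 0.8, precision 8/16 = 0.5
a = dict(tp=8, ap=10, pp=16)
# Classifier B (hypothetical): recall 5/10 = 0.5, precision 5/5 = 1.0
b = dict(tp=5, ap=10, pp=5)

f1_a, f1_b = fbeta_from_counts(1, **a), fbeta_from_counts(1, **b)
f2_a, f2_b = fbeta_from_counts(2, **a), fbeta_from_counts(2, **b)

print(f"F1: A={f1_a:.3f}, B={f1_b:.3f}")
print(f"F2: A={f2_a:.3f}, B={f2_b:.3f}")
```

With these counts the F-1 score ranks the high-precision classifier B higher, while the F-2 score, which weights recall more heavily, ranks the high-recall classifier A higher.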
We can define arbitrarily complex performance metrics inside the define method, so long as they follow the construction of metrics that the package supports. We can also use intermediate variables to store partial expressions of the metric. Below is a code example for Cohen's kappa score.
class Kappa(PerformanceMetric):
    def define(self, C):
        dmin = (C.ap * C.pp + C.an * C.pn) / C.all**2
        num = ((C.tp + C.tn) / C.all) - dmin
        den = 1 - dmin
        return num / den
kappa = Kappa()
kappa.initialize()
kappa.enforce_special_case_positive()
kappa.enforce_special_case_negative()
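As a quick sanity check of the kappa expression, the same arithmetic can be run on plain integer counts. In the sketch below, a `SimpleNamespace` serves as a hypothetical stand-in for the confusion-matrix object C (the real object is supplied by ap_perf), and the counts are invented for illustration.

```python
from types import SimpleNamespace


def kappa_from_counts(C):
    """Same arithmetic as Kappa.define, applied to plain numbers."""
    dmin = (C.ap * C.pp + C.an * C.pn) / C.all**2
    num = ((C.tp + C.tn) / C.all) - dmin
    den = 1 - dmin
    return num / den


# Hypothetical counts: 5 samples, 4 of them classified correctly.
partial = SimpleNamespace(tp=2, tn=2, pp=2, ap=3, pn=3, an=2, all=5)
# Hypothetical counts: 5 samples, all classified correctly.
perfect = SimpleNamespace(tp=3, tn=2, pp=3, ap=3, pn=2, an=2, all=5)

kappa_partial = kappa_from_counts(partial)
kappa_perfect = kappa_from_counts(perfect)
print(kappa_partial, kappa_perfect)
```

A perfect prediction yields a kappa of 1, while the partially correct one yields a lower value than its 0.8 accuracy, because kappa discounts the agreement expected by chance.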
In some machine learning settings, we may want to optimize a performance metric subject to constraints on other metrics. This occurs when there are trade-offs between different performance metrics. For example, a machine learning system may want to optimize the precision of the prediction, subject to its recall being greater than some threshold. We can define the constraints in the metric by implementing the method constraint(self, C). The code format for the constraints is metric >= th, where th is a real-valued threshold. Below is an example:
# Precision given recall metric
class PrecisionGvRecall(PerformanceMetric):
    def __init__(self, th):
        self.th = th

    def define(self, C):
        return C.tp / C.pp

    def constraint(self, C):
        return C.tp / C.ap >= self.th
precision_gv_recall_80 = PrecisionGvRecall(0.8)
precision_gv_recall_80.initialize()
precision_gv_recall_80.enforce_special_case_positive()
precision_gv_recall_80.set_cs_special_case_positive(True)
precision_gv_recall_60 = PrecisionGvRecall(0.6)
precision_gv_recall_60.initialize()
precision_gv_recall_60.enforce_special_case_positive()
precision_gv_recall_60.set_cs_special_case_positive(True)
Note that the method enforce_special_case_positive enforces special cases for the precision metric, whereas set_cs_special_case_positive enforces special cases for the metric in the constraint (recall metric).
We can also have two or more metrics in the constraints, for example:
# Precision given recall >= th1 and specificity >= th2
class PrecisionGvRecallSpecificity(PerformanceMetric):
    def __init__(self, th1, th2):
        self.th1 = th1
        self.th2 = th2

    def define(self, C):
        return C.tp / C.pp

    def constraint(self, C):
        return [C.tp / C.ap >= self.th1,
                C.tn / C.an >= self.th2]
precision_gv_recall_spec = PrecisionGvRecallSpecificity(0.8, 0.8)
precision_gv_recall_spec.initialize()
precision_gv_recall_spec.enforce_special_case_positive()
precision_gv_recall_spec.set_cs_special_case_positive([True, False])
precision_gv_recall_spec.set_cs_special_case_negative([False, True])
Here, we need to provide a list of booleans, one per constraint, to the set_cs_special_case_positive and set_cs_special_case_negative methods.
Given a prediction for each sample, yhat, and the true labels, y, we can call the method compute_metric to compute the value of the metric. Both yhat and y are vectors containing 0 or 1.
>>> f1_score.compute_metric(yhat, y)
0.8
For a metric with constraints, we can call the method compute_constraints to compute the value of every metric in the constraints. For example:
>>> precision_gv_recall_spec.compute_constraints(yhat, y)
array([0.6, 0.6])
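To make the meaning of these values more concrete, the sketch below computes precision together with the two constraint metrics (recall and specificity) by hand in plain Python and compares them against their thresholds. The inputs are hypothetical, chosen so that both constraint metrics come out as 0.6; the data behind the output above is not shown in this document. The function `evaluate_with_constraints` is not part of ap_perf, and it assumes that yhat and y each contain at least one positive and that y contains at least one negative, so no special case is triggered.

```python
def evaluate_with_constraints(yhat, y, th_recall, th_specificity):
    """Return precision, the constraint metric values, and whether each holds."""
    tp = sum(1 for p, t in zip(yhat, y) if p == 1 and t == 1)
    tn = sum(1 for p, t in zip(yhat, y) if p == 0 and t == 0)
    pp = sum(yhat)    # predicted positives
    ap = sum(y)       # actual positives
    an = len(y) - ap  # actual negatives
    precision = tp / pp
    values = [tp / ap, tn / an]  # recall, specificity
    satisfied = [values[0] >= th_recall, values[1] >= th_specificity]
    return precision, values, satisfied


# Hypothetical data: 5 positives followed by 5 negatives.
y = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
yhat = [1, 1, 1, 0, 0, 1, 1, 0, 0, 0]

precision, values, satisfied = evaluate_with_constraints(yhat, y, 0.8, 0.8)
print(precision)   # 0.6
print(values)      # [0.6, 0.6]
print(satisfied)   # [False, False]
```

With thresholds of 0.8, as in the PrecisionGvRecallSpecificity(0.8, 0.8) example, neither constraint would be satisfied by this particular prediction.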
As we can see from the first code example, we can use the MetricLayer class to incorporate the metrics we define into a PyTorch differentiable learning pipeline. This class inherits from the standard PyTorch nn.Module.
The class provides forward and backward computation, which returns the objective and gradient information from the adversarial prediction formulation. The gradient is then propagated to the previous layers.
The MetricLayer serves as the objective of the optimization, replacing the standard nn.BCEWithLogitsLoss class.
output = model(inputs)
criterion = nn.BCEWithLogitsLoss().to(device)
objective = criterion(output, labels)
objective.backward()
We can easily adapt existing code that uses binary cross-entropy by simply changing the criterion to MetricLayer, i.e.:
output = model(inputs)
criterion = MetricLayer(f2_score).to(device)
objective = criterion(output, labels)
objective.backward()
To solve the inner optimization problem, ap_perf uses an ADMM-based formulation. By default, it runs 100 iterations of the ADMM optimization. We can also set the number of iterations manually using the max_iter argument, by storing it in a dictionary and passing it to the MetricLayer class.
solver_params = {'max_iter' : 50}
criterion = MetricLayer(f2_score, solver_params).to(device)
The adversarial prediction formulation inside the criterion objective needs to solve a maximin problem whose number of variables is quadratic in the batch size, using the ADMM solver. The complexity of solving the problem is O(m^3), where m is the number of samples in a minibatch. In practice, for a batch size of 25, the ADMM solver takes around 50-60 milliseconds on a PC with an Intel Core i7 processor. If we reduce the number of ADMM iterations to 50, it takes around 25-30 milliseconds.
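As a back-of-the-envelope aid for choosing a batch size, the sketch below computes how the solver cost would grow relative to a batch size of 25 if it followed the O(m^3) bound exactly. This is only an asymptotic estimate that ignores constant factors and hardware effects, not a measurement; the function name `relative_admm_cost` is hypothetical.

```python
def relative_admm_cost(batch_size, reference_batch_size=25):
    """Cost factor relative to the reference batch, assuming pure O(m^3) scaling."""
    return (batch_size / reference_batch_size) ** 3


for m in (25, 50, 100):
    # Prints the growth factor only; actual timings should be measured.
    print(m, relative_admm_cost(m))
```

Under this assumption, doubling the batch size from 25 to 50 multiplies the cost by 8, and going to 100 multiplies it by 64, which suggests keeping minibatches small when using the metric as the training objective.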
Note: The Julia implementation of AP-Perf is roughly twice as fast as this Python implementation. Please check AdversarialPrediction.jl for the Julia implementation.
The package provides definitions of commonly used metrics, including f1_score, f2_score, gpr, mcc, kappa, etc. To load the metrics into the current Python environment, please import them from ap_perf.common_metrics. Please check the detailed definitions of the metrics in ap_perf/common_metrics.py.
from ap_perf import PerformanceMetric, MetricLayer
from ap_perf.common_metrics import f1_score, kappa
criterion = MetricLayer(kappa).to(device)
This repository contains a usage example on a tabular dataset: tabular.py contains an example of classification with tabular data.
Please also visit the AP-examples repository for more examples from the experiments presented in the paper (coded in Julia).
Please cite the following paper if you use ap_perf for your research.
@article{ap-perf,
title={AP-Perf: Incorporating Generic Performance Metrics in Differentiable Learning},
author={Fathony, Rizal and Kolter, Zico},
journal={arXiv preprint arXiv:1912.00965},
year={2019}
}
This project is supported by a grant from the Bosch Center for Artificial Intelligence.
This project would not have been possible without previous foundational research in Adversarial Prediction by Prof. Brian Ziebart's and Prof. Xinhua Zhang's research groups at the University of Illinois at Chicago.