Using Custom Metrics

Although SKLL comes with a huge number of built-in metrics for both classification and regression, there might be occasions when you want to use a custom metric function for hyper-parameter tuning or for evaluation. This section shows you how to do that.

Writing Custom Metric Functions

First, let’s look at how to write valid custom metric functions. A valid custom metric function must take two array-like positional arguments: the first being the true labels or scores, and the second being the predicted labels or scores. This function can also take two optional keyword arguments:

greater_is_better: a boolean keyword argument that indicates whether a higher value of the metric indicates better performance (True) or vice versa (False). The default value is True.
response_method: a string keyword argument that specifies the response method to use to get predictions from an estimator. Possible values are:
- "predict": uses the estimator’s predict() method to get class labels
- "predict_proba": uses the estimator’s predict_proba() method to get class probabilities
- "decision_function": uses the estimator’s decision_function() method to get continuous decision function values
- If the value is a list or tuple of the above strings, the scorer uses the first method in the list that is implemented by the estimator.
- If the value is None, it is the same as "predict".
The default value for response_method is None.

Note that these keyword arguments are identical to the keyword arguments for the sklearn.metrics.make_scorer() function and serve the same purpose.
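As a minimal sketch of how these keyword arguments might be declared, the hypothetical metric below computes a Brier score in plain Python. It assumes that the labels are binary and encoded as 0/1, and that y_pred holds the predicted probability of the positive class. The function name brier_score is made up for illustration.

```python
def brier_score(y_true, y_pred, greater_is_better=False, response_method="predict_proba"):
    """Mean squared difference between 0/1 labels and predicted probabilities."""
    # The two keyword arguments are not used in the body. Their default values
    # only declare that lower values are better and that ``y_pred`` should
    # come from the estimator's predict_proba() method.
    # ASSUMPTION: labels are 0/1 and ``y_pred`` is the positive-class probability.
    pairs = list(zip(y_true, y_pred))
    return sum((float(prob) - float(label)) ** 2 for label, prob in pairs) / len(pairs)


# quick check on toy data
toy_brier = brier_score([0, 1, 1, 0], [0.1, 0.9, 0.8, 0.2])
print(toy_brier)
```

Because a lower Brier score means better predictions, greater_is_better is set to False here.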

Important

Previous versions of SKLL offered the needs_proba and needs_threshold keyword arguments for custom metrics. These are now deprecated in scikit-learn and have been replaced by the response_method keyword argument. To replicate the behavior of needs_proba=True, use response_method="predict_proba"; to replicate needs_threshold=True, use response_method=("decision_function", "predict_proba").
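To illustrate the replacement for needs_threshold=True, here is a hedged sketch of a metric that works on continuous scores rather than class labels. The function pairwise_ranking_accuracy is hypothetical and assumes binary labels where 1 marks the positive class. It computes the fraction of (positive, negative) pairs that the scores rank correctly.

```python
def pairwise_ranking_accuracy(
    y_true, y_pred, response_method=("decision_function", "predict_proba")
):
    """Fraction of (positive, negative) pairs whose scores are ordered correctly."""
    # ASSUMPTION: labels are binary and the positive class is encoded as 1.
    positives = [score for label, score in zip(y_true, y_pred) if label == 1]
    negatives = [score for label, score in zip(y_true, y_pred) if label != 1]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")

    wins = 0.0
    for pos_score in positives:
        for neg_score in negatives:
            if pos_score > neg_score:
                wins += 1.0  # correctly ordered pair
            elif pos_score == neg_score:
                wins += 0.5  # ties count as half
    return wins / (len(positives) * len(negatives))


toy_ranking = pairwise_ranking_accuracy([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
print(toy_ranking)
```

The tuple default asks for decision_function() values first and falls back to predict_proba() for estimators that do not implement it.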

In short, custom metric functions take two required positional arguments (order matters) and two optional keyword arguments. Here’s a simple example of a custom metric function that uses only the two positional arguments: F_β with β=0.75, defined in a file called custom.py.

custom.py

from sklearn.metrics import fbeta_score

def f075(y_true, y_pred):
    return fbeta_score(y_true, y_pred, beta=0.75)

You can, of course, write much more complex functions, including ones that aren’t directly available in scikit-learn. Once you have written your metric function, the next step is to use it in your SKLL experiment.
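As one hedged example of a metric that is not a single scikit-learn call, the sketch below computes "adjacent agreement": the fraction of predictions that fall within one point of the true score. The name within_one and the tolerance of one point are illustrative choices, and the function assumes that the labels are numeric.

```python
def within_one(y_true, y_pred):
    """Fraction of predictions at most one point away from the true score."""
    # ASSUMPTION: labels and predictions are numeric (e.g. integer scores).
    pairs = list(zip(y_true, y_pred))
    hits = sum(1 for true, pred in pairs if abs(float(pred) - float(true)) <= 1)
    return hits / len(pairs)


# differences are 0, 2, 1 and 0, so three of the four predictions count as hits
toy_agreement = within_one([1, 2, 3, 4], [1, 4, 2, 4])
print(toy_agreement)
```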

Using in Configuration Files

If you run SKLL from the command line, you can use custom metric functions via your experiment configuration file. To do so:

Add a field called custom_metric_path in the Input section of your configuration file and set its value to be the path to the .py file containing your custom metric function.
Add the name of your custom metric function to the objectives field in the Tuning section (to use it for tuning the model hyper-parameters), to the metrics field in the Output section (to use it only for evaluation), or to both.

Here’s an example configuration file using data from the SKLL Titanic example that illustrates this. This file assumes that the file custom.py above is located in the same directory.

[General]
experiment_name = titanic
task = evaluate

[Input]
train_directory = train
test_directory = dev
featuresets = [["family.csv", "misc.csv", "socioeconomic.csv", "vitals.csv"]]
learners = ["RandomForestClassifier", "DecisionTreeClassifier", "SVC", "MultinomialNB"]
label_col = Survived
id_col = PassengerId
custom_metric_path = custom.py

[Tuning]
grid_search = true
objectives = ['f075']

[Output]
metrics = ['roc_auc']
probability = true
logs = output
results = output
predictions = output
models = output

And that’s it! SKLL will dynamically load and use your custom metric function when you run your experiment. Custom metric functions can be used for both hyper-parameter tuning and evaluation.
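Since the metric file is loaded only when the experiment runs, it can be useful to check it beforehand. The following is a rough sketch of such a check using only the Python standard library. It is not how SKLL itself loads the file, and the helper names load_metric and check_signature are hypothetical. The demo writes a throwaway metric file so that it runs without SKLL or scikit-learn installed.

```python
import importlib.util
import inspect
import tempfile
from pathlib import Path


def load_metric(path, name):
    """Load the function called ``name`` from the Python file at ``path``."""
    spec = importlib.util.spec_from_file_location(Path(path).stem, str(path))
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    # getattr raises AttributeError if the file does not define ``name``
    return getattr(module, name)


def check_signature(func):
    """Raise TypeError unless ``func`` has exactly two required positional arguments."""
    positional = (
        inspect.Parameter.POSITIONAL_ONLY,
        inspect.Parameter.POSITIONAL_OR_KEYWORD,
    )
    required = [
        param
        for param in inspect.signature(func).parameters.values()
        if param.kind in positional and param.default is inspect.Parameter.empty
    ]
    if len(required) != 2:
        raise TypeError(f"{func.__name__} should take two required positional arguments")


# demo on a throwaway metric file (hypothetical content, not the custom.py above)
toy_source = (
    "def exact_match(y_true, y_pred):\n"
    "    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)\n"
)
with tempfile.TemporaryDirectory() as tmp_dir:
    toy_path = Path(tmp_dir) / "toy_metric.py"
    toy_path.write_text(toy_source)
    metric = load_metric(toy_path, "exact_match")

check_signature(metric)
toy_score = metric([1, 0, 1], [1, 1, 1])
print(toy_score)
```

Pointing load_metric at your own metric file and function name would apply the same check to them, provided that everything the file imports is installed.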

Using via the API

To use a custom metric function via the SKLL API, first register it using the register_custom_metric() function. You can then use the metric name as a grid search objective, an output metric, or both.

Here’s a short example that shows how to use the f075() custom metric function we defined above via the SKLL API. Again, we assume that custom.py is located in the current directory.

from skll.data import CSVReader
from skll.learner import Learner
from skll.metrics import register_custom_metric

# register the custom function with SKLL
_ = register_custom_metric("custom.py", "f075")

# let's assume the training data lives in a file called "train.csv"
# we load that into a SKLL FeatureSet
fs = CSVReader.for_path("train.csv").read()

# instantiate a learner and tune its parameters using the custom metric
learner = Learner('LogisticRegression')
learner.train(fs, grid_objective="f075")

...

As with configuration files, custom metric functions can be used for both training and evaluation with the API.

Important

When using the API, if you have multiple metric functions defined in a Python source file, you must register each one individually using register_custom_metric().
When using the API, if you try to re-register the same metric in the same Python session, register_custom_metric() will raise a NameError. Therefore, if you edit your custom metric, you must start a new Python session to be able to see the changes.
When using the API, if the name of any of your custom metric functions conflicts with the name of a metric that already exists in either SKLL or scikit-learn, register_custom_metric() will raise a NameError. You should rename the metric function in that case.
When using a configuration file, if your custom metric name conflicts with names of metrics that already exist in either SKLL or scikit-learn, it will be silently ignored in favor of the already existing metric.
Unlike for the built-in metrics, SKLL does not check whether your custom metric function is appropriate for classification or regression. You must make that decision for yourself.
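Because each metric function in a file has to be registered individually, it may help to list the functions the file defines first. The sketch below does this with the standard-library ast module, without importing the file. The helper top_level_functions and the second metric name f2 are hypothetical, and the sample source stands in for the contents of a metric file.

```python
import ast


def top_level_functions(source):
    """Return the names of public functions defined at the top level of ``source``."""
    tree = ast.parse(source)
    return [
        node.name
        for node in tree.body
        if isinstance(node, ast.FunctionDef) and not node.name.startswith("_")
    ]


# sample file contents; ``f2`` is a made-up second metric for illustration
sample_source = '''
from sklearn.metrics import fbeta_score

def f075(y_true, y_pred):
    return fbeta_score(y_true, y_pred, beta=0.75)

def f2(y_true, y_pred):
    return fbeta_score(y_true, y_pred, beta=2)
'''

metric_names = top_level_functions(sample_source)
print(metric_names)

# With SKLL installed, each name would then be registered on its own, e.g.:
#     for name in metric_names:
#         register_custom_metric("custom.py", name)
```

Parsing the source instead of importing it means the listing works even when the file's dependencies are not installed, but it will also pick up any public helper functions that are not metrics.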