metrics Module

This module contains a number of evaluation metrics that can be used to assess the performance of learners.

author: Michael Heilman (mheilman@ets.org)
author: Nitin Madnani (nmadnani@ets.org)
author: Dan Blanchard (dblanchard@ets.org)
organization: ETS
skll.metrics.f1_score_least_frequent(y_true, y_pred)

Calculate the F1 score of the least frequent label/class in y_true for y_pred.

Parameters:
  • y_true (array-like of float) – The true/actual/gold labels for the data.
  • y_pred (array-like of float) – The predicted/observed labels for the data.
Returns:

F1 score of the least frequent label
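
A minimal usage sketch (the labels below are made up for illustration):

    from skll.metrics import f1_score_least_frequent

    # Class 1 occurs less often than class 0 in y_true, so F1 is
    # computed treating 1 as the positive class.
    y_true = [0, 0, 0, 1, 1]
    y_pred = [0, 0, 1, 1, 0]

    # Precision and recall for class 1 are both 0.5, so this prints 0.5
    print(f1_score_least_frequent(y_true, y_pred))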

skll.metrics.kappa(y_true, y_pred, weights=None, allow_off_by_one=False)

Calculate the kappa inter-rater agreement between the gold standard and the predicted ratings. Potential values range from -1 (representing complete disagreement) to 1 (representing complete agreement). A kappa value of 0 is expected if all agreement is due to chance.

In the course of calculating kappa, all items in y_true and y_pred will first be converted to floats and then rounded to integers.

It is assumed that y_true and y_pred contain the complete range of possible ratings.

This function contains a combination of code from yorchopolis’s kappa-stats and Ben Hamner’s Metrics projects on GitHub.

Parameters:
  • y_true (array-like of float) – The true/actual/gold labels for the data.
  • y_pred (array-like of float) – The predicted/observed labels for the data.
  • weights (str or numpy array) –

    Specifies the weight matrix for the calculation. Options are:

    • None = unweighted-kappa
    • ‘quadratic’ = quadratic-weighted kappa
    • ‘linear’ = linear-weighted kappa
    • two-dimensional numpy array = a custom matrix of weights. Each weight corresponds to a \(w_{ij}\) value in the Wikipedia description of how to calculate weighted Cohen’s kappa.
  • allow_off_by_one (bool) – If True, ratings that are off by one are counted as equal, and all other differences are reduced by one. For example, 1 and 2 will be considered equal, whereas 1 and 3 will have a difference of 1 when building the weights matrix.
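
A brief usage sketch with illustrative ratings on a 1-4 scale (note that, per the assumption above, y_true covers the complete range of possible ratings). The final call passes a custom numpy weight matrix; \(w_{ij} = |i - j|\) reproduces linear weighting and illustrates the expected shape:

    import numpy as np

    from skll.metrics import kappa

    # Gold and predicted ratings on a 1-4 scale; y_true spans the full range
    y_true = [1, 2, 3, 4, 2, 3]
    y_pred = [1, 2, 3, 4, 2, 2]

    print(kappa(y_true, y_pred))                         # unweighted kappa
    print(kappa(y_true, y_pred, weights='quadratic'))    # quadratic-weighted kappa
    print(kappa(y_true, y_pred, allow_off_by_one=True))  # adjacent ratings count as equal

    # Custom 4x4 weight matrix: w_ij = |i - j| is equivalent to linear weighting
    w = np.abs(np.subtract.outer(np.arange(4), np.arange(4)))
    print(kappa(y_true, y_pred, weights=w))
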
skll.metrics.kendall_tau(y_true, y_pred)

Calculate Kendall’s tau between y_true and y_pred.

Parameters:
  • y_true (array-like of float) – The true/actual/gold labels for the data.
  • y_pred (array-like of float) – The predicted/observed labels for the data.
Returns:

Kendall’s tau if well-defined, else 0
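
A short usage sketch with illustrative values; the predicted scores preserve the ordering of the gold scores exactly, so tau is 1.0:

    from skll.metrics import kendall_tau

    y_true = [1.0, 2.0, 3.0, 4.0]
    y_pred = [1.2, 1.9, 3.5, 3.6]  # same ranking as y_true

    print(kendall_tau(y_true, y_pred))  # 1.0: all pairs are concordant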

skll.metrics.pearson(y_true, y_pred)

Calculate Pearson product-moment correlation coefficient between y_true and y_pred.

Parameters:
  • y_true (array-like of float) – The true/actual/gold labels for the data.
  • y_pred (array-like of float) – The predicted/observed labels for the data.
Returns:

Pearson product-moment correlation coefficient if well-defined, else 0
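
A short usage sketch with illustrative values; the predictions are approximately a linear function of the gold scores, so the coefficient is close to 1:

    from skll.metrics import pearson

    y_true = [1.0, 2.0, 3.0, 4.0]
    y_pred = [2.1, 3.9, 6.2, 7.8]  # roughly 2 * y_true

    print(pearson(y_true, y_pred))  # close to 1.0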

skll.metrics.spearman(y_true, y_pred)

Calculate Spearman’s rank correlation coefficient between y_true and y_pred.

Parameters:
  • y_true (array-like of float) – The true/actual/gold labels for the data.
  • y_pred (array-like of float) – The predicted/observed labels for the data.
Returns:

Spearman’s rank correlation coefficient if well-defined, else 0
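
A short usage sketch with illustrative values; the relationship below is monotonic but nonlinear, so the Spearman coefficient is exactly 1.0 even though the Pearson coefficient would not be:

    from skll.metrics import spearman

    y_true = [1.0, 2.0, 3.0, 4.0]
    y_pred = [1.0, 4.0, 9.0, 16.0]  # y_true squared: monotonic, nonlinear

    print(spearman(y_true, y_pred))  # 1.0: ranks agree perfectly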

skll.metrics.use_score_func(func_name, y_true, y_pred)

Call the scoring function in sklearn.metrics.SCORERS with the given name. This handles any keyword arguments that were pre-specified when the scorer was created, and applies any sign-flipping that was specified via make_scorer.

Parameters:
  • func_name (str) – The name of the scoring function to look up in SCORERS.
  • y_true (array-like of float) – The true/actual/gold labels for the data.
  • y_pred (array-like of float) – The predicted/observed labels for the data.
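
A minimal sketch, assuming the sklearn version in use still exposes sklearn.metrics.SCORERS and that standard scorer names such as 'accuracy' and 'f1' are registered there (the labels are made up):

    from skll.metrics import use_score_func

    y_true = [0, 1, 1, 0]
    y_pred = [0, 1, 0, 0]

    # 'accuracy' and 'f1' are standard scorer names in sklearn.metrics.SCORERS
    print(use_score_func('accuracy', y_true, y_pred))  # 0.75 (3 of 4 correct)
    print(use_score_func('f1', y_true, y_pred))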