metrics Module
This module contains evaluation metrics that can be used to measure the performance of learners.
author: Michael Heilman (mheilman@ets.org)
author: Nitin Madnani (nmadnani@ets.org)
author: Dan Blanchard (dblanchard@ets.org)
organization: ETS

skll.metrics.f1_score_least_frequent(y_true, y_pred)
Calculate the F1 score of the least frequent label/class in y_true for y_pred.

Parameters:
  y_true (array-like of float) – The true/actual/gold labels for the data.
  y_pred (array-like of float) – The predicted/observed labels for the data.

Returns:
  F1 score of the least frequent label
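The idea can be shown in a small pure-Python sketch (illustrative only, not SKLL's implementation, which builds on scikit-learn): find the label that occurs least often in the gold labels, then compute precision, recall, and F1 for that label.

```python
from collections import Counter

def f1_least_frequent(y_true, y_pred):
    """F1 score of the least frequent gold label (pure-Python sketch)."""
    # Identify the least frequent label among the gold labels.
    least = min(Counter(y_true).items(), key=lambda kv: kv[1])[0]
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == least and p == least)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != least and p == least)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == least and p != least)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Label 1 is least frequent; it is found once correctly, with one miss
# and one false alarm, so precision = recall = 0.5 and F1 = 0.5.
print(f1_least_frequent([0, 0, 0, 1, 1], [0, 1, 0, 1, 0]))  # -> 0.5
```

This metric is useful for imbalanced data, where accuracy on the majority class can mask poor performance on the rare class.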

skll.metrics.kappa(y_true, y_pred, weights=None, allow_off_by_one=False)
Calculates the kappa inter-rater agreement between the gold standard and the predicted ratings. Potential values range from -1 (representing complete disagreement) to 1 (representing complete agreement). A kappa value of 0 is expected if all agreement is due to chance.

In the course of calculating kappa, all items in y_true and y_pred will first be converted to floats and then rounded to integers.

It is assumed that y_true and y_pred contain the complete range of possible ratings.

This function contains a combination of code from yorchopolis's kappa-stats and Ben Hamner's Metrics projects on Github.

Parameters:
  y_true (array-like of float) – The true/actual/gold labels for the data.
  y_pred (array-like of float) – The predicted/observed labels for the data.
  weights (str or numpy array) – Specifies the weight matrix for the calculation. Options are:
    None = unweighted kappa
    'quadratic' = quadratic-weighted kappa
    'linear' = linear-weighted kappa
    two-dimensional numpy array = a custom matrix of weights. Each weight corresponds to the \(w_{ij}\) values in the Wikipedia description of how to calculate weighted Cohen's kappa.
  allow_off_by_one (bool) – If true, ratings that are off by one are counted as equal, and all other differences are reduced by one. For example, 1 and 2 will be considered equal, whereas 1 and 3 will have a difference of 1 when building the weights matrix.
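To make the weighted computation concrete, here is a pure-Python sketch of the weights='quadratic' case (an illustration of the standard quadratic-weighted Cohen's kappa formula, not SKLL's actual code; it omits the allow_off_by_one option):

```python
def quadratic_weighted_kappa(y_true, y_pred):
    """Quadratic-weighted Cohen's kappa (pure-Python sketch).

    Ratings are rounded to ints, as the docs describe; assumes the two
    sequences together cover the full rating range min..max.
    """
    a = [int(round(float(v))) for v in y_true]
    b = [int(round(float(v))) for v in y_pred]
    lo, hi = min(a + b), max(a + b)
    k = hi - lo + 1
    # Observed contingency matrix of gold vs. predicted ratings.
    obs = [[0] * k for _ in range(k)]
    for t, p in zip(a, b):
        obs[t - lo][p - lo] += 1
    n = len(a)
    # Marginal totals give the matrix expected under chance agreement.
    row = [sum(obs[i]) for i in range(k)]
    col = [sum(obs[i][j] for i in range(k)) for j in range(k)]
    num = den = 0.0
    for i in range(k):
        for j in range(k):
            # Quadratic disagreement weight w_ij = (i - j)^2 / (k - 1)^2
            w = (i - j) ** 2 / (k - 1) ** 2 if k > 1 else 0.0
            num += w * obs[i][j]
            den += w * row[i] * col[j] / n
    return 1.0 - num / den if den else 1.0

print(quadratic_weighted_kappa([1, 2, 3, 4], [1, 2, 3, 4]))  # perfect agreement -> 1.0
```

Unweighted kappa is the same computation with w = 0 on the diagonal and 1 elsewhere; linear weights use |i - j| / (k - 1) instead of the squared term.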

skll.metrics.kendall_tau(y_true, y_pred)
Calculate Kendall's tau between y_true and y_pred.

Parameters:
  y_true (array-like of float) – The true/actual/gold labels for the data.
  y_pred (array-like of float) – The predicted/observed labels for the data.

Returns:
  Kendall's tau if well-defined, else 0
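For reference, Kendall's tau compares the relative ordering of every pair of items. A pure-Python O(n²) sketch of the tie-corrected tau-b variant (illustrative; SKLL delegates to a library implementation):

```python
def kendall_tau(y_true, y_pred):
    """Kendall's tau-b over all pairs (pure-Python sketch).

    Returns 0.0 when the statistic is undefined (e.g. all values tied),
    mirroring the "else 0" behavior described above.
    """
    n = len(y_true)
    conc = disc = ties_x = ties_y = 0
    for i in range(n):
        for j in range(i + 1, n):
            dx = y_true[i] - y_true[j]
            dy = y_pred[i] - y_pred[j]
            if dx == 0 and dy == 0:
                continue          # tied in both lists; counted in neither margin
            if dx == 0:
                ties_x += 1
            elif dy == 0:
                ties_y += 1
            elif dx * dy > 0:
                conc += 1         # pair ordered the same way in both lists
            else:
                disc += 1         # pair ordered oppositely
    denom = ((conc + disc + ties_x) * (conc + disc + ties_y)) ** 0.5
    return (conc - disc) / denom if denom else 0.0

print(kendall_tau([1, 2, 3], [3, 2, 1]))  # perfectly reversed order -> -1.0
```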

skll.metrics.pearson(y_true, y_pred)
Calculate the Pearson product-moment correlation coefficient between y_true and y_pred.

Parameters:
  y_true (array-like of float) – The true/actual/gold labels for the data.
  y_pred (array-like of float) – The predicted/observed labels for the data.

Returns:
  Pearson product-moment correlation coefficient if well-defined, else 0
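Pearson correlation is the covariance of the two label sequences divided by the product of their standard deviations. A minimal pure-Python sketch (illustrative; SKLL delegates to a library implementation):

```python
import math

def pearson(y_true, y_pred):
    """Pearson product-moment correlation (pure-Python sketch).

    Returns 0.0 when the coefficient is undefined (zero variance),
    mirroring the "else 0" behavior described above.
    """
    n = len(y_true)
    mx = sum(y_true) / n
    my = sum(y_pred) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(y_true, y_pred))
    sx = math.sqrt(sum((a - mx) ** 2 for a in y_true))
    sy = math.sqrt(sum((b - my) ** 2 for b in y_pred))
    return cov / (sx * sy) if sx and sy else 0.0

print(pearson([1, 2, 3], [2, 4, 6]))  # perfectly linear, so ~1.0
```

Pearson measures linear association only; for monotonic but nonlinear relationships, spearman below is the better fit.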

skll.metrics.spearman(y_true, y_pred)
Calculate Spearman's rank correlation coefficient between y_true and y_pred.

Parameters:
  y_true (array-like of float) – The true/actual/gold labels for the data.
  y_pred (array-like of float) – The predicted/observed labels for the data.

Returns:
  Spearman's rank correlation coefficient if well-defined, else 0
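Spearman's coefficient is simply the Pearson correlation computed on the ranks of the values, with tied values sharing their average rank. A self-contained pure-Python sketch (illustrative; SKLL delegates to a library implementation):

```python
import math

def _avg_ranks(values):
    """1-based ranks, with tied values sharing the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return ranks

def spearman(y_true, y_pred):
    """Spearman's rho = Pearson correlation of the rank vectors."""
    rx, ry = _avg_ranks(y_true), _avg_ranks(y_pred)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy) if sx and sy else 0.0

# Monotonic but nonlinear: ranks agree exactly, so rho is ~1.0
print(spearman([1, 2, 3], [1, 4, 9]))
```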

skll.metrics.use_score_func(func_name, y_true, y_pred)
Call the scoring function in sklearn.metrics.SCORERS with the given name. This takes care of handling keyword arguments that were pre-specified when the scorer was created, and applies any sign-flipping that was specified via make_scorer at creation time.