metrics Module
Metrics that can be used to evaluate the performance of learners.
- author:
Nitin Madnani (nmadnani@ets.org)
- author:
Michael Heilman (mheilman@ets.org)
- author:
Dan Blanchard (dblanchard@ets.org)
- organization:
ETS
- skll.metrics.correlation(y_true, y_pred, corr_type='pearson')
Calculate the given type of correlation between y_true and y_pred.
y_pred can be multi-dimensional. If y_pred is 1-dimensional, it may contain probabilities, most-likely classification labels, or regressor predictions; in that case, we simply return the correlation between y_true and y_pred. If y_pred is multi-dimensional, it contains probabilities for multiple classes, in which case we infer the most likely labels and then compute the correlation between those and y_true.
- Parameters:
y_true (numpy.ndarray) – The true/actual/gold labels for the data.
y_pred (numpy.ndarray) – The predicted/observed labels for the data.
corr_type (str, default="pearson") – Which type of correlation to compute. Possible choices are “pearson”, “spearman”, and “kendall_tau”.
- Returns:
correlation value if well-defined, else 0.0
- Return type:
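The handling of 1-dimensional versus multi-dimensional predictions described for correlation() can be illustrated with a small standard-library sketch. This is not the SKLL implementation: the helper names `most_likely_labels` and `pearson_or_zero` are hypothetical, only the Pearson case is shown, and the sketch assumes that class labels coincide with the column indices of the probability rows.

```python
import math


def most_likely_labels(y_pred):
    """Return the argmax index of each row if y_pred is 2-D, else a copy of y_pred."""
    if y_pred and isinstance(y_pred[0], (list, tuple)):
        # Assumption: column index i stands for class label i.
        return [max(range(len(row)), key=row.__getitem__) for row in y_pred]
    return list(y_pred)


def pearson_or_zero(y_true, y_pred):
    """Pearson correlation between y_true and the (inferred) labels, or 0.0 if undefined."""
    labels = most_likely_labels(y_pred)
    n = len(y_true)
    mean_true = sum(y_true) / n
    mean_pred = sum(labels) / n
    cov = sum((t - mean_true) * (p - mean_pred) for t, p in zip(y_true, labels))
    var_true = sum((t - mean_true) ** 2 for t in y_true)
    var_pred = sum((p - mean_pred) ** 2 for p in labels)
    # The correlation is not well-defined when either input is constant.
    if var_true == 0 or var_pred == 0:
        return 0.0
    return cov / math.sqrt(var_true * var_pred)


probabilities = [[0.9, 0.1], [0.2, 0.8], [0.4, 0.6], [0.7, 0.3]]
print(pearson_or_zero([0, 1, 1, 0], probabilities))
```

Returning 0.0 for constant inputs mirrors the documented "else 0.0" behavior without raising an error.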
- skll.metrics.f1_score_least_frequent(y_true, y_pred)
Calculate F1 score of the least frequent label/class.
- Parameters:
y_true (numpy.ndarray) – The true/actual/gold labels for the data.
y_pred (numpy.ndarray) – The predicted/observed labels for the data.
- Returns:
F1 score of the least frequent label.
- Return type:
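As a rough illustration of what f1_score_least_frequent() measures, the following standard-library sketch finds the rarest label in y_true and computes its F1 score from precision and recall. The function name `f1_least_frequent` is hypothetical, and the tie-breaking rule (smallest label wins) and the 0.0 result when there are no true positives are assumptions of this sketch, not documented SKLL behavior.

```python
from collections import Counter


def f1_least_frequent(y_true, y_pred):
    """F1 score of the label that occurs least often in y_true."""
    counts = Counter(y_true)
    # Assumption: ties between equally rare labels go to the smallest label.
    rare = min(counts, key=lambda label: (counts[label], label))
    pairs = list(zip(y_true, y_pred))
    true_pos = sum(1 for t, p in pairs if t == rare and p == rare)
    false_pos = sum(1 for t, p in pairs if t != rare and p == rare)
    false_neg = sum(1 for t, p in pairs if t == rare and p != rare)
    if true_pos == 0:
        # Assumption: report 0.0 when the rare label is never predicted correctly.
        return 0.0
    precision = true_pos / (true_pos + false_pos)
    recall = true_pos / (true_pos + false_neg)
    return 2 * precision * recall / (precision + recall)


print(f1_least_frequent([0, 0, 0, 1, 1], [0, 0, 1, 1, 0]))
```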
- skll.metrics.kappa(y_true, y_pred, weights=None, allow_off_by_one=False)
Calculate the kappa inter-rater agreement.
The agreement is calculated between the gold standard and the predicted ratings. Potential values range from -1 (representing complete disagreement) to 1 (representing complete agreement). A kappa value of 0 is expected if all agreement is due to chance.
In the course of calculating kappa, all items in y_true and y_pred will first be converted to floats and then rounded to integers.
It is assumed that y_true and y_pred contain the complete range of possible ratings.
This function combines code from yorchopolis’s kappa-stats and Ben Hamner’s Metrics projects on GitHub.
- Parameters:
y_true (numpy.ndarray) – The true/actual/gold labels for the data.
y_pred (numpy.ndarray) – The predicted/observed labels for the data.
weights (Optional[Union[str, numpy.ndarray]], default=None) – Specifies the weight matrix for the calculation. Possible values are: None (unweighted kappa), "quadratic" (quadratically weighted kappa), "linear" (linearly weighted kappa), and a two-dimensional numpy array (a custom matrix of weights). Each weight in this array corresponds to the \(w_{ij}\) values in the Wikipedia description of how to calculate weighted Cohen’s kappa.
allow_off_by_one (bool, default=False) – If true, ratings that are off by one are counted as equal, and all other differences are reduced by one. For example, 1 and 2 will be considered equal, whereas 1 and 3 will have a difference of 1 when building the weights matrix.
- Returns:
The weighted or unweighted kappa score.
- Return type:
- Raises:
AssertionError – If y_true != y_pred.
ValueError – If labels cannot be converted to int.
ValueError – If the weight scheme is invalid.
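The weighting schemes and the allow_off_by_one adjustment described for kappa() can be made concrete with a small standard-library sketch. It is not the SKLL code: the name `kappa_sketch` is hypothetical, custom weight matrices are left out, the length check is this sketch's reading of the assertion, and returning 1.0 when the expected weighted disagreement is zero is an assumption made to avoid dividing by zero.

```python
def kappa_sketch(y_true, y_pred, weights=None, allow_off_by_one=False):
    """Weighted or unweighted kappa for integer-like ratings (illustrative only)."""
    # Assumption: the inputs must have the same length.
    assert len(y_true) == len(y_pred)
    # Convert to floats first, then round to integers, as the description states.
    y_true = [int(round(float(y))) for y in y_true]
    y_pred = [int(round(float(y))) for y in y_pred]
    low = min(min(y_true), min(y_pred))
    high = max(max(y_true), max(y_pred))
    k = high - low + 1
    n = len(y_true)

    # Observed joint distribution of (true rating, predicted rating).
    observed = [[0.0] * k for _ in range(k)]
    for t, p in zip(y_true, y_pred):
        observed[t - low][p - low] += 1.0 / n
    hist_true = [sum(row) for row in observed]
    hist_pred = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    def weight(i, j):
        diff = abs(i - j)
        if allow_off_by_one and diff:
            diff -= 1  # off-by-one counts as equal; larger gaps shrink by one
        if weights is None:
            return float(diff > 0)
        if weights == "linear":
            return float(diff)
        if weights == "quadratic":
            return float(diff ** 2)
        raise ValueError(f"Invalid weight scheme: {weights!r}")

    cells = [(i, j) for i in range(k) for j in range(k)]
    disagreement = sum(weight(i, j) * observed[i][j] for i, j in cells)
    expected = sum(weight(i, j) * hist_true[i] * hist_pred[j] for i, j in cells)
    if expected == 0:
        return 1.0  # assumption of this sketch: no disagreement is possible
    return 1.0 - disagreement / expected


print(kappa_sketch([1, 2, 3], [2, 3, 2], weights="linear", allow_off_by_one=True))
```

With allow_off_by_one=True, every pair in the usage line differs by exactly one, so the weighted disagreement is zero and the sketch reports complete agreement.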
- skll.metrics.register_custom_metric(custom_metric_path, custom_metric_name)
Import, load, and register the custom metric function from the given path.
- Parameters:
custom_metric_path (skll.types.PathOrStr) – The path to a custom metric.
custom_metric_name (str) – The name of the custom metric function to load. This function must take only two array-like arguments: the true labels and the predictions, in that order.
- Raises:
ValueError – If the custom metric path does not end in ‘.py’.
NameError – If the name of the custom metric file conflicts with an already existing attribute in skll.metrics or if the custom metric name conflicts with a scikit-learn or SKLL metric.
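To show the shape a custom metric needs, here is a minimal sketch of a function that takes exactly two array-like arguments, the true labels and then the predictions. The file name `custom_metrics.py` and the metric name `exact_match_rate` are hypothetical, chosen only for illustration; the registration call is shown as a comment because it requires SKLL to be installed.

```python
# Contents of a hypothetical file named custom_metrics.py


def exact_match_rate(y_true, y_pred):
    """Fraction of predictions equal to the corresponding true label."""
    y_true = list(y_true)
    y_pred = list(y_pred)
    if not y_true:
        return 0.0  # choice of this sketch for empty input
    matches = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return matches / len(y_true)


# Registration would then look roughly like this (not executed here):
# from skll.metrics import register_custom_metric
# register_custom_metric("custom_metrics.py", "exact_match_rate")

print(exact_match_rate([1, 0, 1, 1], [1, 1, 1, 0]))
```

The path ends in ".py", and the names were picked to be unlikely to clash with an existing skll.metrics attribute or a scikit-learn metric, in line with the errors listed above.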
- skll.metrics.use_score_func(func_name, y_true, y_pred)
Call the given scoring function.
This takes care of handling keyword arguments that were pre-specified when creating the scorer. It also applies any sign-flipping that was specified by make_scorer() when the scorer was created.
- Parameters:
func_name (str) – The name of the objective function to use.
y_true (numpy.ndarray) – The true/actual/gold labels for the data.
y_pred (numpy.ndarray) – The predicted/observed labels for the data.
- Returns:
The scored result from the given scorer.
- Return type: