# metrics Module

Metrics that can be used to evaluate the performance of learners.

### skll.metrics.correlation(y_true, y_pred, corr_type='pearson')

Calculate the given correlation between `y_true` and `y_pred`. `y_pred` can be multi-dimensional. If `y_pred` is 1-dimensional, it may contain probabilities, most-likely classification labels, or regressor predictions; in that case, we simply return the correlation between `y_true` and `y_pred`. If `y_pred` is multi-dimensional, it contains probabilities for multiple classes, in which case we infer the most likely labels and then compute the correlation between those and `y_true`.

**Parameters:**

- `y_true` (array-like of float) – The true/actual/gold labels for the data.
- `y_pred` (array-like of float) – The predicted/observed labels for the data.
- `corr_type` (str, optional) – Which type of correlation to compute. Possible choices are `pearson`, `spearman`, and `kendall_tau`. Defaults to `pearson`.

**Returns:**

- `ret_score` (float) – The correlation value if well-defined, else 0.0.
### skll.metrics.f1_score_least_frequent(y_true, y_pred)

Calculate the F1 score of the least frequent label/class in y_true for y_pred.

**Parameters:**

- `y_true` (array-like of float) – The true/actual/gold labels for the data.
- `y_pred` (array-like of float) – The predicted/observed labels for the data.

**Returns:**

- `ret_score` (float) – F1 score of the least frequent label.
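The computation can be sketched in pure Python: find the least frequent label in the gold standard, then compute the binary F1 for that label. This is an illustrative helper, not the SKLL implementation:

```python
from collections import Counter

def f1_least_frequent(y_true, y_pred):
    """Sketch: F1 of the least frequent class in y_true (illustrative)."""
    # Least frequent label in the gold standard (ties broken arbitrarily).
    least = min(Counter(y_true).items(), key=lambda kv: kv[1])[0]
    tp = sum(t == least and p == least for t, p in zip(y_true, y_pred))
    fp = sum(t != least and p == least for t, p in zip(y_true, y_pred))
    fn = sum(t == least and p != least for t, p in zip(y_true, y_pred))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

y_true = [0, 0, 0, 1, 1]   # label 1 is least frequent
y_pred = [0, 1, 0, 1, 0]
score = f1_least_frequent(y_true, y_pred)
print(score)  # precision = recall = 0.5, so F1 = 0.5
```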
### skll.metrics.kappa(y_true, y_pred, weights=None, allow_off_by_one=False)

Calculates the kappa inter-rater agreement between the gold standard and the predicted ratings. Potential values range from -1 (representing complete disagreement) to 1 (representing complete agreement). A kappa value of 0 is expected if all agreement is due to chance.

In the course of calculating kappa, all items in y_true and y_pred will first be converted to floats and then rounded to integers.

It is assumed that y_true and y_pred contain the complete range of possible ratings.

This function contains a combination of code from yorchopolis’s kappa-stats and Ben Hamner’s Metrics projects on GitHub.

**Parameters:**

- `y_true` (array-like of float) – The true/actual/gold labels for the data.
- `y_pred` (array-like of float) – The predicted/observed labels for the data.
- `weights` (str or np.array, optional) – Specifies the weight matrix for the calculation. Options are:
  - `None` – unweighted kappa
  - `'quadratic'` – quadratic-weighted kappa
  - `'linear'` – linear-weighted kappa
  - two-dimensional numpy array – a custom matrix of weights. Each weight corresponds to the $w_{ij}$ values in the Wikipedia description of how to calculate weighted Cohen’s kappa.

  Defaults to `None`.
- `allow_off_by_one` (bool, optional) – If `True`, ratings that are off by one are counted as equal, and all other differences are reduced by one. For example, 1 and 2 will be considered equal, whereas 1 and 3 will have a difference of 1 when building the weights matrix. Defaults to `False`.

**Returns:**

- `k` (float) – The kappa score, or weighted kappa score.

**Raises:**

- `AssertionError` – If `y_true` and `y_pred` do not have the same length.
- `ValueError` – If the labels cannot be converted to int.
- `ValueError` – If an invalid weight scheme is specified.
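The unweighted case (`weights=None`) can be sketched as classic Cohen’s kappa: observed agreement minus chance agreement, normalized. This is an illustrative pure-Python sketch, not the SKLL implementation (it skips the float-to-int rounding and the `pe == 1` edge case):

```python
from collections import Counter

def cohen_kappa(y_true, y_pred):
    """Sketch of unweighted Cohen's kappa (the weights=None case)."""
    assert len(y_true) == len(y_pred)
    n = len(y_true)
    # Observed agreement: fraction of items the two raters agree on.
    po = sum(t == p for t, p in zip(y_true, y_pred)) / n
    # Expected chance agreement, from each rater's marginal label counts.
    c_true, c_pred = Counter(y_true), Counter(y_pred)
    pe = sum(c_true[k] * c_pred[k] for k in c_true) / (n * n)
    return (po - pe) / (1 - pe)

print(cohen_kappa([1, 2, 3, 2], [1, 2, 3, 2]))  # perfect agreement -> 1.0
```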
### skll.metrics.register_custom_metric(custom_metric_path, custom_metric_name)

Import, load, and register the custom metric function from the given path.

**Parameters:**

- `custom_metric_path` (str) – The path to a custom metric.
- `custom_metric_name` (str) – The name of the custom metric function to load. This function must take only two array-like arguments: the true labels and the predictions, in that order.

**Raises:**

- `ValueError` – If the custom metric path does not end in ‘.py’.
- `NameError` – If the name of the custom metric file conflicts with an already existing attribute in `skll.metrics`, or if the custom metric name conflicts with a scikit-learn or SKLL metric.
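A custom metric file might look like the following. The file name, function name, and metric itself are all hypothetical; the only requirement stated above is the two-argument signature:

```python
# custom_metrics.py -- hypothetical file containing a custom metric.
# The function must accept exactly two array-like arguments:
# the true labels first, then the predictions.
def inverse_mae(y_true, y_pred):
    """Illustrative higher-is-better transform of mean absolute error."""
    mae = sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)
    return 1.0 / (1.0 + mae)

# Registration (requires SKLL installed; shown for illustration only):
# from skll.metrics import register_custom_metric
# register_custom_metric("custom_metrics.py", "inverse_mae")
```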
### skll.metrics.use_score_func(func_name, y_true, y_pred)

Call the scoring function in `sklearn.metrics.SCORERS` with the given name. This handles any keyword arguments that were pre-specified when the scorer was created and applies any sign-flipping that was specified by `make_scorer()` at creation time.

**Parameters:**

- `func_name` (str) – The name of the objective function to use from `SCORERS`.
- `y_true` (array-like of float) – The true/actual/gold labels for the data.
- `y_pred` (array-like of float) – The predicted/observed labels for the data.

**Returns:**

- `ret_score` (float) – The scored result from the given scorer.
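The name-based lookup and sign-flipping can be sketched without scikit-learn as a small registry; the scorer names and functions below are illustrative stand-ins, not SKLL’s code:

```python
def accuracy(y_true, y_pred):
    """Fraction of exact matches between gold labels and predictions."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def neg_mean_absolute_error(y_true, y_pred):
    # Sign-flipped so higher is better, as make_scorer(greater_is_better=False)
    # would arrange for a loss function.
    return -sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

# A toy stand-in for the sklearn.metrics.SCORERS registry.
SCORERS = {
    "accuracy": accuracy,
    "neg_mean_absolute_error": neg_mean_absolute_error,
}

def use_score_func_sketch(func_name, y_true, y_pred):
    """Look up a scorer by name and apply it (sketch only)."""
    return SCORERS[func_name](y_true, y_pred)

score = use_score_func_sketch("accuracy", [0, 1, 1], [0, 1, 0])
print(score)  # 2 of 3 predictions match
```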