Quickstart

Here is a quick run-down of how you accomplish common tasks.

Load a FeatureSet from a file:

from skll.data import Reader

example_reader = Reader.for_path('myexamples.csv')
train_examples = example_reader.read()

Or, work with an existing pandas DataFrame:

from skll.data import FeatureSet

# assuming the data labels are in a column called "y"
train_examples = FeatureSet.from_data_frame(my_data_frame,
                                            "A Name for My Data",
                                            labels_column="y")

Train a linear svm (using the already loaded train_examples):

from skll.learner import Learner

learner = Learner('LinearSVC')
learner.train(train_examples)

Evaluate a trained model:

test_examples = Reader.for_path('test.tsv').read()
conf_matrix, accuracy, prf_dict, model_params, obj_score = learner.evaluate(test_examples)

Perform ten-fold cross-validation with a radial SVM:

learner = Learner('SVC')
fold_result_list, grid_search_scores = learner.cross-validate(train_examples)

fold_result_list in this case is a list of the results returned by learner.evaluate for each fold, and grid_search_scores is the highest objective function value achieved when tuning the model.

Generate predictions from a trained model:

predictions = learner.predict(test_examples)