Functions for running and interacting with SKLL experiments.
Nitin Madnani (email@example.com)
Dan Blanchard (firstname.lastname@example.org)
Michael Heilman (email@example.com)
Chee Wee Leong (firstname.lastname@example.org)
- skll.experiments.generate_learning_curve_plots(experiment_name, output_dir, learning_curve_tsv_file)
Generate learning curves using the TSV output file from a learning curve experiment.
This function generates both the score plots and the fit time plots.
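A minimal sketch of how this function might be called, assuming SKLL is installed and a learning curve experiment has already written its TSV file. The wrapper name `plot_learning_curves`, the up-front file check, and the creation of the output directory are illustrative assumptions, not part of SKLL.

```python
from pathlib import Path


def plot_learning_curves(experiment_name: str, output_dir: str, tsv_file: str) -> None:
    """Hypothetical wrapper: check inputs, then delegate to SKLL."""
    tsv_path = Path(tsv_file)
    # Fail early with a clear message if the TSV file is missing.
    if not tsv_path.is_file():
        raise FileNotFoundError(f"learning curve TSV not found: {tsv_path}")
    # Creating the output directory here is this sketch's choice.
    Path(output_dir).mkdir(parents=True, exist_ok=True)
    # Imported lazily so the checks above also work where SKLL is not installed.
    from skll.experiments import generate_learning_curve_plots

    generate_learning_curve_plots(experiment_name, output_dir, str(tsv_path))
```

The positional argument order follows the signature above: experiment name, output directory, then the TSV file.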
- skll.experiments.load_featureset(dir_path, feat_files, suffix, id_col='id', label_col='y', ids_to_floats=False, quiet=False, class_map=None, feature_hasher=False, num_features=None, logger=None)
Load a list of feature files and merge them.
dir_path (skll.types.PathOrStr) – Path to the directory that contains the feature files.
feat_files (List[str]) – A list of feature file prefixes.
suffix (str) – The suffix to add to feature file prefixes to get the full filenames.
id_col (str, default="id") – Name of the column which contains the instance IDs. If no column with that name exists, or None is specified, example IDs will be automatically generated.
label_col (str, default="y") – Name of the column which contains the class labels. If no column with that name exists, or None is specified, the data is considered to be unlabeled.
ids_to_floats (bool, default=False) – Whether to convert the IDs to floats to save memory. An error is raised if any non-numeric IDs are encountered.
quiet (bool, default=False) – Do not print “Loading…” status message to stderr.
class_map (Optional[skll.types.ClassMap], default=None) – Mapping from original class labels to new ones. This is mainly used for collapsing multiple labels into a single class. Anything not in the mapping will be kept the same.
feature_hasher (bool, default=False) – Should we use a FeatureHasher when vectorizing features?
num_features (Optional[int], default=None) – The number of features to use with the FeatureHasher. This should always be set to the power of 2 greater than the actual number of features you’re using.
logger (Optional[logging.Logger], default=None) – A logger instance to use to log messages instead of creating a new one by default.
merged_set – A FeatureSet instance containing the specified labels, IDs, features, and feature vectorizer.
- Return type: FeatureSet
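Three of the parameters above lend themselves to a small illustration: how feat_files and suffix combine into filenames, how a class_map collapses labels, and how a num_features value might be chosen. The sketch below uses only the standard library and does not call SKLL. All three helper names are hypothetical, the example suffix is assumed to include its leading dot, and "the power of 2 greater than" is read here as strictly greater.

```python
from pathlib import Path
from typing import Dict, List


def feature_file_paths(dir_path: str, feat_files: List[str], suffix: str) -> List[Path]:
    """Append the suffix to each prefix to get the full file paths."""
    return [Path(dir_path) / f"{prefix}{suffix}" for prefix in feat_files]


def collapse_label(label: str, class_map: Dict[str, str]) -> str:
    """Map a label through class_map; labels not in the mapping are kept."""
    return class_map.get(label, label)


def next_power_of_two(n_features: int) -> int:
    """Smallest power of 2 strictly greater than n_features (an assumption)."""
    return 1 << n_features.bit_length()


# Illustrative values only; none of these names come from SKLL.
paths = feature_file_paths("train", ["unigrams", "bigrams"], ".jsonlines")
labels = [collapse_label(lab, {"cat": "pet", "dog": "pet"}) for lab in ["cat", "dog", "fox"]]
hasher_size = next_power_of_two(1000)
```

With these inputs, the two feature files resolve to unigrams.jsonlines and bigrams.jsonlines under train, "cat" and "dog" both collapse to "pet" while "fox" is left unchanged, and 1000 actual features give a hasher size of 1024.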
- skll.experiments.run_configuration(config_file, local=False, overwrite=True, queue='all.q', hosts=None, write_summary=True, quiet=False, ablation=0, resume=False, log_level=20)
Run jobs specified in the configuration file locally or on the grid.
config_file (skll.types.PathOrStr) – Path to the configuration file we would like to use.
local (bool, default=False) – Should this be run locally instead of on the cluster?
overwrite (bool, default=True) – If the model files already exist, should we overwrite them instead of re-using them?
queue (str, default="all.q") – The DRMAA queue to use if we’re running on the cluster.
hosts (Optional[List[str]], default=None) – If running on the cluster, these are the machines we should use.
write_summary (bool, default=True) – Write a TSV file with a summary of the results.
quiet (bool, default=False) – Suppress printing of “Loading…” messages.
ablation (int, default=0) – Number of features to remove when doing an ablation experiment. If positive, we will perform repeated ablation runs for all combinations of features removing the specified number at a time. If None, we will use all combinations of all lengths. If 0, the default, no ablation is performed. If negative, a ValueError is raised.
resume (bool, default=False) – If result files already exist for an experiment, do not overwrite them. This is very useful when doing a large ablation experiment and part of it crashes.
log_level (int, default=logging.INFO) – The level for logging messages.
result_json_paths – A list of paths to .json results files for each variation in the experiment.
- Return type: List[str]
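The ablation parameter describes a combinatorial enumeration, which the following sketch makes concrete. It is an illustration of the idea under stated assumptions, not SKLL's implementation: the function name `ablated_featuresets` is hypothetical, rejecting negative values with a ValueError is this sketch's choice, and for None the sketch stops short of removing every feature.

```python
from itertools import combinations
from typing import List, Optional


def ablated_featuresets(features: List[str], ablation: Optional[int] = 0) -> List[List[str]]:
    """List the reduced feature sets that an ablation setting would produce."""
    if ablation is None:
        # All combinations of all lengths, always keeping at least one feature.
        sizes = list(range(1, len(features)))
    elif ablation < 0:
        raise ValueError("ablation must be None or a non-negative integer")
    elif ablation == 0:
        return []  # no ablation runs
    else:
        sizes = [ablation]

    reduced = []
    for size in sizes:
        for removed in combinations(features, size):
            remaining = [f for f in features if f not in removed]
            if remaining:  # skip the degenerate case with no features left
                reduced.append(remaining)
    return reduced


one_at_a_time = ablated_featuresets(["a", "b", "c"], ablation=1)
all_lengths = ablated_featuresets(["a", "b", "c"], ablation=None)
```

For three features, removing one at a time gives three reduced sets, and None gives six (three with one feature removed, three with two removed). The number of runs grows quickly with the number of features, which is why the resume option is useful for large ablation experiments.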