experiments Module

Functions related to running experiments and parsing configuration files.

author: Dan Blanchard (dblanchard@ets.org)
author: Michael Heilman (mheilman@ets.org)
author: Nitin Madnani (nmadnani@ets.org)
author: Chee Wee Leong (cleong@ets.org)
class skll.experiments.NumpyTypeEncoder(skipkeys=False, ensure_ascii=True, check_circular=True, allow_nan=True, sort_keys=False, indent=None, separators=None, encoding='utf-8', default=None)

Bases: json.encoder.JSONEncoder

This class is used when serializing results, particularly the input label values if the input has int-valued labels. Numpy int64 objects can’t be serialized by the json module, so we must convert them to int objects.

Adapted from the answers to a related Stack Overflow question: http://stackoverflow.com/questions/11561932/why-does-json-dumpslistnp-arange5-fail-while-json-dumpsnp-arange5-tolis
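The conversion described above can be sketched as a small json.JSONEncoder subclass. This is a minimal illustration, not SKLL's actual implementation: the class name IntegralEncoder is hypothetical, and the sketch relies on numpy integer scalars registering as numbers.Integral, so the encoder itself does not need to import numpy. The demo falls back to a plain int if numpy is not installed.

```python
import json
import numbers


class IntegralEncoder(json.JSONEncoder):
    """Hypothetical stand-in for NumpyTypeEncoder (not SKLL's code)."""

    def default(self, obj):
        # json calls default() only for objects it cannot serialize itself.
        # Numpy integer scalars such as int64 register as numbers.Integral,
        # so converting them with int() yields a serializable value.
        if isinstance(obj, numbers.Integral):
            return int(obj)
        # Anything else is unsupported: defer to the base class,
        # which raises TypeError.
        return super().default(obj)


try:
    import numpy as np
    label = np.int64(3)   # the kind of value the plain json module rejects
except ImportError:
    label = 3             # fallback so the sketch still runs without numpy

encoded = json.dumps({"label": label}, cls=IntegralEncoder)
print(encoded)
```

Passing the encoder through the cls argument leaves every natively supported type on json's normal path, so only the otherwise unserializable values are converted.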

skll.experiments.run_configuration(config_file, local=False, overwrite=True, queue=u'all.q', hosts=None, write_summary=True, quiet=False, ablation=0, resume=False)

Takes a configuration file and runs the specified jobs on the grid, or locally if local is True.

Parameters:

  • config_file (str) – Path to the configuration file we would like to use.
  • local (bool) – Should this be run locally instead of on the cluster?
  • overwrite (bool) – If the model files already exist, should we overwrite them instead of re-using them?
  • queue (str) – The DRMAA queue to use if we’re running on the cluster.
  • hosts (list of str) – If running on the cluster, these are the machines we should use.
  • write_summary (bool) – Write a tsv file with a summary of the results.
  • quiet (bool) – Suppress printing of “Loading...” messages.
  • ablation (int or None) – Number of features to remove when doing an ablation experiment. If positive, we will perform repeated ablation runs for all combinations of features, removing the specified number at a time. If None, we will use all combinations of all lengths. If 0 (the default), no ablation is performed. If negative, a ValueError is raised.
  • resume (bool) – If result files already exist for an experiment, do not overwrite them. This is very useful when doing a large ablation experiment and part of it crashes.

Returns:

A list of paths to .json results files for each variation in the experiment.

Return type:

list of str
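
The rules for the ablation parameter can be illustrated with a short, self-contained sketch built on itertools.combinations. The helper ablated_feature_sets is hypothetical and is not part of SKLL. It assumes that "all combinations of all lengths" means removing between one and all-but-one of the features, which the description above does not spell out.

```python
from itertools import combinations


def ablated_feature_sets(features, ablation=0):
    """Return the feature subsets left after each ablation run.

    Illustrative only; this is not how SKLL builds its jobs internally.
    """
    if ablation is not None and ablation < 0:
        raise ValueError("ablation must be non-negative or None")
    if ablation == 0:
        # No ablation: a single run that uses every feature.
        return [tuple(features)]
    # Assumption: None removes 1 .. n-1 features, always leaving at least one.
    sizes = range(1, len(features)) if ablation is None else [ablation]
    subsets = []
    for k in sizes:
        for removed in combinations(features, k):
            subsets.append(tuple(f for f in features if f not in removed))
    return subsets


features = ["unigrams", "bigrams", "length"]   # hypothetical feature names
for subset in ablated_feature_sets(features, ablation=1):
    print(subset)
```

With three features, ablation=1 produces three runs and ablation=None produces six under the stated assumption. The number of runs grows combinatorially with the feature count, which is why resuming a partially completed ablation experiment is useful.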