experiments Module

Functions related to running experiments and parsing configuration files.

author: Dan Blanchard (dblanchard@ets.org)
author: Michael Heilman (mheilman@ets.org)
author: Nitin Madnani (nmadnani@ets.org)
author: Chee Wee Leong (cleong@ets.org)
class skll.experiments.NumpyTypeEncoder(*, skipkeys=False, ensure_ascii=True, check_circular=True, allow_nan=True, sort_keys=False, indent=None, separators=None, default=None)

Bases: json.encoder.JSONEncoder

This class is used when serializing results, particularly the input label values if the input has int-valued labels. Numpy int64 objects can’t be serialized by the json module, so we must convert them to int objects.

This was adapted from a related Stack Overflow question: https://stackoverflow.com/questions/11561932/why-does-json-dumpslistnp-arange5-fail-while-json-dumpsnp-arange5-tolis
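The conversion described above can be sketched with the standard library alone. This is a minimal illustration under stated assumptions, not SKLL's actual implementation: IndexableIntEncoder and FakeInt64 are hypothetical names, and FakeInt64 merely stands in for an integer-like object (such as a numpy int64) that the json module refuses to serialize directly.

```python
import json
import operator


class FakeInt64:
    """Hypothetical stand-in for an integer-like type that json cannot serialize."""

    def __init__(self, value):
        self._value = value

    def __index__(self):
        # Integer-like objects expose their value through __index__.
        return self._value


class IndexableIntEncoder(json.JSONEncoder):
    """Hypothetical encoder that converts integer-like objects to plain int."""

    def default(self, o):
        try:
            # operator.index() returns a plain int for integer-like objects.
            return operator.index(o)
        except TypeError:
            # Not integer-like: let the base class raise the TypeError.
            return super().default(o)


labels = {"labels": [FakeInt64(0), FakeInt64(1)]}
encoded = json.dumps(labels, cls=IndexableIntEncoder)
```

Passing the encoder through the cls argument of json.dumps leaves every other type on the normal serialization path, so only the values json cannot handle reach default.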


Implement the default method in a subclass such that it returns a serializable object for o, or calls the base implementation (to raise a TypeError).

For example, to support arbitrary iterators, you could implement default like this:

def default(self, o):
    try:
        iterable = iter(o)
    except TypeError:
        # o is not iterable; fall through to the base class below
        pass
    else:
        return list(iterable)
    # Let the base class default method raise the TypeError
    return JSONEncoder.default(self, o)
skll.experiments.run_configuration(config_file, local=False, overwrite=True, queue='all.q', hosts=None, write_summary=True, quiet=False, ablation=0, resume=False, log_level=20)

Takes a configuration file and runs the specified jobs, either on the grid or locally.

Parameters:

  • config_file (str) – Path to the configuration file we would like to use.
  • local (bool, optional) – Should this be run locally instead of on the cluster? Defaults to False.
  • overwrite (bool, optional) – If the model files already exist, should we overwrite them instead of re-using them? Defaults to True.
  • queue (str, optional) – The DRMAA queue to use if we’re running on the cluster. Defaults to 'all.q'.
  • hosts (list of str, optional) – If running on the cluster, these are the machines we should use. Defaults to None.
  • write_summary (bool, optional) – Write a TSV file with a summary of the results. Defaults to True.
  • quiet (bool, optional) – Suppress printing of “Loading…” messages. Defaults to False.
  • ablation (int, optional) – Number of features to remove when doing an ablation experiment. If positive, we will perform repeated ablation runs for all combinations of features removing the specified number at a time. If None, we will use all combinations of all lengths. If 0, the default, no ablation is performed. If negative, a ValueError is raised. Defaults to 0.
  • resume (bool, optional) – If result files already exist for an experiment, do not overwrite them. This is very useful when doing a large ablation experiment and part of it crashes. Defaults to False.
  • log_level (int, optional) – The level for logging messages. Defaults to logging.INFO (20).

Returns:

result_json_paths – A list of paths to .json results files for each variation in the experiment.

Return type:

list of str

Raises:

  • ValueError – If the value for "ablation" is negative (it must be a non-negative int or None).
  • OSError – If the length of the FeatureSet name exceeds 210 characters.
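
The ablation semantics described for the ablation parameter can be illustrated with a short sketch. This is not SKLL's code: ablation_removals is a hypothetical helper that only enumerates which groups of features would be removed in each run. It also assumes that "all combinations of all lengths" stops short of removing every feature at once, since that would leave nothing to train on.

```python
from itertools import combinations


def ablation_removals(features, ablation):
    """Hypothetical helper: list the feature groups removed in each ablation run."""
    if ablation is not None and ablation < 0:
        raise ValueError("ablation must be a non-negative int or None")
    if ablation == 0:
        # The default: no ablation is performed.
        return []
    if ablation is None:
        # Assumption: remove 1, 2, ..., len(features) - 1 features at a time.
        sizes = range(1, len(features))
    else:
        # Remove exactly `ablation` features at a time.
        sizes = [ablation]
    removals = []
    for size in sizes:
        removals.extend(combinations(features, size))
    return removals


features = ["unigrams", "bigrams", "length"]
single_removals = ablation_removals(features, 1)
all_removals = ablation_removals(features, None)
```

With three features, removing one at a time yields three runs, while None yields six (three single-feature and three two-feature removals). With SKLL installed, the corresponding call would look something like run_configuration('experiment.cfg', local=True, ablation=1), where the configuration file name is a placeholder.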