Contributing

Thank you for your interest in contributing to SKLL! We welcome any and all contributions.

Guidelines

The SKLL contribution guidelines can be found in our GitHub repository. Please try to follow them as much as possible.

SKLL Code Overview

This section will help you get oriented with the SKLL codebase by describing how it is organized, the various SKLL entry points into the code, and what the general code flow looks like for each entry point.

Organization

The main Python code for the SKLL package lives inside the skll sub-directory of the repository. It contains the following files and sub-directories:

Entry Points & Workflow

There are three main entry points into the SKLL codebase:

  1. Experiment configuration files. The primary way to interact with SKLL is by writing configuration files and then passing them to the run_experiment script. When you run the command run_experiment <config_file>, the following happens (at a high level):
    • the configuration file is handed off to the run_configuration() function in experiments.py.
    • an SKLLConfigParser object (from config.py) is instantiated, which parses all of the relevant fields out of the given configuration file.
    • the configuration fields are then passed to the _classify_featureset() function in experiments.py, which instantiates the learners (using code from learner.py) and the featuresets (using code from reader.py & featureset.py), runs the experiments, collects the results, and writes them out to disk.
  2. SKLL API. Another way to interact with SKLL is via the SKLL API directly in your own Python code rather than using configuration files. For example, you could use the Learner.from_file() method to load a saved model from disk and make predictions on new data. See the SKLL API documentation for details.
  3. Utility scripts. The scripts listed in the section above under utilities are also entry points into the SKLL code. These scripts are convenient wrappers that use the SKLL API to perform commonly needed tasks, e.g., generating predictions on new data from an already trained model.
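To make the first entry point concrete, a minimal configuration file might look like the sketch below. The field names follow the SKLL configuration format, but the experiment name, directory paths, and learner choices are invented for illustration, and the exact set of supported fields can vary across SKLL versions:

```ini
[General]
experiment_name = example_experiment
task = cross_validate

[Input]
; directory containing the training feature files (illustrative path)
train_directory = train
; each inner list is one featureset; the prefix is illustrative
featuresets = [["example_features"]]
learners = ["LogisticRegression", "RandomForestClassifier"]

[Tuning]
grid_search = true

[Output]
; where results and logs are written (illustrative path)
results = output
```

Passing a file like this to run_experiment triggers the parse-then-run flow described in the bullets above.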
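The second entry point, calling the SKLL API directly, can be sketched as follows. This assumes SKLL is installed, that model.model was previously saved by a SKLL Learner, and that new_data.jsonlines contains the new examples; all of the file paths here are illustrative:

```python
# Hedged sketch of the API entry point: load a saved model and generate
# predictions on new data without writing a configuration file.
from skll.data import Reader
from skll.learner import Learner

# Load a model previously saved to disk by SKLL (illustrative path).
learner = Learner.from_file("model.model")

# Read the new examples into a FeatureSet (illustrative path).
featureset = Reader.for_path("new_data.jsonlines").read()

# Generate predictions for the new data.
predictions = learner.predict(featureset)
```

The utility scripts from the third entry point are thin wrappers around exactly this kind of API usage.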