Thank you for your interest in contributing to SKLL! We welcome any and all contributions.
The SKLL contribution guidelines can be found in our GitHub repository. We strongly encourage all SKLL contributions to follow these guidelines.
SKLL Code Overview
This section will help you get oriented with the SKLL codebase by describing how it is organized, the various SKLL entry points into the code, and what the general code flow looks like for each entry point.
The main Python code for the SKLL package lives inside the skll sub-directory of the repository. It contains the following files and sub-directories:
config/ : Code to parse SKLL experiment configuration files.
experiments/ : Code that is related to creating and running SKLL experiments. It also contains code that collects the various evaluation metrics and predictions for each SKLL experiment and writes them out to disk.
learner/ : Code for the Learner and VotingLearner classes. The former is instantiated for all learner names specified in the experiment configuration file except VotingRegressor, for which the latter is instantiated instead.
__init__.py : Code used to initialize the skll Python package.
featureset.py : Code for the FeatureSet class, which encapsulates the features and metadata for a given set of instances.
readers.py : Code for classes that can read various file formats and create FeatureSet objects from them.
writers.py : Code for classes that can write FeatureSet objects to files on disk in various formats.
dict_vectorizer.py : Code for a DictVectorizer class that subclasses sklearn.feature_extraction.DictVectorizer to add an __eq__() method that we need for vectorizer equality.
utils/ : Code for different utility scripts, functions, and classes used throughout SKLL. The most important ones are the command line scripts.
version.py : Code to define the SKLL version. Only changed for new releases.
test_*.py : Code for the unit tests and regression tests.
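To illustrate why dict_vectorizer.py adds an __eq__() method: Python objects compare by identity unless their class defines equality, so two vectorizers fitted on the same data would otherwise be considered different. The sketch below is a minimal stand-in, not SKLL's actual implementation. ToyVectorizer and ComparableVectorizer are hypothetical names, and comparing the learned feature-to-index mapping is an assumed equality criterion.

```python
class ToyVectorizer:
    """Hypothetical stand-in for a dict vectorizer: maps feature names to column indices."""

    def fit(self, rows):
        # Collect every feature name seen in the data and give each a stable index.
        names = sorted({name for row in rows for name in row})
        self.vocabulary_ = {name: idx for idx, name in enumerate(names)}
        return self


class ComparableVectorizer(ToyVectorizer):
    """Same behavior, plus value-based equality (assumed criterion: equal vocabularies)."""

    def __eq__(self, other):
        if not isinstance(other, ToyVectorizer):
            return NotImplemented
        return getattr(self, "vocabulary_", None) == getattr(other, "vocabulary_", None)


rows = [{"height": 1.0, "width": 2.0}, {"height": 3.0, "depth": 0.5}]

# Without __eq__, two identically fitted objects fall back to identity comparison.
plain_equal = ToyVectorizer().fit(rows) == ToyVectorizer().fit(rows)

# With __eq__, they compare by what they learned.
comparable_equal = ComparableVectorizer().fit(rows) == ComparableVectorizer().fit(rows)
```

Here plain_equal is False and comparable_equal is True. Note that a class defining __eq__ without __hash__ becomes unhashable, which is usually acceptable for objects of this kind.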
Entry Points & Workflow
There are three main entry points into the SKLL codebase:
Experiment configuration files. The primary way to interact with SKLL is by writing configuration files and then passing them to the run_experiment script. When you run the command run_experiment <config_file>, the following happens (at a high level):
the configuration file is handed off to the run_configuration() function.
a SKLLConfigParser object is instantiated from config.py that parses all of the relevant fields out of the given configuration file.
the configuration fields are then passed to the _classify_featureset() function in experiments.py, which instantiates the learners (using code from learner.py) and the featuresets (using code from featureset.py), runs the experiments, collects the results, and writes them out to disk.
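The three steps above can be sketched with the standard library alone. This is an illustration of the control flow under stated assumptions, not SKLL's code: parse_config, run_one_job, and run_all are hypothetical stand-ins for SKLLConfigParser, _classify_featureset(), and run_configuration() respectively. The INI layout and field names in CONFIG_TEXT, the use of JSON to decode list-valued fields, and the idea of one job per featureset and learner pair are likewise assumptions made for the example.

```python
import configparser
import json
from itertools import product

# Assumed, simplified configuration; a real SKLL configuration has more fields.
CONFIG_TEXT = """
[General]
experiment_name = demo
task = cross_validate

[Input]
featuresets = [["family"], ["family", "misc"]]
learners = ["LogisticRegression", "SVC"]
"""


def parse_config(text):
    """Step 2 (stand-in): pull the relevant fields out of the configuration."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return {
        "experiment_name": parser.get("General", "experiment_name"),
        "task": parser.get("General", "task"),
        # List-valued fields are decoded with JSON here purely for simplicity.
        "featuresets": json.loads(parser.get("Input", "featuresets")),
        "learners": json.loads(parser.get("Input", "learners")),
    }


def run_one_job(experiment_name, task, featureset, learner_name):
    """Step 3 (stand-in): where learners and featuresets would be instantiated and run."""
    job_name = f"{experiment_name}_{'+'.join(featureset)}_{learner_name}"
    return {"job": job_name, "task": task}


def run_all(text):
    """Step 1 (stand-in): take the configuration and dispatch one job per combination."""
    fields = parse_config(text)
    return [
        run_one_job(fields["experiment_name"], fields["task"], featureset, learner_name)
        for featureset, learner_name in product(fields["featuresets"], fields["learners"])
    ]


results = run_all(CONFIG_TEXT)
```

With two featuresets and two learners, results holds four job records, one per combination.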
SKLL API. Another way to interact with SKLL is via the SKLL API directly in your Python code rather than using configuration files. For example, you could use the Learner.from_file() or VotingLearner.from_file() methods to load saved models of those types from disk and make predictions on new data. The documentation for the SKLL API can be found here.
Utility scripts. The scripts listed in the section above under utils are also entry points into the SKLL code. These scripts are convenient wrappers that use the SKLL API for commonly used tasks, e.g., generating predictions on new data from an already trained model.
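As a rough illustration of the API entry point described above, the sketch below loads a saved model with Learner.from_file() and generates predictions. The helper name predict_with_saved_model is hypothetical. The import path skll.learner and the call learner.predict(featureset) are assumptions about the installed SKLL version, so check them against the API documentation before relying on them.

```python
def predict_with_saved_model(model_path, featureset):
    """Load a saved SKLL model and return its predictions for `featureset`.

    `featureset` is expected to be a FeatureSet object, e.g. one produced by
    the reader classes described earlier.
    """
    # Imported inside the function so this sketch can be loaded without SKLL installed.
    from skll.learner import Learner  # assumed import path

    learner = Learner.from_file(model_path)
    return learner.predict(featureset)  # assumed to accept a FeatureSet
```

A VotingLearner model would follow the same pattern with VotingLearner.from_file() in place of Learner.from_file().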