Creating a External Plugin¶
In this quick-start tutorial, we will show how to create a new interface
implementation within an external python package and expose it to the SMQTK
plugin framework via entry-points in the package’s setup.py
file.
Lets assume that we are adding an implementation of the
Classifier
interface to some
package we will call MyPackage
, wrapping the use a scikit-learn classifier
in a simple way.
Implementing the interface¶
In MyPackage
, lets imagine we start a new file,
new_classifier.py
such that the module is importable via the module
path MyPackage.plugins.new_classifier
.
In the following code blocks we will incrementally build up a functional
implementation.
To start, we need to import the base interface and create a new class inheriting from this interface:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 | from sklearn.linear_model import LogisticRegression
from smqtk.algorithms import Classifier
class SklearnLogisticRegressionClassifier (LogisticRegression, Classifier):
"""
A new, simple implementation of SMQTK's Classifier interface wrapping
Scikit-Learn's LogisticRegression classifier.
"""
@classmethod
def is_usable(cls):
# Required by the ``smqtk.utils.plugin.Pluggable`` parent
return True
def get_config(self):
# Required by the ``smqtk.utils.configuration.Configurable`` parent.
return {
'C': self.C,
'class_weight': self.class_weight,
'dual': self.dual,
'fit_intercept': self.fit_intercept,
'intercept_scaling': self.intercept_scaling,
'max_iter': self.max_iter,
'multi_class': self.multi_class,
'n_jobs': self.n_jobs,
'penalty': self.penalty,
'random_state': self.random_state,
'solver': self.solver,
'tol': self.tol,
'verbose': self.verbose,
'warm_start': self.warm_start,
}
def get_labels(self):
# Required by the ``smqtk.algorithms.Classifier`` parent
try:
return self.classes_.tolist()
except AttributeError:
raise RuntimeError("No model yet fit.")
def _classify_arrays(self, array_iter):
# Required by the ``smqtk.algorithms.Classifier`` parent
x = numpy.asarray(list(array_iter))
proba_arr = self.predict_proba(x)
for proba in proba_arr:
yield dict(zip(self.classes_, proba))
|
Since our source material happens to be a class itself, our implementation can inherit from the Scikit-learn base classifier as well as from the SMQTK interface. In other cases, encapsulation may be a better approach.
The methods defined in our implementation are overrides of abstract methods
declared in our parent, and higher, SMQTK interfaces.
Documentation of abstract methods can usually be found in the interface sources
as doc-strings and often include what is expected to be the input and output
data-types as well as any exception conditions that are expected.
For example, the Classifier
interface
documents get_labels
as raising a RuntimeError
specifically if no model
is loaded to access class labels.
Additionally, Classifier
documents for
the _classify_arrays
method that the input parameter array_iter
should
be an iterable type containing instances of the
DescriptorElement
class and should return
an iterable type (usually a generator) of specifically formatted dictionaries.
This implementation happens to be compliant with the defaults of the
Configurable
interface because all of its
constructor parameters are already JSON compliant (with the occasional
exception of the “random_state” parameter when a RandomState
instance is
used, but we will ignore that here for simplicity).
Thus, get_default_config
will return a JSON-compliant dictionary of the
default parameters as defined in Scikit-learn’s implementation, as well as
from_config
will appropriately return a new instance based on the given
JSON-compliant dictionary.
>>> dflt_config = SklearnLogisticRegressionClassifier.get_default_config()
>>> dflt_config
{'C': 1.0,
'class_weight': None,
'dual': False,
'fit_intercept': True,
'intercept_scaling': 1,
'max_iter': 100,
'multi_class': 'warn',
'n_jobs': None,
'penalty': 'l2',
'random_state': None,
'solver': 'warn',
'tol': 0.0001,
'verbose': 0,
'warm_start': False}
>>> new_dflt_inst = SklearnLogisticRegressionClassifier.from_config(dflt_config)
>>> new_dflt_inst.get_config() == dflt_config
True
Exposing via entry-points¶
In order to allow the SMQTK plugin framework to become aware of our new
implementation we will need to update MyPackage
’s setup.py
file
to add an entry-point.
Since we assumed above that we created our implementation in the module
MyPackage.plugins.new_classifier
, the following should be added:
setup(
...
entry_points={
...
'smqtk_plugins': [
"MyPackage_plugins = MyPackage.plugins.new_classifier",
]
}
)
- Notes on adding entry-points:
- The value to the left of the =’s sign must be unique across installed module providing extensions for the entry-point. A safe method
- Multiple extensions may be specified. This may be useful if your implementations naturally belong in different locations within your package.
- Currently SMQTK only supports providing modules in its extensions. Otherwise a warning will be emitted and that extension will be ignored.
Now, after re-installing MyPackage
, SMQTK’s plugin framework should be
able to discover this new implementation:
>>> from smqtk.algorithms import Classifier
>>> classifier.get_impls()
{..., MyPackage.plugins.new_classifier.SklearnLogisticRegressionClassifier,
...}