Binding a Model and Dataset
In Defining a Dataset we saw how to create a minimal Dataset
specification, which uses the sklearn digits dataset and pandas.DataFrame
as the underlying data container.
Now let’s define a Model and bind it with the Dataset.
```python
from sklearn.linear_model import LogisticRegression
from unionml import Dataset, Model

dataset = Dataset(name="digits_dataset", test_size=0.2, shuffle=True, targets=["target"])
model = Model(name="digits_classifier", init=LogisticRegression, dataset=dataset)
```
Note
In the above code snippet you might notice a few things:
- We're defining a Model named "digits_classifier".
- The init argument is a class, function, or callable object that returns your model of interest when called. In this case, we're using the sklearn.linear_model.LogisticRegression class to train our digits classifier.
- The dataset argument takes a Dataset object, which constrains the model's form according to the Dataset specification (for example, the features and target that the model's functions receive).
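The snippet below is a small standard-library sketch that makes the three accepted forms of init concrete. ToyClassifier, ToyFactory, and build_model are hypothetical names invented for illustration. The example also assumes that init is called with hyperparameters as keyword arguments; this page does not state that.

```python
from typing import Any, Callable, Dict


class ToyClassifier:
    """Hypothetical stand-in for an estimator such as LogisticRegression."""

    def __init__(self, C: float = 1.0) -> None:
        self.C = C


# Form 1: a class. Calling it constructs a new model instance.
init_from_class: Callable[..., ToyClassifier] = ToyClassifier


# Form 2: a plain function that returns a model.
def init_from_function(**hyperparameters: Any) -> ToyClassifier:
    return ToyClassifier(**hyperparameters)


# Form 3: a callable object whose __call__ method returns a model.
class ToyFactory:
    def __init__(self, default_C: float) -> None:
        self.default_C = default_C

    def __call__(self, **hyperparameters: Any) -> ToyClassifier:
        # User-supplied hyperparameters override the stored default.
        params: Dict[str, Any] = {"C": self.default_C, **hyperparameters}
        return ToyClassifier(**params)


def build_model(init: Callable[..., Any], **hyperparameters: Any) -> Any:
    """Hypothetical helper: every form is used the same way, by calling it."""
    return init(**hyperparameters)


for factory in (init_from_class, init_from_function, ToyFactory(default_C=0.5)):
    estimator = build_model(factory, C=2.0)
    print(type(estimator).__name__, estimator.C)  # ToyClassifier 2.0
```

All three objects are interchangeable because the caller only ever invokes them.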
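Model Functions
Like the dataset object, the model object we defined above exposes core functions
required for model training, prediction, and evaluation: trainer(), predictor(), and
evaluator(). It also provides an optional init() function for customizing how the model
object is created.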
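The decorator pattern used in the rest of this page can be pictured with the hypothetical sketch below. MiniModel is NOT UnionML's implementation. It only illustrates the general idea: an object records user functions and calls them in order, passing the output of init to the trainer as its first argument. For brevity, only trainer and predictor are shown. MeanRegressor is a toy estimator, and plain lists stand in for DataFrames.

```python
from typing import Any, Callable, Dict, List, Optional

Fn = Callable[..., Any]


class MiniModel:
    """Hypothetical, simplified model object that collects user functions
    through decorators. This is NOT UnionML's actual implementation."""

    def __init__(self, name: str, init: Fn) -> None:
        self.name = name
        # Assumption: a class passed as `init` is called with **hyperparameters.
        self._init: Fn = lambda hyperparameters: init(**hyperparameters)
        self._trainer: Optional[Fn] = None
        self._predictor: Optional[Fn] = None

    # Each decorator stores the function and returns it unchanged,
    # so it can still be called as an ordinary Python function.
    def init(self, fn: Fn) -> Fn:
        self._init = fn
        return fn

    def trainer(self, fn: Fn) -> Fn:
        self._trainer = fn
        return fn

    def predictor(self, fn: Fn) -> Fn:
        self._predictor = fn
        return fn

    def train(self, hyperparameters: Dict[str, Any], features: Any, target: Any) -> Any:
        if self._trainer is None:
            raise RuntimeError("no trainer registered")
        estimator = self._init(hyperparameters)  # init output becomes the trainer's first argument
        return self._trainer(estimator, features, target)

    def predict(self, estimator: Any, features: Any) -> Any:
        if self._predictor is None:
            raise RuntimeError("no predictor registered")
        return self._predictor(estimator, features)


class MeanRegressor:
    """Toy estimator that predicts the mean of its training targets plus an offset."""

    def __init__(self, offset: float = 0.0) -> None:
        self.offset = offset
        self.mean_ = 0.0


mini = MiniModel(name="toy", init=MeanRegressor)


@mini.trainer
def trainer(estimator: MeanRegressor, features: List[List[float]], target: List[float]) -> MeanRegressor:
    estimator.mean_ = sum(target) / len(target) + estimator.offset
    return estimator


@mini.predictor
def predictor(estimator: MeanRegressor, features: List[List[float]]) -> List[float]:
    return [float(estimator.mean_) for _ in features]


fitted = mini.train({"offset": 1.0}, features=[[0.0], [1.0], [2.0]], target=[1.0, 2.0, 3.0])
print(mini.predict(fitted, [[5.0], [6.0]]))  # [3.0, 3.0]
```

Because each decorator returns the original function, trainer and predictor remain ordinary functions that you can call directly.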
init()
In the Model constructor, the init argument takes a class (or any other callable)
that is called to produce a model object, which is then passed down to the trainer
function as its first positional argument.
In most cases this will suffice, but you can instead define a decorated init
function that achieves the same thing; the difference is purely syntactic.
The equivalent init function would be:
```python
@model.init
def init(hyperparameters: dict) -> LogisticRegression:
    # Build the estimator from a dict of hyperparameters, e.g. {"C": 1.0}.
    return LogisticRegression(**hyperparameters)
```
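One way to check that the two forms are interchangeable is to build the same model both ways. So that the sketch runs on the standard library alone, it uses a hypothetical ToyLogisticRegression class in place of sklearn's. It assumes the class form is called with the hyperparameters unpacked as keyword arguments. init_with_defaults is a hypothetical variant showing that a function body is also a convenient place for small extra setup, such as merging fixed defaults under user-supplied values.

```python
from typing import Any, Dict


class ToyLogisticRegression:
    """Hypothetical stand-in for sklearn's LogisticRegression; it only stores hyperparameters."""

    def __init__(self, C: float = 1.0, max_iter: int = 100) -> None:
        self.C = C
        self.max_iter = max_iter


def init(hyperparameters: Dict[str, Any]) -> ToyLogisticRegression:
    # Same shape as the decorated init above: unpack the dict into the constructor.
    return ToyLogisticRegression(**hyperparameters)


# Hypothetical variant: fixed defaults that user-supplied hyperparameters override.
DEFAULTS: Dict[str, Any] = {"max_iter": 1000}


def init_with_defaults(hyperparameters: Dict[str, Any]) -> ToyLogisticRegression:
    return ToyLogisticRegression(**{**DEFAULTS, **hyperparameters})


hyperparameters = {"C": 0.5, "max_iter": 500}
from_class = ToyLogisticRegression(**hyperparameters)  # class form (assumed to receive **hyperparameters)
from_function = init(hyperparameters)                  # decorated-function form
print(vars(from_class) == vars(from_function))  # True
print(init_with_defaults({"C": 2.0}).max_iter)  # 1000
```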
trainer()
The trainer function should contain all the logic for training a model, either from
scratch or from a previously saved model checkpoint.
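```python
import pandas as pd

@model.trainer
def trainer(estimator: LogisticRegression, features: pd.DataFrame, target: pd.DataFrame) -> LogisticRegression:
    # .squeeze() turns the single-column target DataFrame into a Series, the 1-D shape sklearn expects for y.
    return estimator.fit(features, target.squeeze())
```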
Note
The first argument to trainer is the model object produced by init (here, a
LogisticRegression estimator), which the function updates using the features and target dataframes.
In this example, the function body simply calls the standard sklearn .fit method, which returns
the fitted estimator. However, you can implement any arbitrary training logic in the trainer function.
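To illustrate training logic that goes beyond a single .fit call, the sketch below implements a tiny nearest-centroid classifier by hand. NearestCentroid is a hypothetical toy class. Plain lists stand in for the features and target DataFrames, so the example runs without sklearn or pandas. The point is only the trainer's contract: take an estimator, update it from features and target, and return it.

```python
from typing import Dict, List, Sequence


class NearestCentroid:
    """Hypothetical toy estimator that stores one centroid per class label."""

    def __init__(self) -> None:
        self.centroids_: Dict[int, List[float]] = {}


def trainer(
    estimator: NearestCentroid,
    features: Sequence[Sequence[float]],
    target: Sequence[int],
) -> NearestCentroid:
    # Custom training logic instead of a single .fit() call:
    # sum the feature rows for each label, then divide by the label's count.
    sums: Dict[int, List[float]] = {}
    counts: Dict[int, int] = {}
    for row, label in zip(features, target):
        if label not in sums:
            sums[label] = [0.0] * len(row)
            counts[label] = 0
        sums[label] = [s + x for s, x in zip(sums[label], row)]
        counts[label] += 1
    estimator.centroids_ = {
        label: [s / counts[label] for s in total] for label, total in sums.items()
    }
    # Return the updated estimator, mirroring how sklearn's .fit() returns self.
    return estimator


est = trainer(NearestCentroid(), features=[[0.0, 0.0], [2.0, 2.0], [10.0, 10.0]], target=[0, 0, 1])
print(est.centroids_)  # {0: [1.0, 1.0], 1: [10.0, 10.0]}
```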
predictor()
The predictor function takes an estimator object and a features dataframe as inputs
and returns a list of floats, one predicted digit for each row of features.
```python
from typing import List

@model.predictor
def predictor(estimator: LogisticRegression, features: pd.DataFrame) -> List[float]:
    return [float(x) for x in estimator.predict(features)]
```
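One likely reason for the cast to float is serialization. For integer targets like these digits, sklearn's predict returns NumPy integer scalars (numpy.int64), which json.dumps rejects. A plain list of Python floats serializes directly. That motivation is our reading; this page does not state it. The sketch below uses a hypothetical NearestCentroid class with fixed centroids and a predict method that assigns each row to its closest centroid.

```python
import json
import math
from typing import Dict, List, Sequence


class NearestCentroid:
    """Hypothetical toy estimator with precomputed centroids, one per class label."""

    def __init__(self, centroids: Dict[int, List[float]]) -> None:
        self.centroids_ = centroids

    def predict(self, features: Sequence[Sequence[float]]) -> List[int]:
        # Assign each row the label of its closest centroid (Euclidean distance).
        return [
            min(self.centroids_, key=lambda label: math.dist(row, self.centroids_[label]))
            for row in features
        ]


def predictor(estimator: NearestCentroid, features: Sequence[Sequence[float]]) -> List[float]:
    # Same body as the predictor above: cast every prediction to a plain Python float.
    return [float(x) for x in estimator.predict(features)]


est = NearestCentroid({0: [1.0, 1.0], 1: [10.0, 10.0]})
preds = predictor(est, [[0.0, 0.5], [9.0, 11.0]])
print(preds)              # [0.0, 1.0]
print(json.dumps(preds))  # [0.0, 1.0]
```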
evaluator()
Finally, we need to specify how to evaluate an estimator given features and a target.
In this case, we’ll just use the sklearn accuracy_score function.
```python
from sklearn.metrics import accuracy_score

@model.evaluator
def evaluator(estimator: LogisticRegression, features: pd.DataFrame, target: pd.DataFrame) -> float:
    return accuracy_score(target.squeeze(), predictor(estimator, features))
```
Note
Since predictor is just a Python function, we can call it inside the evaluator function body.
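The evaluator can be sketched without sklearn to show exactly what it computes. The accuracy helper below returns the fraction of predictions that exactly match the target, which is what accuracy_score returns with its default settings. ThresholdClassifier is a hypothetical toy estimator, and lists stand in for DataFrames.

```python
from typing import List, Sequence


class ThresholdClassifier:
    """Hypothetical toy estimator: predicts 1 when the first feature exceeds a threshold."""

    def __init__(self, threshold: float) -> None:
        self.threshold = threshold

    def predict(self, features: Sequence[Sequence[float]]) -> List[int]:
        return [1 if row[0] > self.threshold else 0 for row in features]


def predictor(estimator: ThresholdClassifier, features: Sequence[Sequence[float]]) -> List[float]:
    return [float(x) for x in estimator.predict(features)]


def accuracy(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    # Fraction of exact matches, the quantity accuracy_score returns by default.
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and the same length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def evaluator(estimator: ThresholdClassifier, features: Sequence[Sequence[float]], target: Sequence[float]) -> float:
    # Reuse predictor inside the evaluator, just as the evaluator above does.
    return accuracy(target, predictor(estimator, features))


est = ThresholdClassifier(threshold=0.5)
features = [[0.1], [0.7], [0.9], [0.4]]
target = [0, 1, 0, 0]
print(evaluator(est, features, target))  # 0.75
```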
Next
Now that we’ve defined a Dataset and Model and bound them
together, let’s see how we can perform Local Training and Prediction.