Binding a Model and Dataset#
In Defining a Dataset we saw how to create a minimal Dataset specification, which uses the sklearn digits dataset and pandas.DataFrame as the underlying data container. Now let's define a Model and bind it to the Dataset.
from unionml import Dataset, Model
from sklearn.linear_model import LogisticRegression
dataset = Dataset(name="digits_dataset", test_size=0.2, shuffle=True, targets=["target"])
model = Model(name="digits_classifier", init=LogisticRegression, dataset=dataset)
Note
In the above code snippet you might notice a few things:

- We're defining a Model named "digits_classifier".
- The init argument is a class, function, or callable object that returns your model of interest when called. In this case, we're using the sklearn.linear_model.LogisticRegression class to train our digits classifier.
- The dataset argument takes a Dataset object, effectively constraining the model's form as a function of the Dataset specification.
Model Functions#
Like the dataset object, the model object we defined above exposes the core functions required for model initialization, training, prediction, and evaluation.
init()#
In the Model constructor, the init argument takes a class, function, or callable object that, when called, produces a model object, which is then passed down to the trainer function as the first positional argument. In most cases this will suffice, but you can also define a decorated init function that achieves the same thing, i.e. the difference is purely syntactic. The equivalent init function would be:
@model.init
def init(hyperparameters: dict) -> LogisticRegression:
return LogisticRegression(**hyperparameters)
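To illustrate the "purely syntactic" point with plain sklearn (outside UnionML), calling such an init function with a hyperparameters dict is equivalent to constructing the class with keyword arguments directly; the hyperparameter values here are arbitrary examples:

```python
from sklearn.linear_model import LogisticRegression

# Plain-Python equivalent of the decorated init function above.
def init(hyperparameters: dict) -> LogisticRegression:
    return LogisticRegression(**hyperparameters)

# Produces the same estimator as LogisticRegression(C=0.1, max_iter=1000).
estimator = init({"C": 0.1, "max_iter": 1000})
```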
trainer()#
The trainer
function should contain all the logic for training a model from
scratch or a previously saved model checkpoint.
import pandas as pd

@model.trainer
def trainer(estimator: LogisticRegression, features: pd.DataFrame, target: pd.DataFrame) -> LogisticRegression:
    return estimator.fit(features, target.squeeze())
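To see the trainer body in isolation, here is a plain-sklearn sketch (not the UnionML execution path) that loads the digits data as dataframes and applies the same fit-with-squeeze logic; max_iter is raised here only so the solver converges on this dataset:

```python
import pandas as pd
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression

# Load the digits dataset as a pandas DataFrame, mirroring the
# Dataset specification with targets=["target"].
digits = load_digits(as_frame=True)
features = digits.frame.drop(columns=["target"])
target = digits.frame[["target"]]

estimator = LogisticRegression(max_iter=10_000)
# Same logic as the trainer body: .squeeze() turns the single-column
# target dataframe into the 1-d series sklearn expects.
fitted = estimator.fit(features, target.squeeze())
```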
Note
The first argument to trainer should be the sklearn estimator object that needs to be updated as a function of the features and target dataframes.
In this example, the function body simply invokes the standard sklearn .fit method for training; however, you can implement arbitrary training logic in the trainer function.
predictor()#
The predictor function takes an estimator object and a features dataframe as inputs and generates a List of floats representing the predicted digit for each row of features.
from typing import List
@model.predictor
def predictor(estimator: LogisticRegression, features: pd.DataFrame) -> List[float]:
return [float(x) for x in estimator.predict(features)]
evaluator()#
Finally, we need to specify how to evaluate an estimator given features and a target. In this case, we'll just use the sklearn accuracy_score function.
from sklearn.metrics import accuracy_score
@model.evaluator
def evaluator(estimator: LogisticRegression, features: pd.DataFrame, target: pd.DataFrame) -> float:
return accuracy_score(target.squeeze(), predictor(estimator, features))
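Putting the four function bodies together, here is a plain-Python sketch of how they compose on a train/test split (a local illustration using only sklearn and pandas, not the UnionML pipeline; the random_state is an arbitrary choice for reproducibility):

```python
from typing import List

import pandas as pd
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Mirror the Dataset spec: digits data, 20% test split, shuffled.
digits = load_digits(as_frame=True)
features = digits.frame.drop(columns=["target"])
target = digits.frame[["target"]]
X_train, X_test, y_train, y_test = train_test_split(
    features, target, test_size=0.2, shuffle=True, random_state=0
)

# init: construct the estimator.
estimator = LogisticRegression(max_iter=10_000)
# trainer: fit on the training split.
estimator.fit(X_train, y_train.squeeze())
# predictor: emit predictions as a list of floats.
predictions: List[float] = [float(x) for x in estimator.predict(X_test)]
# evaluator: accuracy on the held-out split.
score = accuracy_score(y_test.squeeze(), predictions)
```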
Note
Since predictor is just a Python function, we can use it inside the evaluator function body.
Next#
Now that we've defined a Dataset and a Model and bound them together, let's see how we can perform Local Training and Prediction.