Deploying to Flyte#
UnionML integrates tightly with Flyte, a data- and machine-learning-aware orchestration platform that leverages cloud services like AWS and GCP to scale and maintain data processing and machine learning workloads.
In this guide, we’ll:
Spin up a demo Flyte cluster, which is a standalone, minimal Flyte cluster that you can create on your individual laptop or workstation.
Configure the digit classification UnionML app to use the Flyte demo cluster as the compute backend for our training and prediction workloads.
Prerequisites
Deploy App Workflows#
A UnionML app is composed of a Dataset, a Model, and a serving app component (e.g. fastapi.FastAPI). Under the hood, a Model object exposes *_workflow methods that return flytekit.workflow objects, which are essentially execution graphs that perform multiple steps of computation.
To make these computations scalable, reproducible, and auditable, we can serialize our workflows and register them to a Flyte cluster, in this case a local Flyte demo cluster.
Initializing a Digits Classification App#
Going back to our digit classification app, let's assume that we've initialized our app using the unionml init my_app command and have an app.py script with our digits classification model.
See app.py
```python
from pathlib import Path
from typing import List

import pandas as pd
from fastapi import FastAPI
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

from unionml import Dataset, Model

dataset = Dataset(name="digits_dataset", test_size=0.2, shuffle=True, targets=["target"])
model = Model(name="digits_classifier", init=LogisticRegression, dataset=dataset)

@dataset.reader
def reader() -> pd.DataFrame:
    return load_digits(as_frame=True).frame

@model.trainer
def trainer(estimator: LogisticRegression, features: pd.DataFrame, target: pd.DataFrame) -> LogisticRegression:
    return estimator.fit(features, target.squeeze())

@model.predictor
def predictor(estimator: LogisticRegression, features: pd.DataFrame) -> List[float]:
    return [float(x) for x in estimator.predict(features)]

@model.evaluator
def evaluator(estimator: LogisticRegression, features: pd.DataFrame, target: pd.DataFrame) -> float:
    return float(accuracy_score(target.squeeze(), predictor(estimator, features)))

# attach Flyte demo cluster metadata
model.remote(
    dockerfile="Dockerfile",
    config_file=str(Path.home() / ".flyte" / "config.yaml"),
    project="digits-classifier",
    domain="development",
)

# serve with FastAPI
app = FastAPI()
model.serve(app)

if __name__ == "__main__":
    model_object, metrics = model.train(hyperparameters={"C": 1.0, "max_iter": 10000})
    predictions = model.predict(features=load_digits(as_frame=True).frame.sample(5, random_state=42))
    print(model_object, metrics, predictions, sep="\n")

    # save model to a file, using joblib as the default serialization format
    model.save("/tmp/model_object.joblib")
```
Start a Local Flyte Demo Cluster#
To start a Flyte demo cluster, run the following in your app directory:
```shell
flytectl demo start --source .
```
Note
The --source . flag will initialize the Flyte demo cluster in a Docker container with your app files mounted inside, so that your app's workflows can be serialized and registered directly in the Flyte demo cluster.
We should now be able to go to http://localhost:30080/console in our browser to see the Flyte UI.
The App Dockerfile#
UnionML relies on Docker to package up all of your app's source code and dependencies. The basic app template comes with a Dockerfile, which we can use to do this:
See Dockerfile
```docker
FROM python:3.8-slim-buster

WORKDIR /root
ENV VENV /opt/venv
ENV LANG C.UTF-8
ENV LC_ALL C.UTF-8
ENV PYTHONPATH /root

RUN apt-get update && apt-get install -y build-essential git-all

# Install the AWS CLI separately to prevent issues with boto being written over
RUN pip3 install awscli

RUN python3 -m venv ${VENV}
ENV PATH="${VENV}/bin:$PATH"

# Install Python dependencies
COPY ./requirements.txt /root
RUN pip install -r /root/requirements.txt

# Copy the actual code
COPY . /root
```
Configuring the Remote Backend#
All you need to do to get your UnionML app ready for deployment is configure it with:
The Docker registry and image name that you want to use to package your app.
The Flyte project and domain you want to use to host your app's microservices.
In the app.py script, you can see the following code that does just this:
```python
model.remote(
    dockerfile="Dockerfile",
    config_file=str(Path.home() / ".flyte" / "config.yaml"),
    project="digits-classifier",
    domain="development",
)
```
Important
We've set the config_file argument to Path.home() / ".flyte" / "config.yaml", which was created automatically when we invoked flytectl demo start.
Under the hood, UnionML will handle the Docker build process locally, bypassing the need to push your app image to a remote registry.
Managing your Own Flyte Cluster#
In this guide we're using the Flyte demo cluster that you can spin up on your local machine. However, if you want access to the full power of Flyte, e.g. scaling to larger compute clusters and GPU accelerators for more data- and compute-heavy models, you can follow the Flyte Deployment Guides.
Important
To point your UnionML app to your own Flyte cluster, specify a config.yaml file in the config_file argument that is properly configured to access that Flyte cluster. In this case, you'll also need to specify a Docker registry that you have push access to via the model.remote(registry="...") keyword argument.
To learn more about cluster configuration, see here.
UnionML CLI#
The UnionML Python package ships with the UnionML CLI, which we use to deploy the model and invoke the training/prediction microservices that are automatically compiled by the Dataset and Model objects.
unionml deploy
#
To deploy, run:
```shell
unionml deploy app:model
```
Note
The first argument of unionml deploy should be a :-separated string whose first section is the module name containing the UnionML app and whose second section is the variable name pointing to the unionml.Model object.
Warning
The Flyte demo cluster may take a few seconds to create the resources required to run your workflows after you run the unionml deploy command. If the commands below fail, retry them after a few seconds.
Now that your app workflows are deployed, you can run training and prediction jobs on the Flyte demo cluster:
unionml train
#
Train a model given some hyperparameters:
```shell
unionml train app:model -i '{"hyperparameters": {"C": 1.0, "max_iter": 1000}}'
```
unionml predict
#
Generate predictions with json data:
```shell
unionml predict app:model -f data/sample_features.json
```
Where data/sample_features.json is a JSON file containing feature data that's compatible with the model.
Note
Currently, only JSON files that can be converted to a pandas DataFrame are supported.
Important
You can also generate predictions by fetching data from the dataset.reader function. However, this assumes that you've correctly factored your reader to get data from some arbitrary source. For example, you can implement your reader function to get data from an s3_path:

```python
@dataset.reader
def reader(s3_path: str) -> pd.DataFrame:
    return pd.read_csv(s3_path)
```
Then, you can generate predictions with the --inputs option:

```shell
unionml predict app:model --inputs '{"s3_path": "s3://my-bucket/path/to/data.csv"}'
```
Programmatic API#
UnionML also provides a programmatic API to deploy your app and kick off training and prediction jobs that run on the Flyte cluster. We simply import the UnionML Model object into another Python module:
remote_deploy()
#
```python
from app import model

model.remote_deploy()
```
remote_train()
#
```python
from unionml.model import ModelArtifact

from app import model

model_artifact: ModelArtifact = model.remote_train(
    hyperparameters={"C": 1.0, "max_iter": 1000},
)
```
The model_artifact output NamedTuple contains three attributes:
model_object: The actual trained model object, in this case an sklearn BaseEstimator.
hyperparameters: The hyperparameters used to train the model.
metrics: The metrics associated with the training and test sets of the dataset used during training.
Note
By default, invoking remote_train is a blocking operation, i.e. the Python process will wait until the Flyte backend completes training.
remote_predict()
#
```python
from sklearn.datasets import load_digits

from app import model

features = load_digits(as_frame=True).frame.sample(5, random_state=42)
predictions = model.remote_predict(features=features)
```
Note
The features kwarg should be the same type as the output type of the dataset.reader function. In this case, that would be a pandas.DataFrame.
Important
Similar to the point above about generating predictions using the @dataset.reader function, you can pass the reader keyword arguments to the model.remote_predict method, assuming your reader function looks like:

```python
@dataset.reader
def reader(s3_path: str) -> pd.DataFrame:
    return pd.read_csv(s3_path)
```
You can generate predictions like so:

```python
predictions = model.remote_predict(s3_path="s3://my-bucket/path/to/data.csv")
```
Next#
Now that you’ve deployed your UnionML app to a Flyte cluster to scale your training jobs and do batch prediction, you have a few options for serving predictions: