Deploying to Flyte#

UnionML integrates tightly with Flyte, a data- and machine-learning-aware orchestration platform that uses cloud services like AWS and GCP to scale and maintain data processing and machine learning workloads.

In this guide, we’ll:

  1. Spin up a demo Flyte cluster, which is a standalone, minimal Flyte cluster that you can create on your individual laptop or workstation.

  2. Configure the digit classification UnionML app to use the Flyte sandbox as the compute backend for our training and prediction workloads.

Prerequisites#

  • Install flytectl, the command-line interface for Flyte.

  • Install Docker and make sure you have the Docker daemon running.
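
Before continuing, it can help to confirm that both tools are discoverable on your PATH. The following is a minimal sketch using only the Python standard library; the helper name `missing_tools` is hypothetical and is not part of UnionML or Flyte. It only checks that the executables can be found, not that the Docker daemon is actually running.

```python
import shutil
from typing import Iterable, List


def missing_tools(tools: Iterable[str]) -> List[str]:
    """Return the names in `tools` that cannot be found on the PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    # The two prerequisites named in this guide.
    missing = missing_tools(["flytectl", "docker"])
    if missing:
        print("Missing prerequisites:", ", ".join(missing))
    else:
        print("flytectl and docker were found on the PATH.")
```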

Deploy App Workflows#

A UnionML app is composed of a Dataset, Model, and serving app component (e.g. fastapi.FastAPI). Under the hood, a Model object exposes *_workflow methods that return flytekit.workflow objects, which are essentially execution graphs that perform multiple steps of computation.

To make these computations scalable, reproducible, and auditable, we can serialize our workflows and register them to a Flyte cluster, in this case a local Flyte demo cluster.

Initializing a Digits Classification App#

Going back to our digit classification app, let’s assume that we’ve initialized our app using the unionml init my_app command and have an app.py script with our digits classification model.

Start a Local Flyte Demo Cluster#

To start a Flyte demo cluster, run the following in your app directory:

flytectl demo start --source .

Note

The --source . flag will initialize the Flyte demo cluster in a Docker container with your app files mounted inside. This is so that your app’s workflows can be serialized and registered directly in the Flyte sandbox.

We should now be able to go to http://localhost:30080/console in our browser to see the Flyte UI.
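
If you prefer to check from a script rather than a browser, the sketch below probes the console URL with the standard library. The function name `console_is_up` and the two-second timeout are illustrative choices, not part of flytectl or UnionML, and a successful response only shows that something is answering at that address.

```python
import urllib.error
import urllib.request

# URL of the Flyte UI as given in this guide.
CONSOLE_URL = "http://localhost:30080/console"


def console_is_up(url: str = CONSOLE_URL, timeout: float = 2.0) -> bool:
    """Return True if an HTTP GET to `url` completes with a non-error status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 400
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure, or an HTTP error status.
        return False


if __name__ == "__main__":
    print("Flyte console reachable:", console_is_up())
```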

The App Dockerfile#

UnionML relies on Docker to package up all of your app’s source code and dependencies. The basic app template comes with a Dockerfile, which we can use to do this.

Configuring the Remote Backend#

All you need to do to get your UnionML app ready for deployment is to configure it with:

  1. The Docker registry and image name that you want to use to package your app.

  2. The Flyte project and domain that you want to use to host your app’s microservices.

In the app.py script, you can see the following code that does just this:

model.remote(
    dockerfile="Dockerfile",
    config_file=str(Path.home() / ".flyte" / "config.yaml"),
    project="digits-classifier",
    domain="development",
)

Important

We’ve set the config_file argument to Path.home() / ".flyte" / "config.yaml", which was created automatically when we invoked flytectl demo start.

Under the hood, UnionML will handle the Docker build process locally, bypassing the need to push your app image to a remote registry.
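
Since the config_file argument points at a file that only exists after the demo cluster has been started, a quick existence check can save a confusing error later. This is a small sketch under that assumption; `default_flyte_config` is a hypothetical helper that simply rebuilds the same path used in the model.remote call above.

```python
from pathlib import Path


def default_flyte_config() -> Path:
    """Rebuild the path that this guide passes as config_file."""
    return Path.home() / ".flyte" / "config.yaml"


if __name__ == "__main__":
    config_path = default_flyte_config()
    if config_path.is_file():
        print(f"Found Flyte config at {config_path}")
    else:
        print(f"No config at {config_path}; has `flytectl demo start` been run?")
```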

Managing your Own Flyte Cluster#

In this guide we’re using the Flyte demo cluster that you can spin up on your local machine. However, if you want access to the full power of Flyte, e.g. scaling to larger compute clusters and GPU accelerators for more data- and compute-heavy models, you can follow the Flyte Deployment Guides.

Important

To point your UnionML app to your own Flyte cluster, specify a config.yaml file in the config_file argument that is properly configured to access that Flyte cluster. In this case, you’ll also need to specify a Docker registry that you have push access to via the model.remote(registry="...") keyword argument.

To learn more about cluster configuration, see here.

UnionML CLI#

The UnionML Python package ships with the UnionML CLI, which we use to deploy the model and invoke the training and prediction microservices that are automatically compiled by the Dataset and Model objects.

unionml deploy#

To deploy, run:

unionml deploy app:model

Note

The first argument of unionml deploy should be a :-separated string whose first section is the module name containing the UnionML app and second section is the variable name pointing to the unionml.Model object.
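
To make the module:variable convention concrete, here is one way such a string could be split and resolved with importlib. This is an illustration of the general pattern only; the helper names `parse_app_spec` and `load_app_object` are hypothetical, and this is not a description of how the UnionML CLI is implemented internally.

```python
import importlib
from typing import Any, Tuple


def parse_app_spec(spec: str) -> Tuple[str, str]:
    """Split '<module>:<variable>' into its two parts, validating the shape."""
    module_name, sep, variable_name = spec.partition(":")
    if not sep or not module_name or not variable_name or ":" in variable_name:
        raise ValueError(f"expected '<module>:<variable>', got {spec!r}")
    return module_name, variable_name


def load_app_object(spec: str) -> Any:
    """Import the module and return the named attribute from it."""
    module_name, variable_name = parse_app_spec(spec)
    module = importlib.import_module(module_name)
    return getattr(module, variable_name)
```

With an app.py that defines a variable named model, `parse_app_spec("app:model")` yields the pair `("app", "model")`, and `load_app_object("app:model")` would import app and return that variable, provided app.py is importable from the current directory.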

Warning

The Flyte demo cluster may take a few seconds to create the resources required to run your workflows after running the unionml deploy command. If the commands below fail, retry them after a few seconds.
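
If you script these commands, the "retry after a few seconds" advice can be wrapped in a small helper. The sketch below is generic Python, not a UnionML feature; the name `retry`, the default of five attempts, and the three-second delay are all arbitrary illustrative choices.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def retry(
    action: Callable[[], T],
    attempts: int = 5,
    delay_seconds: float = 3.0,
    retry_on: Tuple[Type[BaseException], ...] = (Exception,),
) -> T:
    """Call `action` until it succeeds or `attempts` calls have failed."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except retry_on:
            if attempt == attempts:
                raise  # Out of attempts: surface the last error.
            time.sleep(delay_seconds)
    raise AssertionError("unreachable")
```

One way to use it would be to pass a function that runs the CLI through `subprocess.run(..., check=True)` and set `retry_on=(subprocess.CalledProcessError,)`, so that only a non-zero exit status triggers another attempt.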

Now that your app workflows are deployed, you can run training and prediction jobs using the Flyte sandbox cluster:

unionml train#

CLI reference

Train a model given some hyperparameters:

unionml train app:model -i '{"hyperparameters": {"C": 1.0, "max_iter": 1000}}'
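
Hand-writing JSON inside shell quotes is error-prone, so when hyperparameters come from a script it may be easier to build the command programmatically. The sketch below assembles the same invocation shown above; `train_command` is a hypothetical helper, and only the `-i` option that appears in this guide is used.

```python
import json
import shlex
from typing import Any, Dict, List


def train_command(app_spec: str, hyperparameters: Dict[str, Any]) -> List[str]:
    """Build the argument list for `unionml train` with a JSON -i payload."""
    payload = json.dumps({"hyperparameters": hyperparameters})
    return ["unionml", "train", app_spec, "-i", payload]


if __name__ == "__main__":
    argv = train_command("app:model", {"C": 1.0, "max_iter": 1000})
    # shlex.join renders the list as a copy-pasteable POSIX shell command.
    print(shlex.join(argv))
```

The argument list can also be passed directly to `subprocess.run`, which avoids shell quoting altogether.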

unionml predict#

CLI reference

Generate predictions with JSON data:

unionml predict app:model -f data/sample_features.json

Here, data/sample_features.json is a JSON file containing feature data that’s compatible with the model.

Note

Currently, only JSON files that can be converted to a pandas DataFrame are supported.
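
As a rough illustration of a DataFrame-friendly file, the sketch below writes rows as a JSON array of objects, one object per row, which is a layout that pandas.read_json can parse. This layout is an assumption made for the example: the sample file shipped with your app template is the authoritative reference for what the CLI expects. The helper `write_feature_records` and the two-column rows are hypothetical and truncated for brevity.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Dict, List


def write_feature_records(records: List[Dict[str, Any]], path: Path) -> None:
    """Write rows as a JSON array of objects, after checking they share columns."""
    if not records:
        raise ValueError("at least one row is required")
    columns = set(records[0])
    for index, row in enumerate(records):
        if set(row) != columns:
            raise ValueError(f"row {index} has different columns than row 0")
    path.write_text(json.dumps(records))


if __name__ == "__main__":
    # Illustrative rows only; a real digits sample has many more pixel columns.
    rows = [
        {"pixel_0_0": 0.0, "pixel_0_1": 5.0},
        {"pixel_0_0": 0.0, "pixel_0_1": 12.0},
    ]
    with tempfile.TemporaryDirectory() as tmp:
        target = Path(tmp) / "sample_features.json"
        write_feature_records(rows, target)
        print(target.read_text())
```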

Important

You can also generate predictions by fetching data from the dataset.reader function. However, this assumes that you’ve correctly factored your reader to get data from some arbitrary source. For example, you can implement your reader function to get data from an s3_path:

import pandas as pd

@dataset.reader
def reader(s3_path: str) -> pd.DataFrame:
    # Reading s3:// paths with pandas requires the s3fs package.
    return pd.read_csv(s3_path)

Then, you can generate predictions with the --inputs option:

unionml predict app:model --inputs '{"s3_path": "s3://my-bucket/path/to/data.csv"}'

Programmatic API#

UnionML also provides a programmatic API to deploy your app and kick off training and prediction jobs that run on the Flyte cluster. We simply import the UnionML Model object into another Python module:

remote_deploy()#

from app import model

model.remote_deploy()

remote_train()#

from unionml.model import ModelArtifact

from app import model

model_artifact: ModelArtifact = model.remote_train(
    hyperparameters={"C": 1.0, "max_iter": 1000},
)

The model_artifact output NamedTuple contains three attributes:

  • model_object: The actual trained model object, in this case an sklearn BaseEstimator.

  • hyperparameters: The hyperparameters used to train the model.

  • metrics: The metrics associated with the training and test set of the dataset used during training.
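
Because the output is a NamedTuple, its fields can be read by attribute or unpacked positionally. The sketch below uses a stand-in class with the three attributes listed above so that it runs without a Flyte cluster; `ArtifactSketch` is hypothetical and is not unionml's ModelArtifact, and the metric keys and values are placeholders made up for illustration.

```python
from typing import Any, Dict, NamedTuple, Optional


class ArtifactSketch(NamedTuple):
    """Stand-in with the three attributes listed above; not unionml's class."""

    model_object: Any
    hyperparameters: Optional[Dict[str, Any]] = None
    metrics: Optional[Dict[str, float]] = None


artifact = ArtifactSketch(
    model_object="<trained estimator placeholder>",
    hyperparameters={"C": 1.0, "max_iter": 1000},
    metrics={"train": 0.0, "test": 0.0},  # placeholder values, not real results
)

# Attribute access and positional unpacking both work on a NamedTuple.
model_object, hyperparameters, metrics = artifact
print(artifact.hyperparameters["C"], sorted(metrics))
```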

Note

By default, invoking remote_train is a blocking operation, i.e. the Python process will wait until the Flyte backend completes training.

remote_predict()#

from sklearn.datasets import load_digits

from app import model

features = load_digits(as_frame=True).frame.sample(5, random_state=42)
predictions = model.remote_predict(features=features)

Note

The features kwarg should be the same type as the output type of the dataset.reader function. In this case, that would be a pandas.DataFrame.

Important

As with generating predictions from the CLI using the @dataset.reader function, you can pass the reader keyword arguments to the model.remote_predict method. Assuming your reader function looks like:

import pandas as pd

@dataset.reader
def reader(s3_path: str) -> pd.DataFrame:
    # Reading s3:// paths with pandas requires the s3fs package.
    return pd.read_csv(s3_path)

You can generate predictions like so:

predictions = model.remote_predict(s3_path="s3://my-bucket/path/to/data.csv")

Next#

Now that you’ve deployed your UnionML app to a Flyte cluster to scale your training jobs and do batch prediction, let’s see how we can serve these predictions in production in an online setting with FastAPI.