Serving with BentoML#
UnionML integrates with BentoML to make the hand-off from model training to production serving seamless.
Prerequisites
Install the bentoml extra:
pip install unionml[bentoml]
Additional Requirements:
Understand the concepts in these UnionML guides:
- The Local Training and Prediction guide, for local model training.
- The Deploying to Flyte guide, for model training at scale with Flyte.
Setup#
UnionML ships with a template that helps you get started with a BentoML-enabled UnionML project:
unionml init basic_bentoml_app --template basic-bentoml
cd basic_bentoml_app
Creating a BentoMLService#
UnionML provides a BentoMLService class that acts as a converter from the components that you’ve defined in a UnionML app into a bentoml.Service.
As you can see in our project template, we have a digits_classifier_app.py file that creates a UnionML app with a BentoMLService:
digits_classifier_app.py
from typing import List

import pandas as pd
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

from unionml import Dataset, Model
from unionml.services.bentoml import BentoMLService

dataset = Dataset(name="digits_dataset", test_size=0.2, shuffle=True, targets=["target"])
model = Model(name="digits_classifier", init=LogisticRegression, dataset=dataset)
service = BentoMLService(model, framework="sklearn")


@dataset.reader
def reader() -> pd.DataFrame:
    return load_digits(as_frame=True).frame


@model.trainer
def trainer(estimator: LogisticRegression, features: pd.DataFrame, target: pd.DataFrame) -> LogisticRegression:
    return estimator.fit(features, target.squeeze())


@model.predictor
def predictor(estimator: LogisticRegression, features: pd.DataFrame) -> List[float]:
    return [float(x) for x in estimator.predict(features)]


@model.evaluator
def evaluator(estimator: LogisticRegression, features: pd.DataFrame, target: pd.DataFrame) -> float:
    return float(accuracy_score(target.squeeze(), predictor(estimator, features)))
We can then train a model locally and save it to the local BentoML model store:
if __name__ == "__main__":
    model_object, metrics = model.train(hyperparameters={"C": 1.0, "max_iter": 10000})
    predictions = model.predict(features=load_digits(as_frame=True).frame.sample(5, random_state=42))
    print(model_object, metrics, predictions, sep="\n")

    saved_model = service.save_model(model.artifact.model_object)
    print(f"BentoML saved model: {saved_model}")
If you run python digits_classifier_app.py, you should see output like this:
LogisticRegression(max_iter=10000.0)
{'train': 1.0, 'test': 0.9722222222222222}
[6.0, 9.0, 3.0, 7.0, 2.0]
BentoML saved model: Model(tag="digits_classifier:degqqptj2g6jxlg6")
We’ve successfully saved our UnionML-trained model_object to the BentoML model store under the tag digits_classifier:degqqptj2g6jxlg6, where digits_classifier is the model name and degqqptj2g6jxlg6 is the version automatically created for us by BentoML.
Note
You can learn more about BentoML models and the model store in the BentoML documentation.
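For a quick sanity check that the model landed in the local model store, you can look it up with BentoML’s Python API. This is a minimal sketch, assuming BentoML 1.x; the bentoml models list CLI command shows the same information:
import bentoml

# fetch the saved model by tag; "latest" resolves to the most recent version
bento_model = bentoml.models.get("digits_classifier:latest")
print(bento_model.tag)   # e.g. digits_classifier:degqqptj2g6jxlg6
print(bento_model.path)  # where the model files live in the local store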
Defining a Model Service File#
As a framework for creating and deploying ML-powered prediction services, BentoML enforces a clear boundary between model training and serving.
UnionML adheres to this boundary by separating the UnionML app script from the BentoML service definition script, so that you can iterate flexibly on model training and tuning independently of serving the best model you’ve trained.
In a separate file, we define which model we want to serve:
service.py
from digits_classifier_app import service

service.load_model("latest")
service.configure(
    enable_async=False,
    supported_resources=("cpu",),
    supports_cpu_multi_threading=False,
    runnable_method_kwargs={"batchable": False},
)
Note that you can replace "latest" with an explicit model version, e.g. "degqqptj2g6jxlg6", which is a desirable practice if you want to deploy this service to production.
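For example, to pin the service to the version produced by the training run above (your version string will differ), the load_model call would look like this:
# service.py, pinned to an explicit model version instead of "latest"
service.load_model("degqqptj2g6jxlg6")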
Note
Under the hood, the configure() method does the following:
- Creates a bentoml.Service with a custom bentoml.Runnable class that re-uses UnionML-defined components, so that you can seamlessly create an API based on the unionml.dataset.Dataset.feature_loader, unionml.dataset.Dataset.feature_transformer, and unionml.model.Model.predictor implementations.
- Infers the feature and output API IO Descriptors based on the above UnionML-defined components. These can be explicitly provided as keyword arguments to the configure method in case the feature and prediction output types are not recognized in the IO_DESCRIPTOR_MAPPING.
- Defines a svc property that can be used to access the underlying bentoml.Service.
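Because svc is a plain bentoml.Service, you can introspect it like any other BentoML service. The snippet below is a small sketch; the apis attribute is assumed from BentoML 1.x:
from service import service  # importing service.py runs load_model() and configure()

svc = service.svc
print(type(svc))       # the underlying bentoml.Service
print(list(svc.apis))  # the inferred endpoint(s), e.g. ["predict"]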
Serving Locally#
Start the server locally with:
bentoml serve service.py:service.svc
The UnionML basic-bentoml project template also comes with a request.py file that lets you test the local endpoint:
import requests
from sklearn.datasets import load_digits

df = load_digits(as_frame=True).frame.drop(["target"], axis="columns")

r = requests.post(
    "http://0.0.0.0:3000/predict",
    headers={"content-type": "application/json"},
    data=df.sample(5, random_state=42).to_json(orient="records"),
)
print(r.text)
Running it should hit the endpoint with a JSON payload that adheres to the BentoML Service API that we just defined:
python request.py
Expected output:
[6.0,9.0,3.0,7.0,2.0]
Note
You can learn more about the bentoml serve command in the BentoML documentation.
Building a Bento#
A Bento is a standardized file archive containing all the source code, models, data, and additional artifacts that BentoML needs to deploy the model to some target infrastructure. To build a Bento, we first need to define a bentofile.yaml:
service: "service:service.svc"
labels:
owner: bentoml-integration
stage: dev
include:
- "*.py" # A pattern for matching which files to include in the bento
python:
requirements_txt: requirements.txt
Note
The bentofile.yaml file can be configured with additional options, which you can learn more about in the BentoML documentation.
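The python.requirements_txt entry above points at the project’s requirements.txt, which lists the Python dependencies packed into the Bento. A minimal set consistent with the imports in digits_classifier_app.py might look like this (the template’s actual file may pin versions or include more):
unionml[bentoml]
scikit-learn
pandas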
Then we simply invoke the bentoml build CLI command:
bentoml build
Expected Output
Building BentoML service "digits_classifier:tdtkiddj22lszlg6" from build context "...".
Packing model "digits_classifier:degqqptj2g6jxlg6"
██████╗░███████╗███╗░░██╗████████╗░█████╗░███╗░░░███╗██╗░░░░░
██╔══██╗██╔════╝████╗░██║╚══██╔══╝██╔══██╗████╗░████║██║░░░░░
██████╦╝█████╗░░██╔██╗██║░░░██║░░░██║░░██║██╔████╔██║██║░░░░░
██╔══██╗██╔══╝░░██║╚████║░░░██║░░░██║░░██║██║╚██╔╝██║██║░░░░░
██████╦╝███████╗██║░╚███║░░░██║░░░╚█████╔╝██║░╚═╝░██║███████╗
╚═════╝░╚══════╝╚═╝░░╚══╝░░░╚═╝░░░░╚════╝░╚═╝░░░░░╚═╝╚══════╝
Successfully built Bento(tag="digits_classifier:tdtkiddj22lszlg6").
Congratulations! You’ve now built a Bento, which is uniquely identified with the tag digits_classifier:tdtkiddj22lszlg6.
You can serve this Bento locally by passing its tag to bentoml serve:
bentoml serve digits_classifier:tdtkiddj22lszlg6
Deploying a Bento#
BentoML offers three ways to deploy a Bento to production:
🐳 Containerize your Bento for custom Docker deployment.
🦄 Yatai: a Kubernetes-native model deployment platform.
🚀 bentoctl: a command-line tool for deploying Bentos on any cloud platform.
To learn more about these deployment options, refer to the BentoML deployment guide.
In the next section, we’ll quickly go through an example of deploying the Bento we built earlier to AWS Lambda using bentoctl.
First, install bentoctl:
pip install bentoctl
Then initialize a bentoctl project:
bentoctl init
Expected output:
...
deployment config generated to: deployment_config.yaml
✨ generated template files.
- bentoctl.tfvars
- main.tf
This will start an interactive prompt where you fill in some metadata about the project, resulting in a ./deployment_config.yaml file.
Next, we build the deployable artifacts with:
bentoctl build -b digits_classifier:tdtkiddj22lszlg6 -f ./deployment_config.yaml
Here, the -b option must be a Bento tag, for example the digits_classifier:tdtkiddj22lszlg6 tag that we saw earlier in this guide.
Then, we use the terraform CLI to apply the generated deployment configs to AWS:
terraform init
terraform apply -var-file=bentoctl.tfvars --auto-approve
bentoctl apply
Expected output:
...
endpoint = "<ENDPOINT_URL>"
function_name = "<FUNCTION_NAME>"
image_tag = "<IMAGE_TAG>"
The CLI command should output endpoint, function_name, and image_tag metadata.
Test your AWS Lambda endpoint with:
URL=$(terraform output -json | jq -r .endpoint.value)predict
curl -i --header "Content-Type: application/json" --request POST --data "$(cat data/sample_features.json)" $URL
This should produce a JSON-encoded string of our model’s prediction based on the features in data/sample_features.json.
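If you prefer to test the deployed endpoint from Python rather than curl, a sketch equivalent to the request above looks like this (substitute the endpoint value from the terraform output; the payload is the same list-of-records JSON used by request.py):
import json

import requests

# endpoint URL from the terraform output, with the predict route appended
url = "<ENDPOINT_URL>" + "predict"

with open("data/sample_features.json") as f:
    features = json.load(f)

r = requests.post(
    url,
    headers={"content-type": "application/json"},
    json=features,
)
print(r.text)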
Finally, you can delete all the cloud resources with:
bentoctl destroy
Serving a Model Trained on Flyte#
Instead of serving a model trained locally, you can serve a model trained on a Flyte cluster by using the programmatic API. The recommendation here is to separate the UnionML app definition from the remote_train() invocations that train the model on a Flyte cluster.
remote_training.py
from unionml.model import ModelArtifact

from digits_classifier_app import model, service

# train the model on a Flyte cluster
model_artifact: ModelArtifact = model.remote_train(
    hyperparameters={"C": 1.0, "max_iter": 5000}
)

# save the model object to the local bentoml store
saved_model = service.save_model(model_artifact.model_object)
print(f"BentoML saved model: {saved_model}")
Run the script:
python remote_training.py
Expected output:
...
BentoML saved model: Model(tag="digits_classifier:xyz")
Finally, update the service.py script with the corresponding model version:
# service.py
...
service.load_model("xyz")
...
Next#
BentoML is a feature-rich model deployment framework, and you can learn more in the official documentation.