Serving with BentoML

UnionML integrates with BentoML to make the hand-off from model training to production serving seamless.


Install the bentoml extra:

pip install unionml[bentoml]

Additional Requirements:

Understand the concepts in these UnionML guides:


UnionML ships with a template that helps you get started with a BentoML-enabled UnionML project:

unionml init basic_bentoml_app --template basic-bentoml
cd basic_bentoml_app

Creating a BentoMLService

UnionML provides a BentoMLService class that acts as a converter from the components that you’ve defined in a UnionML app into a bentoml.Service.

As you can see in our project template, the digits_classifier_app.py file creates a UnionML app with a BentoMLService:

from typing import List

import pandas as pd
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

from unionml import Dataset, Model
from unionml.services.bentoml import BentoMLService

dataset = Dataset(name="digits_dataset", test_size=0.2, shuffle=True, targets=["target"])
model = Model(name="digits_classifier", init=LogisticRegression, dataset=dataset)
service = BentoMLService(model, framework="sklearn")

@dataset.reader
def reader() -> pd.DataFrame:
    return load_digits(as_frame=True).frame

@model.trainer
def trainer(estimator: LogisticRegression, features: pd.DataFrame, target: pd.DataFrame) -> LogisticRegression:
    return estimator.fit(features, target.squeeze())

@model.predictor
def predictor(estimator: LogisticRegression, features: pd.DataFrame) -> List[float]:
    return [float(x) for x in estimator.predict(features)]

@model.evaluator
def evaluator(estimator: LogisticRegression, features: pd.DataFrame, target: pd.DataFrame) -> float:
    return float(accuracy_score(target.squeeze(), predictor(estimator, features)))

We can then train a model locally and save it to the local BentoML model store:

if __name__ == "__main__":
    model_object, metrics = model.train(hyperparameters={"C": 1.0, "max_iter": 10000})
    predictions = model.predict(features=load_digits(as_frame=True).frame.sample(5, random_state=42))
    print(model_object, metrics, predictions, sep="\n")

    saved_model = service.save_model(model.artifact.model_object)
    print(f"BentoML saved model: {saved_model}")

If you run python digits_classifier_app.py, you should see output like this:

{'train': 1.0, 'test': 0.9722222222222222}
[6.0, 9.0, 3.0, 7.0, 2.0]
BentoML saved model: Model(tag="digits_classifier:degqqptj2g6jxlg6")

We’ve successfully saved our unionml-trained model_object to the BentoML model store under the tag digits_classifier:degqqptj2g6jxlg6, where digits_classifier is the model name and degqqptj2g6jxlg6 is the version automatically created for us by BentoML.
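Because the tag has the form name:version, a script that needs the two parts separately can split on the colon. The following is a minimal sketch using only the Python standard library; split_tag is a hypothetical helper written for this example, not part of BentoML or UnionML.

```python
from typing import Tuple


def split_tag(tag: str) -> Tuple[str, str]:
    """Split a "name:version" tag string into its name and version parts.

    Hypothetical helper for illustration; not part of BentoML or UnionML.
    """
    name, sep, version = tag.partition(":")
    if not sep or not name or not version:
        raise ValueError(f"expected a tag of the form 'name:version', got {tag!r}")
    return name, version


name, version = split_tag("digits_classifier:degqqptj2g6jxlg6")
print(name)     # digits_classifier
print(version)  # degqqptj2g6jxlg6
```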


You can learn more about BentoML models and the model store in the BentoML documentation.

Defining a Model Service File

As a framework for creating and deploying ML-powered prediction services, BentoML enforces a clear boundary between model training and serving.

UnionML respects this boundary by keeping the UnionML app script separate from the BentoML service definition script. This lets us iterate freely on model training and tuning without touching the code that serves the best model we have trained.

In a separate file, we define which model we want to serve:

from digits_classifier_app import service

service.load_model("latest")
service.configure(
    runnable_method_kwargs={"batchable": False},
)

Note that you can replace "latest" with an explicit model version, e.g. "degqqptj2g6jxlg6". Pinning the version is advisable when deploying this service to production, since "latest" changes whenever a new model is saved.
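One way to switch between "latest" during development and a pinned version in production is to read the version from the environment. This is only a sketch: the MODEL_VERSION variable name and the resolve_model_version helper are assumptions made for illustration, not something UnionML or BentoML defines.

```python
import os
from typing import Mapping, Optional


def resolve_model_version(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the model version to serve, defaulting to "latest".

    MODEL_VERSION is a hypothetical variable name chosen for this example.
    """
    env = os.environ if env is None else env
    # Treat an unset or blank variable as "use the latest saved model".
    return env.get("MODEL_VERSION", "").strip() or "latest"


print(resolve_model_version({}))                                     # latest
print(resolve_model_version({"MODEL_VERSION": "degqqptj2g6jxlg6"}))  # degqqptj2g6jxlg6
```

The returned string would then be passed wherever the service definition currently uses "latest".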


Under the hood, the configure() method does the following:

Serving Locally

Start the server locally with:

bentoml serve

The UnionML basic-bentoml project template also comes with a file that lets you test the local endpoint:

import requests
from sklearn.datasets import load_digits

df = load_digits(as_frame=True).frame.drop(["target"], axis="columns")

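# assumes the server is listening on BentoML's default port, 3000
r = requests.post(
    "http://127.0.0.1:3000/predict",
    headers={"content-type": "application/json"},
    data=df.sample(5, random_state=42).to_json(orient="records"),
)
print(r.text)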

Running it should hit the endpoint with a JSON payload that adheres to the BentoML Service API that we just defined:


Expected output:



You can learn more about the bentoml serve command in the BentoML documentation.
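To make the request payload concrete: to_json(orient="records") serializes the DataFrame as a JSON array with one object per row, keyed by column name. The sketch below builds a payload of the same shape with the standard library only. The two rows and their all-zero values are made up for illustration; the pixel_<row>_<col> column names follow the 8x8 layout of the digits dataset.

```python
import json

# Column names for an 8x8 image, one feature per pixel (64 in total).
columns = [f"pixel_{row}_{col}" for row in range(8) for col in range(8)]

# Two made-up rows of all-zero pixel intensities, for illustration only.
rows = [[0.0] * len(columns) for _ in range(2)]

# "records" orientation: a list with one {column: value} object per row.
records = [dict(zip(columns, row)) for row in rows]
payload = json.dumps(records)

print(len(records), len(records[0]))  # 2 64
```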

Building a Bento

A Bento is a standardized file archive containing all the source code, models, data, and additional artifacts that BentoML needs to deploy the model to some target infrastructure. To build a Bento, first we need to define a bentofile.yaml:

service: "service:service.svc"
labels:
  owner: bentoml-integration
  stage: dev
include:
- "*.py"  # A pattern for matching which files to include in the bento
python:
  requirements_txt: requirements.txt


The bentofile.yaml file can be configured with additional options, which you can learn more about in the BentoML documentation.

Then we simply invoke the bentoml build CLI command:

bentoml build

Expected output:

Building BentoML service "digits_classifier:tdtkiddj22lszlg6" from build context "...".
Packing model "digits_classifier:degqqptj2g6jxlg6"


Successfully built Bento(tag="digits_classifier:tdtkiddj22lszlg6").

Congratulations! You’ve now built a Bento, which is uniquely identified by the tag digits_classifier:tdtkiddj22lszlg6. You can serve this Bento locally by passing its tag to bentoml serve:

bentoml serve digits_classifier:tdtkiddj22lszlg6
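When scripting these steps, the Bento tag can be pulled out of the build output rather than copied by hand. A minimal sketch, assuming the final line of the output has the Successfully built Bento(tag="...") form shown above; extract_bento_tag is a hypothetical helper, and the exact log format may differ between BentoML versions.

```python
import re
from typing import Optional

# Matches the tag inside: Successfully built Bento(tag="name:version")
_BENTO_TAG = re.compile(r'Successfully built Bento\(tag="([^"]+)"\)')


def extract_bento_tag(build_output: str) -> Optional[str]:
    """Return the Bento tag found in `bentoml build` output, or None."""
    match = _BENTO_TAG.search(build_output)
    return match.group(1) if match else None


sample_output = '''Building BentoML service "digits_classifier:tdtkiddj22lszlg6" from build context "...".
Packing model "digits_classifier:degqqptj2g6jxlg6"
Successfully built Bento(tag="digits_classifier:tdtkiddj22lszlg6").
'''
print(extract_bento_tag(sample_output))  # digits_classifier:tdtkiddj22lszlg6
```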

Deploying a Bento

BentoML offers three ways to deploy a Bento to production:

  • 🐳 Containerize your Bento for custom docker deployment.

  • 🦄 Yatai: A Kubernetes-native model deployment platform.

  • 🚀 bentoctl: a command-line tool for deploying Bentos on any cloud platform.

To learn more about these deployment options, refer to the BentoML deployment guide.

Next, we’ll quickly go through an example of deploying the Bento we built earlier to AWS Lambda using bentoctl.

First, install bentoctl:

pip install bentoctl

Then initialize a bentoctl project:

bentoctl init

Expected output:

deployment config generated to: deployment_config.yaml
✨ generated template files.
  - bentoctl.tfvars

Running bentoctl init starts an interactive prompt where you fill in some metadata about the project, resulting in a ./deployment_config.yaml file.

Next, we build the deployable artifacts with:

bentoctl build -b digits_classifier:tdtkiddj22lszlg6 -f ./deployment_config.yaml

Here the -b option must be a Bento tag, for example the digits_classifier:tdtkiddj22lszlg6 tag that we saw earlier in this guide.

Then, we use the terraform CLI to apply the generated deployment configs to AWS.

terraform init
terraform apply -var-file=bentoctl.tfvars --auto-approve

Expected output:

endpoint = "<ENDPOINT_URL>"
function_name = "<FUNCTION_NAME>"
image_tag = "<IMAGE_TAG>"

The CLI command should output endpoint, function_name, and image_tag metadata.

Test your AWS Lambda endpoint with:

URL=$(terraform output -json | jq -r .endpoint.value)predict
curl -i --header "Content-Type: application/json" --request POST --data "$(cat data/sample_features.json)" $URL

This should produce a JSON-encoded string of our model’s prediction based on the features in data/sample_features.json.
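The same request can be assembled in Python. The sketch below mirrors the shell commands: it reads the endpoint from the JSON printed by terraform output -json, appends predict, and prepares a POST request with the standard library. The helper names are hypothetical, and the example endpoint and payload are placeholders rather than real values; nothing is sent over the network here.

```python
import json
import urllib.request


def predict_url(terraform_output_json: str) -> str:
    """Build the prediction URL from `terraform output -json` text."""
    outputs = json.loads(terraform_output_json)
    # Same as the jq filter `.endpoint.value`, followed by "predict".
    return outputs["endpoint"]["value"] + "predict"


def build_request(url: str, features_json: str) -> urllib.request.Request:
    """Prepare (but do not send) a JSON POST request to the endpoint."""
    return urllib.request.Request(
        url,
        data=features_json.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Placeholder values standing in for real Terraform output and feature data.
example_output = json.dumps({"endpoint": {"value": "https://example.invalid/"}})
request = build_request(predict_url(example_output), "[]")
print(request.full_url)      # https://example.invalid/predict
print(request.get_method())  # POST
```

Passing the prepared request to urllib.request.urlopen would actually send it.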

Finally, you can delete all the cloud resources with:

bentoctl destroy

Serving a Model Trained on Flyte

Instead of serving a model trained locally, you can serve a model trained on a Flyte cluster by using the programmatic API. We recommend keeping the UnionML app definition separate from the script that calls remote_train() to train the model on a Flyte cluster.

from unionml.model import ModelArtifact

from digits_classifier_app import model, service

# train the model on a Flyte cluster
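model_artifact: ModelArtifact = model.remote_train(
    hyperparameters={"C": 1.0, "max_iter": 5000},
)

# save the model object to the local bentoml store
saved_model = service.save_model(model_artifact.model_object)
print(f"BentoML saved model: {saved_model}")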

Run the script:


Expected output:

BentoML saved model: Model(tag="digits_classifier:xyz")

Finally, update the script with the corresponding model version:



BentoML is a feature-rich model deployment framework, and you can learn more in the official documentation: