Serving with BentoML#
UnionML integrates with BentoML to make the hand-off from model training to production serving seamless.
Prerequisites
Install the bentoml extra:
pip install unionml[bentoml]
Additional Requirements:
Understand the concepts in these UnionML guides:
- The Local Training and Prediction guide, for local model training.
- The Deploying to Flyte guide, for model training at scale with Flyte.
Setup#
UnionML ships with a template that helps you get started with a BentoML-enabled UnionML project:
unionml init basic_bentoml_app --template basic-bentoml
cd basic_bentoml_app
Creating a BentoMLService#
UnionML provides a BentoMLService class that acts as a converter from the components that you’ve defined in a UnionML app into a bentoml.Service.
As you can see in our project template, we have a digits_classifier_app.py file that creates a UnionML app with a BentoMLService:
digits_classifier_app.py
from typing import List

import pandas as pd
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

from unionml import Dataset, Model
from unionml.services.bentoml import BentoMLService

dataset = Dataset(name="digits_dataset", test_size=0.2, shuffle=True, targets=["target"])
model = Model(name="digits_classifier", init=LogisticRegression, dataset=dataset)
service = BentoMLService(model, framework="sklearn")


@dataset.reader
def reader() -> pd.DataFrame:
    return load_digits(as_frame=True).frame


@model.trainer
def trainer(estimator: LogisticRegression, features: pd.DataFrame, target: pd.DataFrame) -> LogisticRegression:
    return estimator.fit(features, target.squeeze())


@model.predictor
def predictor(estimator: LogisticRegression, features: pd.DataFrame) -> List[float]:
    return [float(x) for x in estimator.predict(features)]


@model.evaluator
def evaluator(estimator: LogisticRegression, features: pd.DataFrame, target: pd.DataFrame) -> float:
    return float(accuracy_score(target.squeeze(), predictor(estimator, features)))
We can then train a model locally and save it to the local BentoML model store:
if __name__ == "__main__":
    model_object, metrics = model.train(hyperparameters={"C": 1.0, "max_iter": 10000})
    predictions = model.predict(features=load_digits(as_frame=True).frame.sample(5, random_state=42))
    print(model_object, metrics, predictions, sep="\n")

    saved_model = service.save_model(model.artifact.model_object)
    print(f"BentoML saved model: {saved_model}")
If you run python digits_classifier_app.py, you should see output like this:
LogisticRegression(max_iter=10000.0)
{'train': 1.0, 'test': 0.9722222222222222}
[6.0, 9.0, 3.0, 7.0, 2.0]
BentoML saved model: Model(tag="digits_classifier:degqqptj2g6jxlg6")
We’ve successfully saved our UnionML-trained model_object to the BentoML model store under the tag digits_classifier:degqqptj2g6jxlg6, where digits_classifier is the model name and degqqptj2g6jxlg6 is the version automatically created for us by BentoML.
Note
You can learn more about BentoML models and the model store in the BentoML documentation.
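For a quick sanity check that the model landed in the local model store, you can look it up with BentoML’s Python API. This is a minimal sketch, assuming BentoML 1.x; the bentoml models list CLI command shows the same information:
import bentoml

# fetch the saved model by tag; "latest" resolves to the most recent version
bento_model = bentoml.models.get("digits_classifier:latest")
print(bento_model.tag)   # e.g. digits_classifier:degqqptj2g6jxlg6
print(bento_model.path)  # where the model files live in the local store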
Defining a Model Service File#
As a framework for creating and deploying ML-powered prediction services, BentoML enforces a clear boundary between model training and serving.
UnionML adheres to this boundary by separating the UnionML app script from the BentoML service definition script, so that you can iterate flexibly on model training and tuning independently of serving the best model you’ve trained.
In a separate file, we define which model we want to serve:
service.py
from digits_classifier_app import service

service.load_model("latest")
service.configure(
    enable_async=False,
    supported_resources=("cpu",),
    supports_cpu_multi_threading=False,
    runnable_method_kwargs={"batchable": False},
)
Note that you can replace "latest" with an explicit model version, e.g. "degqqptj2g6jxlg6", which is a desirable practice if you want to deploy this service to production.
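For example, to pin the service to the version produced by the training run above (your version string will differ), the load_model call would look like this:
# service.py, pinned to an explicit model version instead of "latest"
service.load_model("degqqptj2g6jxlg6")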
Note
Under the hood, the configure() method does the following:
- Creates a bentoml.Service with a custom bentoml.Runnable class that re-uses UnionML-defined components, so that you can seamlessly create an API based on the unionml.dataset.Dataset.feature_loader, unionml.dataset.Dataset.feature_transformer, and unionml.model.Model.predictor implementations.
- Infers the feature and output API IO Descriptors based on the above UnionML-defined components. These can be explicitly provided as keyword arguments to the configure method in case the feature and prediction output types are not recognized in the IO_DESCRIPTOR_MAPPING.
- Defines a svc property that can be used to access the underlying bentoml.Service.
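Because svc is a plain bentoml.Service, you can introspect it like any other BentoML service. The snippet below is a small sketch; the apis attribute is assumed from BentoML 1.x:
from service import service  # importing service.py runs load_model() and configure()

svc = service.svc
print(type(svc))       # the underlying bentoml.Service
print(list(svc.apis))  # the inferred endpoint(s), e.g. ["predict"]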
Serving Locally#
Start the server locally with:
bentoml serve service.py:service.svc
The UnionML basic-bentoml project template also comes with a request.py file that lets you test the local endpoint:
import requests
from sklearn.datasets import load_digits

df = load_digits(as_frame=True).frame.drop(["target"], axis="columns")

r = requests.post(
    "http://0.0.0.0:3000/predict",
    headers={"content-type": "application/json"},
    data=df.sample(5, random_state=42).to_json(orient="records"),
)
print(r.text)
Running it should hit the endpoint with a JSON payload that adheres to the BentoML Service API that we just defined:
python request.py
Expected output:
[6.0,9.0,3.0,7.0,2.0]
Note
You can learn more about the bentoml serve command in the BentoML documentation.
Building a Bento#
A Bento is a standardized file archive containing all the source code, models, data, and additional artifacts that BentoML needs to deploy the model to some target infrastructure. To build a Bento, we first need to define a bentofile.yaml:
service: "service:service.svc"
labels:
owner: bentoml-integration
stage: dev
include:
- "*.py" # A pattern for matching which files to include in the bento
python:
requirements_txt: requirements.txt
Note
The bentofile.yaml file can be configured with additional options, which you can learn more about in the BentoML documentation.
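The python.requirements_txt entry above points at the project’s requirements.txt, which lists the Python dependencies packed into the Bento. A minimal set consistent with the imports in digits_classifier_app.py might look like this (the template’s actual file may pin versions or include more):
unionml[bentoml]
scikit-learn
pandas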
Then we simply invoke the bentoml build CLI command:
bentoml build
Expected Output
Building BentoML service "digits_classifier:tdtkiddj22lszlg6" from build context "...".
Packing model "digits_classifier:degqqptj2g6jxlg6"
██████╗░███████╗███╗░░██╗████████╗░█████╗░███╗░░░███╗██╗░░░░░
██╔══██╗██╔════╝████╗░██║╚══██╔══╝██╔══██╗████╗░████║██║░░░░░
██████╦╝█████╗░░██╔██╗██║░░░██║░░░██║░░██║██╔████╔██║██║░░░░░
██╔══██╗██╔══╝░░██║╚████║░░░██║░░░██║░░██║██║╚██╔╝██║██║░░░░░
██████╦╝███████╗██║░╚███║░░░██║░░░╚█████╔╝██║░╚═╝░██║███████╗
╚═════╝░╚══════╝╚═╝░░╚══╝░░░╚═╝░░░░╚════╝░╚═╝░░░░░╚═╝╚══════╝
Successfully built Bento(tag="digits_classifier:tdtkiddj22lszlg6").
Congratulations! You’ve now built a Bento, which is uniquely identified with the tag digits_classifier:tdtkiddj22lszlg6.
You can serve this Bento locally by passing its tag to bentoml serve:
bentoml serve digits_classifier:tdtkiddj22lszlg6
Deploying a Bento#
BentoML offers three ways to deploy a Bento to production:
🐳 Containerize your Bento for custom Docker deployment.
🦄 Yatai: a Kubernetes-native model deployment platform.
🚀 bentoctl: a command-line tool for deploying Bentos on any cloud platform.
To learn more about these deployment options, refer to the BentoML deployment guide.
In the next section, we’ll quickly go through an example of deploying the Bento we built earlier to AWS Lambda using bentoctl.
First, install bentoctl:
pip install bentoctl
Then initialize a bentoctl project:
bentoctl init
Expected output:
...
deployment config generated to: deployment_config.yaml
✨ generated template files.
- bentoctl.tfvars
- main.tf
This will start an interactive prompt where you fill in some metadata about the project, resulting in a ./deployment_config.yaml file.
Next, we build the deployable artifacts with:
bentoctl build -b digits_classifier:tdtkiddj22lszlg6 -f ./deployment_config.yaml
Here, the -b option must be a Bento tag, for example the digits_classifier:tdtkiddj22lszlg6 tag that we saw earlier in this guide.
Then, we use the terraform CLI to apply the generated deployment configs to AWS:
terraform init
terraform apply -var-file=bentoctl.tfvars --auto-approve
bentoctl apply
Expected output:
...
endpoint = "<ENDPOINT_URL>"
function_name = "<FUNCTION_NAME>"
image_tag = "<IMAGE_TAG>"
The CLI command should output endpoint, function_name, and image_tag metadata.
Test your AWS Lambda endpoint with:
URL=$(terraform output -json | jq -r .endpoint.value)predict
curl -i --header "Content-Type: application/json" --request POST --data "$(cat data/sample_features.json)" $URL
This should produce a JSON-encoded string of our model’s prediction based on the features in data/sample_features.json.
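If you prefer to test the deployed endpoint from Python rather than curl, a sketch equivalent to the request above looks like this (substitute the endpoint value from the terraform output; the payload is the same list-of-records JSON used by request.py):
import json

import requests

# endpoint URL from the terraform output, with the predict route appended
url = "<ENDPOINT_URL>" + "predict"

with open("data/sample_features.json") as f:
    features = json.load(f)

r = requests.post(
    url,
    headers={"content-type": "application/json"},
    json=features,
)
print(r.text)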
Finally, you can delete all the cloud resources with:
bentoctl destroy
Serving a Model Trained on Flyte#
Instead of serving a model trained locally, you can serve a model trained on a Flyte cluster by using the programmatic API. The recommendation here is to separate the UnionML app definition from the remote_train() invocations that train the model on a Flyte cluster.
remote_training.py
from unionml.model import ModelArtifact

from digits_classifier_app import model, service

# train the model on a Flyte cluster
model_artifact: ModelArtifact = model.remote_train(
    hyperparameters={"C": 1.0, "max_iter": 5000}
)

# save the model object to the local bentoml store
saved_model = service.save_model(model_artifact.model_object)
print(f"BentoML saved model: {saved_model}")
Run the script:
python remote_training.py
Expected output:
...
BentoML saved model: Model(tag="digits_classifier:xyz")
Finally, update the service.py script with the corresponding model version:
# service.py
...
service.load_model("xyz")
...
Next#
BentoML is a feature-rich model deployment framework, and you can learn more in the official documentation.