Reacting to S3 Events#
In Serving with AWS Lambda we learned how to deploy a prediction service to an AWS Lambda function, which we can call as a web endpoint to generate predictions.
But what if you want to generate predictions in response to events, such as a file of features being uploaded to an S3 bucket? In this guide, you’ll learn how to build an S3-event-based prediction service.
Prerequisites#
To follow this guide, we’ll need the following tools:
An AWS account for Docker registry authentication.
SAM CLI: Install the SAM CLI.
Docker: Install Docker community edition.
We need to use Amazon ECR-based images, so be sure to authenticate to the AWS ECR registry (for example, by piping aws ecr get-login-password into docker login).
Initialize a UnionML App for AWS Lambda#
unionml init s3_event_app --template basic-aws-lambda-s3
cd s3_event_app
This will create a UnionML project directory called s3_event_app, which contains all of the scripts and configuration needed to build and deploy the app.
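The generated project looks roughly like this (shown for orientation; exact contents may vary by template version):

s3_event_app/
├── app.py            # UnionML app and the lambda_handler function
├── template.yaml     # SAM application template
├── requirements.txt  # Python dependencies installed into the image
├── Dockerfile        # Lambda container image definition
├── data/             # sample feature files
├── events/           # sample S3 event for local invocation
└── tests/            # unit tests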
As we can see in the app.py script, the main addition to the UnionML app is the definition of a lambda_handler function that takes event and context arguments.
import json
import logging
import tempfile
from pathlib import Path
from urllib.parse import unquote_plus

import boto3

logger = logging.getLogger()

s3_client = boto3.client("s3")  # create s3 client


def lambda_handler(event, context):
    # `model` is the UnionML model defined earlier in app.py
    model.load_from_env()  # load the model

    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        key = unquote_plus(record["s3"]["object"]["key"])

        # get features from s3
        with tempfile.NamedTemporaryFile("w") as f:
            s3_client.download_file(bucket, key, f.name)
            logger.info("loading features")
            features = model.dataset.get_features(Path(f.name))

        # generate prediction
        predictions = model.predict(features=features)
        logger.info(f"generated predictions {predictions}")

        # upload predictions to s3
        with tempfile.NamedTemporaryFile("w") as out_file:
            json.dump(predictions, out_file)
            upload_key = f"predictions/{key.split('/')[-1]}"
            out_file.flush()
            s3_client.upload_file(out_file.name, bucket, upload_key)
            logger.info(f"uploaded predictions to {bucket}/{upload_key}")
As you can see, the lambda_handler implements the following operations:

1. Downloads the features file from S3 and loads it into memory using the unionml.dataset.Dataset.get_features() method.
2. Generates predictions using the unionml.model.Model.predict() method.
3. Uploads the predictions to a predictions/ prefix in the same S3 bucket. Note that the upload key doesn’t have to be the same as the event object key.
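To make the event parsing concrete, here is a minimal sketch of the payload the handler receives for an ObjectCreated event, trimmed to the fields it actually reads (the key shown is hypothetical):

# Minimal shape of an S3 ObjectCreated event, keeping only the fields
# that lambda_handler reads. Real events carry many more fields.
sample_event = {
    "Records": [
        {
            "s3": {
                "bucket": {"name": "unionml-example-aws-lambda-s3"},
                "object": {"key": "features/sample_features.json"},
            }
        }
    ]
}

# You could pass this directly to the handler in a Python shell; note
# that it still downloads from and uploads to the real bucket.
lambda_handler(sample_event, context=None)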
Building the App#
First, we need to create the model object that we want to deploy. We can do this by invoking the app.py script:
python app.py
This will create a joblib-serialized sklearn model called model_object.joblib in our current directory.
Then, build our application with the sam build command:
sam build
The SAM CLI builds a Docker image from the Dockerfile and then installs the dependencies defined in requirements.txt inside the image. The processed template file is saved in the .aws-sam/build folder.
Note
SAM CLI reads the application template in template.yaml to determine the S3 resources needed for the app, the S3 event trigger definition, and the functions they invoke.
The Resources property defines the S3 bucket we’re going to be reacting to:
Parameters:
  BucketName:
    Type: String
    Default: unionml-example-aws-lambda-s3

Resources:
  UnionmlAppBucket:
    Type: AWS::S3::Bucket
    Properties:
      BucketName: !Sub "${BucketName}"
The Events property on each function’s definition specifies that the lambda function should be invoked whenever a .json file is uploaded to the features/ prefix of the bucket:
Events:
  fileupload:
    Type: S3
    Properties:
      Bucket: !Ref UnionmlAppBucket
      Events: s3:ObjectCreated:*
      Filter:
        S3Key:
          Rules:
            - Name: prefix
              Value: features/
            - Name: suffix
              Value: .json
And finally, the Policies key gives the lambda function read and write access to the bucket (S3WritePolicy and S3ReadPolicy are SAM policy templates):
Policies:
  - S3WritePolicy:
      BucketName: !Sub "${BucketName}"
  - S3ReadPolicy:
      BucketName: !Sub "${BucketName}"
Test Locally#
You can test the lambda function by running the unit tests in the tests folder of the app directory. Use pip to install pytest, then run the tests locally:
pip install pytest
python -m pytest tests
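The shipped tests exercise the handler end to end. As an illustration of the general approach, here is a minimal hypothetical test that mocks the model and the S3 client so it runs without AWS credentials; the names and structure here are assumptions, not the template’s actual tests:

from unittest import mock

import app  # the module defined by app.py


def test_lambda_handler_uploads_predictions():
    # Hypothetical S3 event with only the fields the handler reads.
    event = {
        "Records": [
            {
                "s3": {
                    "bucket": {"name": "test-bucket"},
                    "object": {"key": "features/sample_features.json"},
                }
            }
        ]
    }

    # Stub out the model and S3 client; the feature payload shape below
    # is a placeholder that depends on your dataset.
    with mock.patch.object(app.model, "load_from_env"), \
         mock.patch.object(app.model.dataset, "get_features", return_value=[[0.0] * 64]), \
         mock.patch.object(app.model, "predict", return_value=[8.0]), \
         mock.patch.object(app.s3_client, "download_file"), \
         mock.patch.object(app.s3_client, "upload_file") as upload:
        app.lambda_handler(event, context=None)

    # Predictions should land under the predictions/ prefix.
    upload.assert_called_once()
    assert upload.call_args.args[2] == "predictions/sample_features.json"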
Deploying to AWS Lambda#
Once we’re satisfied with our application’s state, we can then build and deploy it to AWS:
Note
If you don’t have an AWS account, you’ll need to create one first.
sam deploy --guided
The prompt will require you to provide inputs to configure your sam deployment, which you can read more about in the Serving with AWS Lambda guide. In this example, we’ll call the stack test-s3-event-unionml-app.
Once the deployment process is complete, you should see something like this:
CloudFormation outputs from deployed stack
-----------------------------------------------------------------------------------------------------------------
Outputs
-----------------------------------------------------------------------------------------------------------------
Key UnionmlAppBucket
Description unionml app s3 bucket
Value arn:aws:s3:::unionml-example-aws-lambda-s3
Key UnionmlFunction
Description unionml Lambda Function ARN
Value arn:aws:lambda:us-east-2:479331373192:function:test-s3-event-unionml-app-UnionmlFunction-Vxbl7NiL8Jz7
Key UnionmlFunctionIamRole
Description Implicit IAM Role created for unionml function
Value arn:aws:iam::479331373192:role/test-s3-event-unionml-app-UnionmlFunctionRole-1LGMQ4OXWD9ZR
-----------------------------------------------------------------------------------------------------------------
Successfully created/updated stack - test-s3-event-unionml-app in us-east-2
Where unionml-example-aws-lambda-s3 is the created bucket, and test-s3-event-unionml-app-UnionmlFunction-Vxbl7NiL8Jz7 is the lambda function that’s invoked whenever a .json file is uploaded to the features/ prefix.
Triggering the S3 Event#
The UnionML app template ships with some sample features in the data directory, which we can use to trigger the lambda function:
aws s3 cp data/sample_features.json s3://unionml-example-aws-lambda-s3/features/sample_features-$(date "+%Y%m%d%H%M%S").json
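If you’d rather trigger the event from Python, a roughly equivalent boto3 call (same bucket and key pattern as the CLI command above) looks like this:

import time

import boto3

s3 = boto3.client("s3")

# Upload the sample features under a timestamped key in the features/
# prefix, matching the *.json trigger defined in template.yaml.
timestamp = time.strftime("%Y%m%d%H%M%S")
s3.upload_file(
    "data/sample_features.json",
    "unionml-example-aws-lambda-s3",
    f"features/sample_features-{timestamp}.json",
)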
Then we can look at the logs with:
sam logs -n UnionmlFunction --stack-name test-s3-event-unionml-app --tail
Finally, let’s check whether our predictions showed up in the predictions/ prefix:
aws s3 ls s3://unionml-example-aws-lambda-s3/predictions/
You should see a file in the format sample_features-{timestamp}.json:
2022-09-16 17:52:46 5 sample_features-20220916175057.json
You can download and inspect the contents of the predictions file with:
aws s3 cp s3://unionml-example-aws-lambda-s3/predictions/sample_features-20220916175057.json .
cat sample_features-20220916175057.json
The prediction file should be a JSON file containing an array with a single prediction:
[8.0]
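You can also verify the output programmatically. Here is a small boto3 sketch that fetches the most recently modified object under the predictions/ prefix and prints its contents:

import json

import boto3

s3 = boto3.client("s3")
bucket = "unionml-example-aws-lambda-s3"

# Find the newest predictions file and print its contents.
response = s3.list_objects_v2(Bucket=bucket, Prefix="predictions/")
latest = max(response["Contents"], key=lambda obj: obj["LastModified"])
body = s3.get_object(Bucket=bucket, Key=latest["Key"])["Body"]
print(json.load(body))  # e.g. [8.0]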
Invoking the Function Locally#
If you want to test the function locally, you can invoke it with sam local invoke, using the sample S3 event in events/event.json to iterate on the function.
However, first you’ll need to upload a file to the key features/sample_features.json, which is the object referenced in the event.json file:
aws s3 cp data/sample_features.json s3://unionml-example-aws-lambda-s3/features/
sam local invoke UnionmlFunction --event events/event.json
Summary#
Congratulations! 🎉 You just set up an event-based prediction service that invokes a lambda function whenever you upload files to S3.